Closed: ntustison closed this 3 months ago
Other people have definitely posted issues about differing results or memory problems in Python or R ... It must be related to the wrappers, right? Perhaps the C++ versions need to be updated?
As the OP in #552 tried to warn us, it is some kind of memory leak in ANTsPy and presumably R too.
I modified the script slightly:
import sys
import os

# Limit ITK threading before ants is imported
os.environ['ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS'] = '4'
import ants

in_directory = "./"
seg = ants.image_read(in_directory + "seg.nii.gz")
gm = ants.image_read(in_directory + "gm.nii.gz")
wm = ants.image_read(in_directory + "wm.nii.gz")

n_iterations = 3          # avoid shadowing the built-in iter()
label = sys.argv[1]
print("KKpy " + label)
kk = ants.kelly_kapowski(s=seg, g=gm, w=wm,
                         its=n_iterations, r=0.0025, m=1.5, x=0, t=10,
                         verbose=1)
ants.image_write(kk, in_directory + "results/py_kk_" + label + ".nii.gz")
This gives identical results if I run a bash for loop calling this script multiple times. If I run a for loop inside Python, it varies some of the time.
Thanks @cookpa and @ncullen93 for looking at this. I've narrowed it down a bit in the actual ITK filter but am still completely perplexed as to why this is happening.
@cookpa, I pulled out the for loop from my original Python script and it looks as though the problem might persist. I'm double checking that right now.
You're right. It happened 0 out of 10 times last night, but when I tried again today it failed 1 out of 50 times. The other 49 runs were identical to the KellyKapowski command-line output.
Thanks @cookpa for the confirmation.
To add to my confusion, I'm currently running the original Python script where the for loop is internal to the Python script on my GPU linux box. So far I'm 40 iterations in and it hasn't failed once.
Bizarre. I haven't run it on Linux, but on my Mac, running with a for loop in Python produces different results nearly every run. Without a for loop the failure rate is approximately 1%; it's otherwise consistent with the CLI. I don't get it at all.
I tried re-reading the input images at every iteration, but it doesn't help. So I don't think the input images are changing during the for loop.
Yeah, I initially tried that but ultimately moved the image reading outside the loop to rule that out as a possibility. I also ran the original Python script on the university Linux cluster, and it reproduced the expected result every time. Within the DiReCT filter, I've isolated which variable is deviating from the expected value on my Mac, but I haven't figured out why yet.
I also don't know what's happening and have seen this issue in my large-scale studies that are run on a Linux cluster ... but my guess is that it comes down to something to do with the (reliability of) precision for relatively small float values. If some of the very small values are shrunk (lost, in some sense), then the algorithm would lead to overly large thickness estimates. I wonder if the "count" image (or any other image used in the denominator) should be stored as double precision.
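A minimal sketch of the kind of precision loss described here (the magnitudes are invented for illustration and are not taken from the filter): in single precision, a small contribution added to an already-large accumulator can be absorbed entirely, while double precision retains it.

```python
import numpy as np

# Hypothetical magnitudes, not actual filter values: a small
# partial-volume contribution added to an already-large accumulator.
increment = np.float32(1e-4)
total32 = np.float32(2048.0)   # float32: spacing (ulp) at 2048 is ~2.4e-4
total64 = np.float64(2048.0)   # float64: spacing at 2048 is ~4.5e-13

print(total32 + increment == total32)  # True: the contribution is lost
print(total64 + increment == total64)  # False: double precision keeps it
```

If values like this feed a denominator, the lost contributions would make it too small, which is consistent with overly large thickness estimates.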
@stnava, in this case I should've mentioned that my initial run on the UVa cluster showed some variation in output. After updating with @ncullen93's tag updates, I was able to get the expected repeatability in output.
In reference to my comment to @cookpa: of the two main image variables used in the filter to construct the thickness image, hitImage and totalImage, the hitImage is consistent across the different runs. It's the totalImage that seems to be the immediate cause of the variation in output. I'll try using double precision for that image and see if that fixes the issue.
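To make the suspected failure mode concrete (the numbers below are made up, and the filter's actual thickness computation is more involved than a single division): if totalImage enters a denominator, a deficit there inflates the result.

```python
# Made-up voxel values for illustration only; not the filter's actual
# thickness computation, just the shape of the suspected blow-up.
hit = 2.5
total_expected = 1.0
total_deficient = 0.25   # a denominator that lost small contributions

print(hit / total_expected)    # 2.5: a plausible thickness value
print(hit / total_deficient)   # 10.0: blown up well beyond expectation
```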
@ntustison thanks, good to know. Re: hit vs. total, I would have guessed the opposite effect. I wonder if any insight could be gained from writing out every state of any image involved in the total ... if all else fails.
Yeah, will do after I investigate the effect of changing RealType = float to RealType = double in KellyKapowski.cxx. It gets a little messy with the integration step, but it would definitely narrow it down.
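Not the actual KellyKapowski.cxx code, but a small sketch (with invented magnitudes) of what a RealType = float to double change guards against: once a float accumulator is large, repeated small additions are silently dropped, while a double accumulator keeps them all.

```python
import numpy as np

# Sketch only; magnitudes are invented. Mimics switching a running
# accumulator from single to double precision.
inc = np.float32(1e-4)
total_f = np.float32(2048.0)   # float accumulator, already large
total_d = np.float64(2048.0)   # double accumulator

for _ in range(10_000):
    total_f = np.float32(total_f + inc)  # every increment is absorbed
    total_d = total_d + np.float64(inc)  # increments accumulate

print(total_f)  # still 2048.0: 10,000 increments lost
print(total_d)  # ~2049.0: the increments survived
```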
Okay, I'm pretty sure this was the issue.
These are always bittersweet: glad to have figured it out, but it's so damn obvious in hindsight.
Re-opening; we will fix this by updating the ANTs version to incorporate the updates from @ntustison.
Okay, just a quick update on this issue. It appears that there's something about running KK through the ANTsPy or ANTsR interface that causes a "failure", i.e., instances where the thickness value(s) blow up beyond what is expected. Although KK is deterministic, the problem occurs completely at random, with a single thread or multiple threads, and the deviation from the correct or expected thickness image occurs in the first couple of optimization iterations. Also, as a check, it does not occur when running via the command line.
Below are minimal examples, in shell, Python, and R, running KK on the same data in a loop of 10 iterations. On my machine, the shell script produces 10 images that are exactly the same and have the expected values. The R and Python scripts produce the expected image anywhere from 50% to 80% of the time (a very loose estimate).
I'll be diving deeper and seeing which exact internal image variable deviates but just wanted to provide an update. Also, any thoughts are welcome on why running KK through R and Python causes this.
Shell
Python
R
Attached data: gm.nii.gz, seg.nii.gz, wm.nii.gz