ANTsX / ANTs

Advanced Normalization Tools (ANTs)
Apache License 2.0

antsApplyTransforms hangs in ANTs-2.1.0rc2 #142

Closed ccraddock closed 9 years ago

ccraddock commented 9 years ago

I am experiencing a problem where antsApplyTransforms hangs in ANTs-2.1.0rc2. I am running several scripts that call antsApplyTransforms, using GNU parallel to achieve parallel execution on a multicore workstation (64 processors, 256 GB RAM). I am applying affine and warp transforms to heterogeneous fMRI data; on average the resolution is 3.5 mm x 3.5 mm x 4 mm x 300 TRs. About five minutes after starting the processing, I notice at least one process that is consuming ~1.4 GB of RAM and using 0% CPU. I can reproduce this problem fairly reliably. I am transforming 4,000 files and keep restarting to pick up the files that hung in the previous iteration.

Here is what I have been able to figure out, so far.

This is the information that is returned from ps:

63483  9.4  0.2 4678684 683964 pts/6  S+   22:01   1:27 antsApplyTransforms -d 3 -e 3 -i /tmp/tmp.J6PLI4JNa4/infile_smoothed.nii.gz -r /usr/share/fsl/5.0/data/standard/MNI152_T1_3mm_symmetric.nii.gz -o /mnt/3tb/abide_vmhc/Output_2014-06-14_symmetric/filt_global/0050126_symmetric_functional.nii.gz -n Linear -t /data/Projects/ABIDE_Initiative/Derivatives/CPAC/vmhc/Out/pipeline_symmetric/0050126_session_1/anatomical_to_mni_nonlinear_xfm/ants_Warp.nii.gz -t /data/Projects/ABIDE_Initiative/Derivatives/CPAC/vmhc/Out/pipeline_symmetric/0050126_session_1/ants_affine_xfm/ants_Affine.txt -t /tmp/tmp.J6PLI4JNa4/func_to_anat_affine.txt

strace shows that the process is waiting on a futex:

strace -p 63483
futex(0x31a9818, FUTEX_WAIT, 1, NULL

Next I attached to the process with gdb:

gdb -p 63483

Backtrace shows that the process is blocked in a pthread_join() call

(gdb) bt
#0  0x00007fb8a96e1148 in pthread_join (threadid=52073800, thread_return=0x0) at pthread_join.c:89
#1  0x000000000155407e in itk::MultiThreader::WaitForSingleMethodThread(unsigned long) ()
#2  0x00000000015554b5 in itk::MultiThreader::SingleMethodExecute() ()
#3  0x0000000000b831c9 in itk::ImageSource<itk::Image<double, 3u> >::GenerateData (this=0x31a94d0)
    at /opt/ants/ITKv4-install/include/ITK-4.6/itkImageSource.hxx:242
#4  0x000000000154c223 in itk::ProcessObject::UpdateOutputData(itk::DataObject*) ()
#5  0x0000000000a72906 in itk::ImageBase<3u>::UpdateOutputData (this=0x31b72a0)
    at /opt/ants/ITKv4-install/include/ITK-4.6/itkImageBase.hxx:287
#6  0x00000000009a3cc2 in ants::antsApplyTransforms<double, 3u> (parser=..., inputImageType=3)
    at src/ANTs-2.1.0rc2/Examples/antsApplyTransforms.cxx:367
#7  0x0000000000991168 in ants::antsApplyTransforms (args=std::vector of length 19, capacity 36 = {...})
    at src/ANTs-2.1.0rc2/Examples/antsApplyTransforms.cxx:977
#8  0x000000000098b9b1 in main (argc=19, argv=0x7fff12c7ebc8) at /opt/ants/ANTS-build/Examples/cli_antsApplyTransforms.cxx:11

But there only seems to be a single thread:

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7fb8a9b13740 (LWP 63483) "antsApplyTransf" 0x000000000155407e in itk::MultiThreader::WaitForSingleMethodThread(unsigned long) ()
(gdb) thread
[Current thread is 1 (Thread 0x7fb8a9b13740 (LWP 63483))]
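The single-thread observation can also be cross-checked from outside gdb. The following is a minimal sketch, assuming a Linux system with /proc mounted, where each entry under /proc/<pid>/task corresponds to one kernel thread (LWP). `count_threads` is a hypothetical helper name used only for illustration; it is not part of ANTs or ITK.

```shell
#!/bin/sh
# Sketch (assumption: Linux with /proc mounted).
# count_threads is a hypothetical helper, not an ANTs or ITK tool.
count_threads() {
    # Each directory under /proc/<pid>/task is one kernel thread (LWP).
    ls "/proc/$1/task" | wc -l | tr -d ' '
}

# Demonstration on the current shell; for the hung process, pass its PID
# instead (63483 in the report above).
count_threads $$
```

If this reports 1 for the hung process while the backtrace is blocked in pthread_join(), that would agree with gdb's view that the worker thread being joined no longer exists.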

I moved to the deepest frame in ITK for which I have symbols. I was able to recompile ANTs with debug symbols, but it didn't work for ITK. Do you have any suggestions for how to turn on DEBUG mode in ITK? I am compiling ITK through the ANTs build and not standalone.

(gdb) select-frame 3
(gdb) frame
#3  0x0000000000b831c9 in itk::ImageSource<itk::Image<double, 3u> >::GenerateData (this=0x31a94d0)
    at /opt/ants/ITKv4-install/include/ITK-4.6/itkImageSource.hxx:242
242   this->GetMultiThreader()->SingleMethodExecute();
(gdb) info locals
str = {Filter = {m_Pointer = 0x31a94d0}}
outputPtr = 0x31b72a0
splitter = 0x3164440
validThreads = 61

ITK is configuring itself to use up to 61 threads based on the number of processors on my system. I limited this value to 2 using the environment variable:

ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=2

But I still had the same problem.

I am also having a problem where antsApplyTransforms periodically dies with a segmentation fault. If I keep rerunning the script on the same data it will eventually work.
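The manual restart loop described above could be automated along the following lines. This is only a workaround sketch, not a fix for the underlying hang or segmentation fault. It assumes GNU coreutils `timeout` is available; `retry_with_timeout`, the retry count, and the time limit are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Sketch: retry a command that may hang or crash.
# retry_with_timeout is a hypothetical helper, not part of ANTs.
# Usage: retry_with_timeout TRIES SECONDS COMMAND [ARGS...]
retry_with_timeout() {
    tries=$1
    limit=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        # timeout(1) from GNU coreutils terminates the command after
        # $limit seconds and then exits with status 124.
        if timeout "$limit" "$@"; then
            return 0
        fi
        echo "attempt $attempt failed or timed out: $1" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example with a harmless command; a real call would put the full
# antsApplyTransforms command line after the limit.
retry_with_timeout 3 5 true && echo "succeeded"
```

The limit has to be chosen well above the normal run time of a single transform, otherwise healthy runs would be killed as well.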

stnava commented 9 years ago

probably need

ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1

brian
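For the setting suggested above to take effect, the variable has to be present in the environment of each antsApplyTransforms process, including those started by a job launcher. A minimal sketch, assuming a POSIX shell and the standard `env` utility; `run_single_threaded` is a hypothetical wrapper name used for illustration.

```shell
#!/bin/sh
# Sketch: run any command with ITK limited to a single thread.
# run_single_threaded is a hypothetical wrapper, not part of ANTs.
run_single_threaded() {
    # env(1) sets the variable only for the command it launches.
    env ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1 "$@"
}

# Check that a child process actually sees the value.
run_single_threaded sh -c 'echo "threads=$ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"'
```

In the scripts from the report, the antsApplyTransforms command line would take the place of the `sh -c` check.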
