Closed ccraddock closed 9 years ago
probably need
ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1
brian
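A minimal sketch of how that variable could be set so that every child process inherits it, assuming a POSIX shell. The `sh -c` call is only a stand-in for the real antsApplyTransforms invocation, and `seen_by_child` is a hypothetical name used for illustration.

```shell
# Export the variable so every process started from this shell inherits it.
export ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1

# Stand-in for the real command: capture what a child process actually sees.
seen_by_child=$(sh -c 'printf "%s" "$ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"')
echo "child sees ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=${seen_by_child}"
```

Jobs that GNU parallel starts on the local machine should inherit the exported value in the same way, but it is worth confirming on the actual setup.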
On Sun, Jan 4, 2015 at 10:58 PM, Cameron Craddock notifications@github.com wrote:
I am experiencing a problem where antsApplyTransforms is hanging in ANTs-2.1.0rc2. I am running several scripts that call antsApplyTransforms using GNU parallel to achieve parallel execution on a multicore workstation (64 processors, 256 GB RAM). I am applying affine and warp transforms to heterogeneous fMRI data, but on average the resolution is 3.5 mm x 3.5 mm x 4 mm x 300 TRs. Five minutes or so after starting the processing, I will notice at least one process that is consuming ~1.4 GB of RAM and using 0% CPU. I am able to reproduce this problem fairly reliably. I am transforming 4,000 files and keep restarting to pick up the files that hung in the previous iteration.
Here is what I have been able to figure out, so far. This is the information that is returned from ps:
63483 9.4 0.2 4678684 683964 pts/6 S+ 22:01 1:27 antsApplyTransforms -d 3 -e 3 -i /tmp/tmp.J6PLI4JNa4/infile_smoothed.nii.gz -r /usr/share/fsl/5.0/data/standard/MNI152_T1_3mm_symmetric.nii.gz -o /mnt/3tb/abide_vmhc/Output_2014-06-14_symmetric/filt_global/0050126_symmetric_functional.nii.gz -n Linear -t /data/Projects/ABIDE_Initiative/Derivatives/CPAC/vmhc/Out/pipeline_symmetric/0050126_session_1/anatomical_to_mni_nonlinear_xfm/ants_Warp.nii.gz -t /data/Projects/ABIDE_Initiative/Derivatives/CPAC/vmhc/Out/pipeline_symmetric/0050126_session_1/ants_affine_xfm/ants_Affine.txt -t /tmp/tmp.J6PLI4JNa4/func_to_anat_affine.txt
strace shows that the process is waiting on a mutex (blocked in a futex call):
strace -p 63483
futex(0x31a9818, FUTEX_WAIT, 1, NULL
Next I attached to the process with gdb:
gdb -p 63483
The backtrace shows that the process is blocked in a pthread_join() call:
(gdb) bt
0 0x00007fb8a96e1148 in pthread_join (threadid=52073800, thread_return=0x0) at pthread_join.c:89
1 0x000000000155407e in itk::MultiThreader::WaitForSingleMethodThread(unsigned long) ()
2 0x00000000015554b5 in itk::MultiThreader::SingleMethodExecute() ()
3 0x0000000000b831c9 in itk::ImageSource<itk::Image<double, 3u> >::GenerateData (this=0x31a94d0)
at /opt/ants/ITKv4-install/include/ITK-4.6/itkImageSource.hxx:242
4 0x000000000154c223 in itk::ProcessObject::UpdateOutputData(itk::DataObject*) ()
5 0x0000000000a72906 in itk::ImageBase<3u>::UpdateOutputData (this=0x31b72a0)
at /opt/ants/ITKv4-install/include/ITK-4.6/itkImageBase.hxx:287
6 0x00000000009a3cc2 in ants::antsApplyTransforms<double, 3u> (parser=..., inputImageType=3)
at src/ANTs-2.1.0rc2/Examples/antsApplyTransforms.cxx:367
7 0x0000000000991168 in ants::antsApplyTransforms (args=std::vector of length 19, capacity 36 = {...})
at src/ANTs-2.1.0rc2/Examples/antsApplyTransforms.cxx:977
8 0x000000000098b9b1 in main (argc=19, argv=0x7fff12c7ebc8) at /opt/ants/ANTS-build/Examples/cli_antsApplyTransforms.cxx:11
But there only seems to be a single thread:
(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7fb8a9b13740 (LWP 63483) "antsApplyTransf" 0x000000000155407e in itk::MultiThreader::WaitForSingleMethodThread(unsigned long) ()
(gdb) thread
[Current thread is 1 (Thread 0x7fb8a9b13740 (LWP 63483))]
I moved to the deepest frame in ITK for which I have symbols. I was able to recompile ANTs with debug symbols, but it didn't work for ITK. Do you have any suggestions for how to turn on DEBUG mode in ITK? I am compiling ITK through the ANTs build and not standalone.
(gdb) select-frame 3
(gdb) frame
3 0x0000000000b831c9 in itk::ImageSource<itk::Image<double, 3u> >::GenerateData (this=0x31a94d0)
at /opt/ants/ITKv4-install/include/ITK-4.6/itkImageSource.hxx:242
242 this->GetMultiThreader()->SingleMethodExecute();
(gdb) info locals
str = {Filter = {m_Pointer = 0x31a94d0}}
outputPtr = 0x31b72a0
splitter = 0x3164440
validThreads = 61
ITK is configuring itself to use up to 61 threads based on the number of processors on my system. I limited this value to 2 using the environment variable:
ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=2
But I still had the same problem.
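One way to check whether a hung process actually received the thread limit is to read its environment from /proc. This is a rough sketch that assumes Linux, where /proc/<pid>/environ holds the NUL-separated environment the process was started with. The helper name `proc_env` and the `sleep` stand-in process are hypothetical and used only for illustration.

```shell
# Print the value of an environment variable as seen by a running process.
# Assumes Linux: /proc/<pid>/environ holds NUL-separated NAME=value entries
# captured when the process was started.
proc_env() {
    pid=$1
    name=$2
    tr '\0' '\n' < "/proc/${pid}/environ" | sed -n "s/^${name}=//p"
}

# Example with a stand-in process instead of a real antsApplyTransforms run.
ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=2 sleep 30 &
demo_pid=$!
sleep 1  # give the child time to exec, so /proc shows its final environment

seen=$(proc_env "$demo_pid" ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS)
echo "process ${demo_pid} sees ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=${seen}"

# Clean up the stand-in process.
kill "$demo_pid"
wait "$demo_pid" 2>/dev/null || true
```

For a real hung job, the same helper could be called with the PID reported by ps (63483 in the report above). An empty result would suggest that the variable never reached the process.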
I am also having a problem where antsApplyTransforms periodically dies with a segmentation fault. If I keep rerunning the script on the same data it will eventually work.
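Since rerunning eventually succeeds, the manual restarts could be automated with a wrapper that kills attempts that run too long and retries failed ones. The following is a minimal sketch, assuming GNU coreutils `timeout` is available. The function name `retry_with_timeout` and any try count or time limit passed to it are illustrative assumptions, not values from the report, and this only works around the hang and the segmentation fault rather than fixing their cause.

```shell
# Run a command up to max_tries times, killing any attempt that exceeds
# limit_s seconds. Both limits are chosen by the caller.
retry_with_timeout() {
    max_tries=$1
    limit_s=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # timeout(1) from GNU coreutils exits with 124 when it kills the command
        if timeout "$limit_s" "$@"; then
            return 0
        fi
        echo "attempt ${try}/${max_tries} failed or timed out: $1" >&2
        try=$((try + 1))
    done
    return 1
}
```

Each antsApplyTransforms call in the scripts would then be prefixed with `retry_with_timeout`, a maximum number of tries, and a time limit in seconds. A killed attempt may leave a partial output file behind, so it is safer to write to a temporary name and rename only on success.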
View it on GitHub: https://github.com/stnava/ANTs/issues/142