Deep-MI / FastSurfer

PyTorch implementation of FastSurferCNN
Apache License 2.0

ERROR: FastSurfer asegdkt segmentation failed. #542

Open StephDocTUM opened 1 month ago

StephDocTUM commented 1 month ago

Dear all,

I am running FastSurfer on a native T1 .nii image on which the "regular" FreeSurfer recon-all ran successfully. With FastSurfer, I run into the error ERROR: FastSurfer asegdkt segmentation failed. (OS: Ubuntu 22.04)

Full command:

sudo docker run --gpus all \
           -v /media/stn/data1//FreeSurfer/anat_t1_2007_2023:/data \
           -v /media/stn/data1//FreeSurfer/FastSurfer:/output \
           -v /media/stn/data1//FreeSurfer/license.txt:/fs_license/license.txt \
           --rm --user $(id -u):$(id -g) deepmi/fastsurfer:latest \
           --fs_license /fs_license/license.txt \
           --t1 /data/2007_04_3/anat_t1.nii \
           --sid subjectX --sd /output \
           --parallel --3T

Full error:

Version: 2.2.0+9f37d02
Wed Jul 10 07:22:23 UTC 2024

python-s /fastsurfer/FastSurferCNN/run_prediction.py --t1 /data/2007_04_13/anat_t1.nii --asegdkt_segfile /output/subjectX/mri/aparc.DKTatlas+aseg.deep.mgz --conformed_name /output/subjectX/mri/orig.mgz --brainmask_name /output/subjectX/mri/mask.mgz --aseg_name /output/subjectX/mri/aseg.auto_noCCseg.mgz --sid subjectX --seg_log /output/subjectX/scripts/deep-seg.log --vox_size min --batch_size 1 --viewagg_device auto --device auto 
/venv/bin/python-s: line 3:    52 Segmentation fault      (core dumped) python3.10 -s ${inputargs[@]}
ERROR: FastSurfer asegdkt segmentation failed.

Now I am unsure whether this is a problem with FastSurfer or whether I am doing something wrong.

dkuegler commented 1 month ago

A segmentation fault is a curious error. Python is designed so that pure Python code cannot (or at least should not) cause segmentation faults; when one occurs, it usually comes from compiled native code. It is therefore very unlikely to be a bug in our code. As a quick check of whether the call to Python is correct, can you please add --py python3.10 to your command?

Are you able to (confidentially) share the image file with us? Then we can try to reproduce the error on-site.

Otherwise, could you check on a different machine or with a different FastSurfer version (e.g. docker pull deepmi/fastsurfer:gpu-v2.1.2), or even with the CPU Docker image (docker pull deepmi/fastsurfer:cpu-v2.2.0)?
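Because pure Python code should not segfault, the crash most likely originates in compiled code loaded by the interpreter. One generic way to learn more is Python's standard faulthandler module, which prints the Python stack of the crashing thread when a fatal signal such as SIGSEGV arrives. The sketch below is illustrative only: it starts a child interpreter with -X faulthandler and makes it crash on purpose via ctypes.string_at(0), a NULL-pointer read standing in for a native-code crash. For the container, setting the standard variable PYTHONFAULTHANDLER=1 (e.g. docker run -e PYTHONFAULTHANDLER=1 ...) should have the same effect, assuming FastSurfer's wrapper scripts pass the environment through to Python; this has not been tested with FastSurfer.

```python
import signal
import subprocess
import sys

# Stand-in for a crash in native code: ctypes reads from address 0 (NULL),
# which raises SIGSEGV inside the C layer, not in Python code.
CRASHING_CHILD = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled via -X."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_with_faulthandler(CRASHING_CHILD)
    # A child killed by a signal reports a negative return code (-SIGSEGV).
    print("killed by SIGSEGV:", result.returncode == -signal.SIGSEGV)
    # faulthandler has written the Python stack of the crashing thread to stderr.
    print(result.stderr)
```

In a real crash, the innermost frames of that stack indicate which Python call was executing when the native code failed.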

m-reuter commented 1 month ago

Also check whether the NVIDIA GPU is set up correctly on the host by running nvidia-smi. How much RAM does the host system have, and how much memory does the GPU have?
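A small helper sketch for collecting the two numbers asked about above on a Linux host. It is not part of FastSurfer: it reads total RAM through os.sysconf and asks nvidia-smi for the GPU name and total memory using its --query-gpu and --format options. Running plain nvidia-smi shows the same information in table form.

```python
import os
import shutil
import subprocess


def host_ram_gib() -> float:
    """Total physical RAM in GiB (Linux/POSIX, via sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    n_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * n_pages / 1024**3


def gpu_memory_report() -> str:
    """GPU name and total memory from nvidia-smi; empty string if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return ""  # NVIDIA driver tools not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code usually means the driver cannot talk to the GPU.
    return result.stdout.strip() if result.returncode == 0 else ""


if __name__ == "__main__":
    print(f"host RAM: {host_ram_gib():.1f} GiB")
    print("GPU:", gpu_memory_report() or "nvidia-smi not available or failed")
```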

MeIngBest commented 1 month ago

Thanks a lot for your replies. Using --py python3.10 works fine. Using deepmi/fastsurfer:gpu-v2.1.2 leads to:

N4 Bias Correction Parameters:

  • input volume: /output/subjectX/mri/orig.mgz
  • output volume: /output/subjectX/mri/orig_nu.mgz
  • mask: /output/subjectX/mri/mask.mgz
  • shrink factor: 4
  • number fitting levels: 4
  • number iterations: 50
  • convergence threshold: 0.0
  • skipwm: False
  • threads: 1

reading /output/subjectX/mri/orig.mgz
read MGZ (FreeSurfer) image via nibabel...
read MGZ (FreeSurfer) image via nibabel...

executing N4 correction ...
Traceback (most recent call last):
  File "/fastsurfer/recon_surf/N4_bias_correct.py", line 333, in <module>
    itkcorrected = N4correctITK(
  File "/fastsurfer/recon_surf/N4_bias_correct.py", line 158, in N4correctITK
    corrector.Execute(itkimage, itkmask)
  File "/venv/lib/python3.8/site-packages/SimpleITK/SimpleITK.py", line 43070, in Execute
    return _SimpleITK.N4BiasFieldCorrectionImageFilter_Execute(self, *args)
RuntimeError: Exception thrown in SimpleITK N4BiasFieldCorrectionImageFilter_Execute: /tmp/SimpleITK/Code/BasicFilters/src/sitkImageFilter.cxx:63:
sitk::ERROR: Input "maskImage" for "N4BiasFieldCorrectionImageFilter" has size of [ 99, 99, 99 ] which does not match the primary input's size of [ 79, 79, 79 ]!
ERROR: Biasfield correction failed

When using other images, the size [ 79, 79, 79 ] changes accordingly.
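The SimpleITK message says that the mask and the image handed to N4 live on different voxel grids. A hedged first check is to compare the dimensions stored in the headers of orig.mgz and mask.mgz. The sketch below does this with only the standard library, assuming the documented MGH header layout (big-endian int32 fields version, width, height, depth, nframes at the start of the gzip-compressed file); nibabel, which the log shows FastSurfer already uses, would report the same via the image shape. If the image and mask are shrunk before the N4 call (the log lists a shrink factor of 4), the sizes in the error message may be the shrunken ones rather than the file dimensions.

```python
import gzip
import struct


def mgz_dims(path: str) -> tuple:
    """Return (width, height, depth, nframes) from an .mgz header.

    Assumes the standard MGH layout: the gzip stream starts with the
    big-endian int32 fields version, width, height, depth, nframes.
    """
    with gzip.open(path, "rb") as f:
        version, width, height, depth, nframes = struct.unpack(">5i", f.read(20))
    if version != 1:
        raise ValueError(f"{path}: unexpected MGH version {version}")
    return width, height, depth, nframes


def same_grid(image_path: str, mask_path: str) -> bool:
    """True if image and mask have identical voxel dimensions."""
    return mgz_dims(image_path)[:3] == mgz_dims(mask_path)[:3]
```

For example, inside the container: same_grid("/output/subjectX/mri/orig.mgz", "/output/subjectX/mri/mask.mgz").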

With deepmi/fastsurfer:cpu-v2.2.0, the run is still going but seems to work. Does it use the regular recon-all command from FreeSurfer?

[Screenshot 2024-07-12 at 12.44.48]

Total RAM is 62.7GB.

dkuegler commented 1 month ago

The error in 2.1.2 does surprise me, but I don't think it is related.

It seems the python-s wrapper script is causing the issue. We have removed it for the next version, and I think you can work around this issue by specifying --py python3.10, which is in effect almost the same. Just make sure you are not mounting a home directory: you lose Python's -s flag, which normally prevents Python from loading packages from the user's home directory (the user site-packages). Without it, packages found there could override the versions installed in the container.
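To illustrate the point about -s: CPython's -s flag stops the interpreter from adding the per-user site-packages directory (under the home directory) to sys.path. The sketch below compares a plain interpreter with one started with -s by printing sys.flags.no_user_site and site.ENABLE_USER_SITE. When running with --py python3.10, setting the standard environment variable PYTHONNOUSERSITE=1 (e.g. docker run -e PYTHONNOUSERSITE=1 ...) should have the same effect as -s; that FastSurfer's scripts pass this variable through to Python is an assumption, not something tested here.

```python
import os
import subprocess
import sys

# Reports whether user site-packages are disabled and whether site enabled them.
PROBE = "import sys, site; print(sys.flags.no_user_site, site.ENABLE_USER_SITE)"


def user_site_status(extra_flags: list) -> tuple:
    """Run a fresh interpreter with `extra_flags` and report its user-site settings."""
    # Drop PYTHONNOUSERSITE so that only the command-line flags make a difference.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONNOUSERSITE"}
    out = subprocess.run(
        [sys.executable, *extra_flags, "-c", PROBE],
        capture_output=True, text=True, check=True, env=env,
    )
    no_user_site, enabled = out.stdout.split()
    return int(no_user_site), enabled


if __name__ == "__main__":
    print("plain:  ", user_site_status([]))      # first value 0: user site may be used
    print("with -s:", user_site_status(["-s"]))  # (1, 'False'): user site ignored
```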

m-reuter commented 1 month ago

Does it use the regular recon-all command from freesurfer?

recon-surf uses many of FreeSurfer's binaries and recon-all calls, but it is a different pipeline. First, we can skip many steps because we already have a full brain segmentation from the neural network. Second, compared to recon-all, we optimize (and fix) multiple steps for speed (e.g. surface inflation, resulting in fewer topological defects) and reliability (improved spherical rotational pre-alignment to stabilize the location of the central sulcus).

I will close this issue now as we have a workaround and this "should" be fixed in the next release as we dropped the python-s script. Happy to re-open if anything comes up.

MeIngBest commented 1 month ago

I am sorry, I misunderstood. When adding --py python3.10, I no longer get an error specifically concerning Python, but I still get an error:

sudo docker run --gpus all \
           -v /media/stn/data1/Brain_vs_Gut/FreeSurfer/anat_t1_2007_2023:/data \
           -v /media/stn/data1/Brain_vs_Gut/FreeSurfer/FastSurfer:/output \
           -v /media/stn/data1/Brain_vs_Gut/FreeSurfer/license.txt:/fs_license/license.txt \
           --rm --user $(id -u):$(id -g) deepmi/fastsurfer:latest \
           --py python3.10 \
           --fs_license /fs_license/license.txt \
           --t1 /data/2014_09_29/anat_t1.nii \
           --sid 2014_09_29 --sd /output 
Version: 2.2.0+9f37d02
Fri Jul 12 15:43:29 UTC 2024

python3.10 /fastsurfer/FastSurferCNN/run_prediction.py --t1 /data/2014_09_29/anat_t1.nii --asegdkt_segfile /output/subjectX/mri/aparc.DKTatlas+aseg.deep.mgz --conformed_name /output/subjectX/mri/orig.mgz --brainmask_name /output/subjectX/mri/mask.mgz --aseg_name /output/subjectX/mri/aseg.auto_noCCseg.mgz --sid subjectX --seg_log /output/subjectX/scripts/deep-seg.log --vox_size min --batch_size 1 --viewagg_device auto --device auto 
ERROR: FastSurfer asegdkt segmentation failed.
/fastsurfer/run_fastsurfer.sh: line 797:    46 Segmentation fault      (core dumped) $cmd

dkuegler commented 1 month ago

Can you share the deep-seg.log file of that subject (in its scripts folder)?

Please also send CPU and GPU information, i.e. the output of lscpu and nvidia-smi on the host.

MeIngBest commented 1 month ago

Here is the information:

deep-seg.log:

Version: 2.2.0+9f37d02
Log file for segmentation FastSurferCNN/run_prediction.py
Fri Jul 12 19:47:13 UTC 2024

python3.10 /fastsurfer/FastSurferCNN/run_prediction.py --t1 /data/2020_10_09.nii --asegdkt_segfile /output/2020_10_09.nii/mri/aparc.DKTatlas+aseg.deep.mgz --conformed_name /output/2020_10_09.nii/mri/orig.mgz --brainmask_name /output/2020_10_09.nii/mri/mask.mgz --aseg_name /output/2020_10_09.nii/mri/aseg.auto_noCCseg.mgz --sid 2020_10_09.nii --seg_log /output/2020_10_09.nii/scripts/deep-seg.log --vox_size min --batch_size 1 --viewagg_device auto --device auto

[Screenshot 2024-07-12 at 21.49.48]

lscpu:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  16
  On-line CPU(s) list:   0-15
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 7 3800X 8-Core Processor
    CPU family:          23
    Model:               113
    Thread(s) per core:  2
    Core(s) per socket:  8
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         4558,8862
    CPU min MHz:         2200,0000
    BogoMIPS:            7785.51
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx
                         fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good
                         nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma
                         cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm
                         extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce
                         topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate
                         ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap
                         clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total
                         cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock
                         nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic
                         v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   256 KiB (8 instances)
  L1i:                   256 KiB (8 instances)
  L2:                    4 MiB (8 instances)
  L3:                    32 MiB (2 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-15
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Mitigation; untrained return thunk; SMT enabled with STIBP protection
  Spec rstack overflow:  Mitigation; Safe RET
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected