Deep-MI / FastSurfer

PyTorch implementation of FastSurferCNN
Apache License 2.0

Total segmentation volume is too small. Segmentation may be corrupted. #256

Closed araikes closed 1 year ago

araikes commented 1 year ago

Question/Support Request

I have a number of individuals for whom I received the error message in the subject line. I'm trying to debug what went awry. When I overlay aseg.auto_noCCseg on orig.mgz there doesn't appear to be any obvious mis-segmentation (see example below). Is there a concise way for me to debug where these errors are coming from?

Noteworthy (maybe): I am running this on individuals with Alzheimer's. I wouldn't think this would affect the total segmentation, but perhaps.

Screenshots

[Screenshot: aseg.auto_noCCseg overlaid on orig.mgz]

Environment

Execution

singularity pull docker://deepmi/fastsurfer:2.0.1

singularity exec --nv --cleanenv \
  -B /xdisk/adamraikes/xxx/nifti:/input \
  -B /xdisk/adamraikes/xxx/derivatives/fastsurfer-2.0.1:/output \
  -B /groups/adamraikes/license.txt:/license.txt \
  /groups/adamraikes/singularity_images/fastsurfer-2.0.1.sif \
  /fastsurfer/run_fastsurfer.sh \
  --fs_license /license.txt \
  --t1 /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz \
  --sid 13452019_ses-10 \
  --sd /output \
  --parallel

m-reuter commented 1 year ago

Thanks for reporting. Indeed, the segmentation looks OK (judging from my phone). What volume number are you getting? Maybe our threshold is too conservative.

araikes commented 1 year ago

This is the stdout from the SLURM job in which this individual ran. I checked deep-seg.log, but it doesn't contain the segmentation volume.

python3.8 /fastsurfer/FastSurferCNN/run_prediction.py --t1 /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz --aparc_aseg_segfile /output/13452019_ses-10/mri/aparc.DKTatlas+aseg.deep.mgz --conformed_name /output/13452019_ses-10/mri/orig.mgz --sid 13452019_ses-10 --seg_log /output/13452019_ses-10/scripts/deep-seg.log --vox_size min --batch_size 1 --viewagg_device auto --device auto
[INFO: run_prediction.py:  341]: Checking or downloading default checkpoints ...
[INFO: run_prediction.py:  158]: Output will be stored in: /output
[INFO: misc.py:  159]: Using device: cpu
[INFO: run_prediction.py:  121]: Running view aggregation on cpu
[INFO: inference.py:   95]: Loading checkpoint /fastsurfer/FastSurferCNN/checkpoints/aparc_vinn_coronal_v2.0.0.pkl
[INFO: inference.py:   95]: Loading checkpoint /fastsurfer/FastSurferCNN/checkpoints/aparc_vinn_sagittal_v2.0.0.pkl
[INFO: inference.py:   95]: Loading checkpoint /fastsurfer/FastSurferCNN/checkpoints/aparc_vinn_axial_v2.0.0.pkl
[INFO: run_prediction.py:  359]: Analyzing single subject /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz
[INFO: run_prediction.py:  247]: Successfully saved image as /output/13452019_ses-10/mri/orig/001.mgz
[INFO: run_prediction.py:  176]: Conforming image
Input:    min: 0.0  max: 166513.76358032227
rescale:  min: 0.0  max: 89084.8635154724  scale: 0.002862439138784909
Output:   min: 0.0  max: 255.0
[INFO: run_prediction.py:  247]: Successfully saved image as /output/13452019_ses-10/mri/orig.mgz
[INFO: run_prediction.py:  210]: Run coronal prediction
[INFO: dataset.py:   55]: Loading Coronal with input voxelsize (1.0, 1.0)
[INFO: dataset.py:   64]: Successfully loaded Image from /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz
100%|██████████████████████████████| 256/256 [02:03<00:00,  2.08batch/s]
[INFO: inference.py:  198]: Inference on 256 batches for coronal successful
[INFO: inference.py:  217]: Coronal inference on /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz finished in 123.0854 seconds
[INFO: run_prediction.py:  210]: Run sagittal prediction
[INFO: dataset.py:   46]: Loading Sagittal with input voxelsize (1.0, 1.0)
[INFO: dataset.py:   64]: Successfully loaded Image from /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz
100%|██████████████████████████████| 256/256 [01:58<00:00,  2.17batch/s]
[INFO: inference.py:  198]: Inference on 256 batches for sagittal successful
[INFO: inference.py:  217]: Sagittal inference on /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz finished in 118.1760 seconds
[INFO: run_prediction.py:  210]: Run axial prediction
[INFO: dataset.py:   51]: Loading Axial with input voxelsize (1.0, 1.0)
[INFO: dataset.py:   64]: Successfully loaded Image from /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz
100%|██████████████████████████████| 256/256 [01:57<00:00,  2.18batch/s]
[INFO: inference.py:  198]: Inference on 256 batches for axial successful
[INFO: inference.py:  217]: Axial inference on /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz finished in 117.5716 seconds
[INFO: run_prediction.py:  247]: Successfully saved image as /output/13452019_ses-10/mri/aparc.DKTatlas+aseg.deep.mgz
[INFO: run_prediction.py:  383]: Creating brainmask based on segmentation...
Creating dilated mask ...
Frontal region special treatment:  12766
  Found 1 connected component(s)!
[INFO: run_prediction.py:  247]: Successfully saved image as /output/13452019_ses-10/mri/mask.mgz
[INFO: run_prediction.py:  389]: Creating aseg based on segmentation...
Reducing to aseg ...
FlipWM: rh 0 and lh 0 flipped.
[INFO: run_prediction.py:  247]: Successfully saved image as /output/13452019_ses-10/mri/aseg.auto_noCCseg.mgz
[INFO: run_prediction.py:  397]: Running volume-based QC check on segmentation...
Checking total volume ...
Voxel size in mm3: 1.0
Total segmentation volume in liter: 0.77
[WARNING: run_prediction.py:  400]: Total segmentation volume is too small. Segmentation may be corrupted.
[ERROR: run_prediction.py:  415]: Single subject failed the volume-based QC check.
ERROR: Segmentation failed QC checks.
m-reuter commented 1 year ago

Thanks. The volume is 0.77 liters, which is exactly our threshold. In our tests we did not encounter such small volumes unless the segmentation was broken. In your case the segmentation looks good, so the threshold can be decreased. https://github.com/Deep-MI/FastSurfer/blob/617ed0d87372eafe94683bd7b2e1096a2727966e/FastSurferCNN/quick_qc.py#L47

I reduced it in dev to 0.75, so one option is to check out dev and build a Docker/Singularity image from that (note that dev contains some other commits and has not been thoroughly tested). Alternatively, create a new image from your stable image by entering the container and manually changing that value in quick_qc.py. I wish we had a flag to ignore these errors, but currently we don't.
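If you want to double-check the number yourself, the check essentially counts the non-background voxels of aseg.auto_noCCseg.mgz and converts them to liters. Here is a minimal, illustrative sketch of that kind of computation (not the exact quick_qc.py code; the function name is made up):

import numpy as np
import nibabel as nib

def total_seg_volume_liters(aseg_path):
    # Load the segmentation (e.g. aseg.auto_noCCseg.mgz).
    img = nib.load(aseg_path)
    # Volume of one voxel in mm^3 (1.0 for conformed 1 mm isotropic data).
    voxel_mm3 = float(np.prod(img.header.get_zooms()[:3]))
    # Every non-zero label counts toward the total segmentation volume.
    labeled_voxels = np.count_nonzero(np.asanyarray(img.dataobj))
    # Convert mm^3 to liters (1 liter = 1e6 mm^3).
    return labeled_voxels * voxel_mm3 / 1e6

vol = total_seg_volume_liters("/output/13452019_ses-10/mri/aseg.auto_noCCseg.mgz")
print(f"Total segmentation volume in liter: {vol:.2f}")

Running something like that on your subject should reproduce the 0.77 from your log.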

araikes commented 1 year ago

Thanks for the quick edit. If my call stays exactly as above and that's all I'm planning for the moment, is there anything in the dev branch that is substantively different from 2.0.1, or would I need to re-run all of my participants?

dkuegler commented 1 year ago

I think you should just be able to add --surf_only to your call to create the surfaces. (Nothing of note really happens between the error you encountered and the surface generation.)

No dev branch or modifications needed; just use stable / your existing image.
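For reference, that is the run_fastsurfer.sh part of your earlier call with the flag appended (the singularity exec wrapper and bind mounts stay the same):

/fastsurfer/run_fastsurfer.sh \
  --fs_license /license.txt \
  --t1 /input/sub-13452019/ses-10/anat/sub-13452019_ses-10_T1w.nii.gz \
  --sid 13452019_ses-10 \
  --sd /output \
  --parallel \
  --surf_only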

araikes commented 1 year ago

@dkuegler,

I'll give it a shot and see if it works. Thanks.

araikes commented 1 year ago

Using --surf_only does not bypass the segmentation QC check.

araikes commented 1 year ago

@m-reuter,

Just another case from this dataset: this individual has a total segmentation volume of 0.72 liters.

[Screenshot: segmentation overlay for this subject]

m-reuter commented 1 year ago

Are these adults? I guess we need to go even lower or switch off the test and come up with something different.

m-reuter commented 1 year ago

Just switched from ERROR to a WARNING in dev. You can check out stable, copy quick_qc.py over from dev, and re-build the containers. We may also push a hotfix release at some point.

m-reuter commented 1 year ago

I created a stable patch release v2.0.2 that gives a WARNING and continues. We will probably update docker images today or tomorrow.

In the future we need to:

araikes commented 1 year ago

Thanks @m-reuter. I appreciate the quick fix.

araikes commented 1 year ago

@m-reuter,

I pulled the v2.0.4 Docker image, and I still get an error and termination of the segmentation when the volume is <= 0.75.

araikes commented 1 year ago

It's here:

https://github.com/Deep-MI/FastSurfer/blob/54247b83b1babf36f84683282c60d549b9b1a940/FastSurferCNN/run_prediction.py#L413-L416

m-reuter commented 1 year ago

Thanks for testing. I had fixed quick_qc to print a warning and not exit with an error, but it turns out quick_qc is not actually called; only a function from that file is called, and there the result is still treated as an error. This means we need to do another hotfix. Sorry about that.

m-reuter commented 1 year ago

By the way, you might be able to work around it this time with --surf_only, as that should bypass the first check.

Another question: the smallest volume you had is 0.72? And what age range are these cases from?

araikes commented 1 year ago

It's a patient population with Alzheimer's. All aged late 70s to late 90s.

araikes commented 1 year ago

Also, as a short-term fix, I locally changed that sys.exit(1) to sys.exit(0) so it only prints the warning, mounted that file into the container as run_prediction.py, and it seems to be working without issue.
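For anyone else hitting this, the extra bind mount looks roughly like this, added to the singularity exec call from above (the local path to the patched copy is a placeholder):

  -B /path/to/patched/run_prediction.py:/fastsurfer/FastSurferCNN/run_prediction.py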

dkuegler commented 1 year ago

@araikes It makes sense that this would work for you.

One question: when you re-ran the subject with --surf_only, you said it did not work. It seems to me you re-ran on a clean subject folder. Can you confirm that you did NOT re-run on the "result" of the last (failed) run?

m-reuter commented 1 year ago

Should be solved, as we now only generate warnings and have reduced the threshold further.