BUG: list index out of range if image.direction is not (1,0,0,0,-1,0,0,0,1)

puccj commented 1 month ago

Hi, first of all, thank you for this wonderful software. I was trying to use it on the VerSe database, specifically on this version here. I have tried running moosez on:

my Dell XPS 15 with an NVIDIA GeForce GTX 1050 and 16 GB of RAM, using Windows Subsystem for Linux (on Windows 11);
my university department's computing cluster, running Linux with 100 GB of RAM dedicated to my job. Unfortunately I don't have GPU access there, so the cluster runs used the CPU (slower, but that's okay).

In both cases, I worked in a conda environment with Python 3.10 and the latest version of moosez installed with pip.

When I run moosez -d <directory> -m clin_ct_vertebrae, some of the subjects fail with one of three different errors:

list index out of range (the most common one)
Killed
SVD did not converge

The same subject always produces the same error. The last two errors occur immediately (or shortly) after moose starts, and their outputs are:

⠧ [1/52] Running prediction for sub-verse092 using clin_ct_vertebrae...Killed

⠧ [1/104] Running prediction for sub-verse036 using clin_ct_vertebrae.../home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/image_processing.py:522: RuntimeWarning: invalid value encountered in scalar divide
  new_affine[diagonal, diagonal] = (new_affine[diagonal, diagonal] / abs(
/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/numpy/linalg/linalg.py:2180: RuntimeWarning: invalid value encountered in det
  r = _umath_linalg.det(a, signature=signature)
Traceback (most recent call last):
  File "/home/daniele/pattern-recognition/.venv/bin/moosez", line 8, in <module>
    sys.exit(main())
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/moosez.py", line 198, in main
    predict.predict(model_name, input_dir, output_dir, accelerator)
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/predict.py", line 57, in predict
    temp_input_dir, resampled_image, moose_image_object = preprocess(input_dir, model_name)
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/predict.py", line 105, in preprocess
    resampled_image = ImageResampler.resample_image(moose_img_object=moose_image_object,
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/image_processing.py", line 532, in resample_image
    resampled_image = nibabel.Nifti1Image(sitk.GetArrayFromImage(resampled_sitk_image).swapaxes(0, 2),
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/nifti1.py", line 1758, in __init__
    super(Nifti1Pair, self).__init__(dataobj,
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/analyze.py", line 918, in __init__
    super(AnalyzeImage, self).__init__(
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/spatialimages.py", line 469, in __init__
    self.update_header()
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/nifti1.py", line 2034, in update_header
    super(Nifti1Image, self).update_header()
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/nifti1.py", line 1797, in update_header
    super(Nifti1Pair, self).update_header()
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/spatialimages.py", line 503, in update_header
    self._affine2header()
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/nifti1.py", line 1807, in _affine2header
    hdr.set_qform(self._affine, code='unknown')
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/nibabel/nifti1.py", line 1024, in set_qform
    P, S, Qs = npl.svd(R)
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/numpy/linalg/linalg.py", line 1681, in svd
    u, s, vh = gufunc(a, signature=signature, extobj=extobj)
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/numpy/linalg/linalg.py", line 121, in _raise_linalgerror_svd_nonconvergence
    raise LinAlgError("SVD did not converge")
numpy.linalg.LinAlgError: SVD did not converge
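
The two RuntimeWarnings printed before this traceback suggest one possible mechanism: the division on image_processing.py:522 produced NaN ("invalid value encountered in scalar divide" is what 0/0 gives), and nibabel's `set_qform` then ran an SVD on a matrix containing NaN. The expression on that line is cut off in the log, so the sketch below only assumes it divides each diagonal entry by its absolute value, which yields NaN whenever a diagonal entry is zero, for example for a direction matrix that swaps two axes. `normalise_diagonal_signs` and `assert_finite_affine` are hypothetical helpers for illustration, not moosez functions.

```python
import numpy as np


def normalise_diagonal_signs(affine: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the truncated line 522: divide each diagonal
    entry of the 3x3 rotation/zoom block by its absolute value."""
    new_affine = affine.astype(float).copy()
    for diagonal in range(3):
        value = new_affine[diagonal, diagonal]
        # 0.0 / 0.0 -> nan; silence numpy's warning so the result can be inspected.
        with np.errstate(invalid="ignore", divide="ignore"):
            new_affine[diagonal, diagonal] = value / abs(value)
    return new_affine


def assert_finite_affine(affine: np.ndarray, name: str = "image") -> None:
    """Fail early with a readable message instead of 'SVD did not converge'."""
    if not np.all(np.isfinite(affine)):
        raise ValueError(f"{name}: affine contains NaN/inf; check the input direction matrix")


# Affine whose 3x3 block swaps the x and y axes, so both of those diagonal entries are 0.
swapped = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
normalised = normalise_diagonal_signs(swapped)
print(np.isnan(normalised[:3, :3]).any())  # True

try:
    assert_finite_affine(normalised, "sub-verse036")
except ValueError as err:
    print(err)
```

A guard like `assert_finite_affine`, placed before the `nibabel.Nifti1Image(...)` call, would not fix anything by itself, but it would point at the input geometry rather than at numpy's linear algebra.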

The first error is where it gets interesting. It does not occur during prediction but in the postprocessing step, as you can tell both from the fact that it is raised only after some time and from its output:

[2/114] Running prediction for sub-verse007 using clin_ct_vertebrae...Traceback (most recent call last):
  File "/home/daniele/pattern-recognition/.venv/bin/moosez", line 8, in <module>
    sys.exit(main())
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/moosez.py", line 198, in main
    predict.predict(model_name, input_dir, output_dir, accelerator)
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/predict.py", line 79, in predict
    postprocess(original_image_files[0], output_dir, model_name)
  File "/home/daniele/pattern-recognition/.venv/lib/python3.10/site-packages/moosez/predict.py", line 132, in postprocess
    predicted_image = file_utilities.get_files(output_dir, '.nii.gz')[0]
IndexError: list index out of range

Since a single moose run is interrupted in all three cases, I couldn't rely on any kind of error handling, so I manually re-ran moose many times to separate the "good" subjects from the "bad" ones. I found that moose works for only 62 of the 160 subjects. I then used SimpleITK to look for features that distinguish the good images from the bad ones. What I found was that:

the direction (image.GetDirection()) is (1, 0, 0, 0, -1, 0, 0, 0, 1) for all the good images, while the bad images have various other directions;
for almost all the good images the spacing is (1, 1, 1), and it is always between 0.9 and 1. For the bad images, instead, the spacing is (1, 1, >1), where the third component is approximately 2 (almost always) or exactly 3 in some cases. In both groups there are exceptions: images that have the "right" direction and/or spacing but on which moose still fails.
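
The comparison above can be scripted instead of done by hand. A minimal sketch: `geometry_issues` flags the two properties reported here (a direction different from (1, 0, 0, 0, -1, 0, 0, 0, 1), and a spacing coarser than 1 mm on any axis), and the optional `read_geometry` reads them with SimpleITK, which it imports lazily. Both helpers are hypothetical, and the thresholds are heuristics taken from this report, not documented moosez requirements.

```python
from typing import List, Sequence, Tuple

# Direction shared by all images that worked in this report (row-major 3x3).
REFERENCE_DIRECTION = (1.0, 0.0, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 1.0)


def geometry_issues(direction: Sequence[float], spacing: Sequence[float],
                    max_spacing: float = 1.0, tol: float = 1e-3) -> List[str]:
    """List the ways an image deviates from the geometry of the 'good' images."""
    issues = []
    if len(direction) != 9 or any(
            abs(a - b) > tol for a, b in zip(direction, REFERENCE_DIRECTION)):
        issues.append(f"direction {tuple(round(d, 3) for d in direction)}")
    if any(s > max_spacing + tol for s in spacing):
        issues.append(f"spacing {tuple(spacing)} is coarser than {max_spacing} mm on some axis")
    return issues


def read_geometry(path: str) -> Tuple[tuple, tuple]:
    """Read direction and spacing with SimpleITK (imported only when needed)."""
    import SimpleITK as sitk
    image = sitk.ReadImage(path)
    return image.GetDirection(), image.GetSpacing()


print(geometry_issues((1, 0, 0, 0, -1, 0, 0, 0, 1), (1.0, 1.0, 1.0)))  # []
print(geometry_issues((1, 0, 0, 0, 1, 0, 0, 0, 1), (1.0, 1.0, 2.0)))
```

Looping `geometry_issues(*read_geometry(path))` over the dataset gives a per-file list of suspects; as noted above, a clean result does not guarantee that moose will succeed.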

I tried setting the direction to (1, 0, 0, 0, -1, 0, 0, 0, 1) using SimpleITK, and moose worked! At least for most of the images that previously failed: for some of them the errors still arise (I've seen all three).
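
Note that `SetDirection` only rewrites the header: the voxel array is untouched, so the image's physical orientation changes (a flipped axis becomes mirrored, a swapped axis is relabeled), which may matter when the segmentation is interpreted later. A physically consistent alternative, sketched here with NumPy under the assumption that the direction is axis-aligned (a signed permutation matrix), permutes and flips the voxel array so the direction becomes (1, 0, 0, 0, -1, 0, 0, 0, 1) while every voxel keeps its position in physical space. `reorient_axis_aligned` is a hypothetical helper; it works on an array in x, y, z index order.

```python
import numpy as np

# Target direction from this report: diag(1, -1, 1).
TARGET_DIRECTION = np.diag([1.0, -1.0, 1.0])


def reorient_axis_aligned(array_xyz, spacing, origin, direction):
    """Permute/flip an axis-aligned image so its direction equals TARGET_DIRECTION.

    array_xyz: voxel array indexed [i, j, k] (x, y, z order).
    spacing, origin: length-3 sequences; direction: 9 numbers, row-major.
    Returns (new_array, new_spacing, new_origin, new_direction).
    """
    D = np.asarray(direction, dtype=float).reshape(3, 3)
    spacing = np.asarray(spacing, dtype=float)
    origin = np.asarray(origin, dtype=float)
    # Column b of D is the physical direction of index axis b. For each
    # physical axis a, find the index axis b that runs along it.
    perm = [int(np.argmax(np.abs(D[a, :]))) for a in range(3)]
    if sorted(perm) != [0, 1, 2] or not np.allclose(np.abs(D).sum(axis=0), 1.0):
        raise ValueError("direction is not axis-aligned; resample instead")
    new_array = np.transpose(array_xyz, perm)
    start_index = np.zeros(3)
    for a, b in enumerate(perm):
        if np.sign(D[a, b]) != TARGET_DIRECTION[a, a]:
            new_array = np.flip(new_array, axis=a)
            # New index 0 along axis a is the last old voxel along axis b.
            start_index[b] = array_xyz.shape[b] - 1
    # Physical position of the voxel that becomes the new index (0, 0, 0).
    new_origin = origin + D @ (spacing * start_index)
    new_spacing = spacing[perm]
    return new_array, tuple(new_spacing), tuple(new_origin), tuple(TARGET_DIRECTION.ravel())
```

With SimpleITK, the input would be `sitk.GetArrayFromImage(img).transpose(2, 1, 0)` (that function returns z, y, x order), and the result can be turned back into an image with `sitk.GetImageFromArray(np.ascontiguousarray(new_array.transpose(2, 1, 0)))` followed by `SetSpacing`, `SetOrigin` and `SetDirection`. Oblique directions are rejected on purpose, since they need real resampling.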

I'm still running the predictions (again, manually sorting the "new good" and "new bad" subjects), so I don't yet know exactly how many subjects are now good or bad, but at least it worked.
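
Since one failing subject aborts the whole `moosez -d` run, the manual sorting can be automated by launching one subprocess per subject and recording how it ended. A minimal sketch, assuming you have already prepared one input directory per subject in whatever layout moosez expects (for example a folder that contains just that subject); `run_each` and `moosez_command` are hypothetical helpers, and the command line is the one used in this report.

```python
import subprocess
from typing import Callable, Dict, Iterable, List


def moosez_command(input_dir: str) -> List[str]:
    # Command line taken from the report above.
    return ["moosez", "-d", input_dir, "-m", "clin_ct_vertebrae"]


def run_each(input_dirs: Iterable[str],
             build_command: Callable[[str], List[str]] = moosez_command) -> Dict[str, str]:
    """Run the command once per input directory and classify the outcome."""
    outcomes = {}
    for input_dir in input_dirs:
        code = subprocess.run(build_command(input_dir)).returncode
        if code == 0:
            outcomes[input_dir] = "ok"
        elif code < 0:
            # POSIX: -N means the process was terminated by signal N (e.g. -9 for SIGKILL).
            outcomes[input_dir] = f"killed by signal {-code}"
        else:
            outcomes[input_dir] = f"failed with exit code {code}"
    return outcomes
```

A `Killed` run would show up as `killed by signal 9`, while the IndexError and the SVD error both end as a non-zero exit code, so the three failure kinds can be told apart without reading the logs.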

To Reproduce

Steps to reproduce the behavior:

Go to OSF | VerSe 2019 and download the database
Rename all the files to add the prefix CT_
Create a conda environment with Python 3.10: conda create -n moosez python=3.10
Install moosez with pip: pip install moosez
Run moosez -d <folder_path> -m clin_ct_vertebrae and wait for it to fail
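
The renaming step can be scripted. A minimal sketch that prefixes every `.nii.gz` file below the download folder with `CT_`, assuming (as the step says) that all files should get the prefix and skipping files that already have it; `add_ct_prefix` is a hypothetical helper, and if the folder also holds segmentation masks you may want to filter those out first.

```python
import os
from typing import List


def add_ct_prefix(root: str, prefix: str = "CT_", suffix: str = ".nii.gz") -> List[str]:
    """Prefix every NIfTI file below root (skipping already-prefixed files)."""
    renamed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix) and not name.startswith(prefix):
                src = os.path.join(dirpath, name)
                dst = os.path.join(dirpath, prefix + name)
                os.rename(src, dst)
                renamed.append(dst)
    return sorted(renamed)
```

Because already-prefixed files are skipped, running it twice is harmless.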

Conclusion

I want to point out that I don't really need you to solve this issue: I used moose for a small university exam and I don't think I will use it again (only because I am finishing my degree, not because anything is wrong with the software). I wrote this issue because I think it will be useful for you to fix a bug, and for others who encounter the same problem. For the same reason, please forgive me if I don't reply quickly to your answer (which I'm sure will come soon). As far as I'm concerned, you could close this issue straight away.

LalithShiyam commented 1 month ago

Hi @puccj, many thanks for the elaborate bug report and also for finding your way around the bug - very helpful :)!

We will look into the issue. I have a feeling about why it failed; it might be mainly because of the orientation. But let me make sure.

@Keyn34 @mprires: would you be able to take a crack at it? I am gone till next week.

LalithShiyam commented 1 month ago

@mprires @Keyn34 did you have a crack at this?

mordilos commented 1 month ago

Hello, I am having the same issue (list index out of range) with both the CLI tool and the Python API. After some debugging I found that the problem is in the postprocessing function, as @puccj mentioned. In postprocess, the line that collects the output files, predicted_image = file_utilities.get_files(output_dir, '.nii.gz')[0], raises the list index out of range error because nnUNet did not produce any output in the expected directory. I suspect nnUNet produces nothing because the environment variables are missing, but I might be totally wrong; after reading @puccj's issue I have second thoughts about my reading of the problem. Any input on this would be appreciated!
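
A defensive version of that lookup would replace the bare IndexError with a message naming the actual problem. A sketch using only the standard library; `find_prediction` is a hypothetical stand-in for the `file_utilities.get_files(...)[0]` call, not moosez code.

```python
import glob
import os


def find_prediction(output_dir: str, suffix: str = ".nii.gz") -> str:
    """Return the first prediction file in output_dir, or fail with a clear message."""
    candidates = sorted(glob.glob(os.path.join(output_dir, f"*{suffix}")))
    if not candidates:
        raise FileNotFoundError(
            f"No '{suffix}' file in {output_dir!r}: nnUNet produced no output. "
            "Check the nnUNet_* environment variables and the input image geometry."
        )
    return candidates[0]
```

Sorting the matches also makes the choice of "first" file deterministic, which a plain directory listing does not guarantee.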

Keyn34 commented 1 month ago

Hey @puccj and @mordilos,

It is correct that nnUNet is not producing any results, and I am currently looking into that. I have not been able to reproduce the error myself so far, but I am trying to pin down the differences between our setups.

As of now, it is likely that the IOFactory of SimpleITK can't handle the direction of the images correctly, as @puccj pointed out.

I will let you know ASAP!

LalithShiyam commented 1 month ago

@mordilos in your case, I think it might be the env variables. Would you be kind enough to let me know about the environment and relevant OS/hardware details so that we can help you better?

Lalith

mordilos commented 1 month ago

@LalithShiyam yes, of course. I am using a JupyterHub instance with DockerSpawner, so the OS is Linux, without CUDA. I will try the nnUNet instructions for setting up the environment variables from here and let you know.

LalithShiyam commented 1 month ago

@mordilos it's definitely the environment variables. Make sure you source the file and check that you can see the environment variables before running moose. Keep me posted :)
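
A quick pre-flight check along those lines, sketched in Python: it reports which of the three nnUNet v2 variables (`nnUNet_raw`, `nnUNet_preprocessed`, `nnUNet_results`, the names echoed in this thread) are unset or point to a missing directory. `missing_nnunet_paths` is a hypothetical helper; it only sees the environment of the current process, so run it in the same shell or kernel that launches moosez (variables exported in a separate terminal are not visible to a kernel started elsewhere).

```python
import os
from typing import Dict, Mapping, Optional

NNUNET_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def missing_nnunet_paths(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return {variable: problem} for every nnUNet variable that looks wrong."""
    env = os.environ if env is None else env
    problems = {}
    for var in NNUNET_VARS:
        value = env.get(var)
        if not value:
            problems[var] = "not set"
        elif not os.path.isdir(value):
            problems[var] = f"directory does not exist: {value}"
    return problems


print(missing_nnunet_paths())
```

An empty dictionary only means the variables exist and point to directories; it does not check that the model files inside them are complete.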

mordilos commented 1 month ago

@LalithShiyam ok so I think it's not just the env variables.

▶ echo $nnUNet_raw
/opt/conda/envs/moose-env/models/nnunet_trained_models/

▶ echo $nnUNet_preprocessed
/opt/conda/envs/moose-env/models/nnunet_trained_models/

▶ echo $nnUNet_results
/opt/conda/envs/moose-env/models/nnunet_trained_models/

Inside nnunet_trained_models there is Dataset123_Organs/nnUNetTrainer_2000epochs_NoMirroring__nnUNetPlans__3d_fullres/, with the files that are downloaded automatically.

but still, nnunet won't produce the outputs.

LalithShiyam commented 1 month ago

@mordilos strange. We don't have the same setup as you, so it is hard for us to test. Have you tried it without the Docker/JupyterHub setup? Just asking to figure out whether you get the same error.

mordilos commented 1 month ago

@LalithShiyam I know, I will try to reproduce it on a native linux machine. I will keep you posted.

LalithShiyam commented 1 month ago

Sorry about not being helpful - keep me posted and we will figure this out @mordilos :)

ENHANCE-PET / MOOSE

BUG: list index out of range if image.direction is not (1,0,0,0,-1,0,0,0,1) #135