MIC-DKFZ / nnUNet

Apache License 2.0

Question on path setting and data preparation #1

Closed JunMa11 closed 5 years ago

JunMa11 commented 5 years ago

Dear DKFZ,

Thanks for the great repo. I have some questions about the path settings.

Environment: Linux, PyTorch 1.0. The installation works well.

Data preparation: I downloaded the Task04_Hippocampus dataset from the Medical Segmentation Decathlon and put it into path/nnUNet/nnunet.

Step 1. Set base = path/nnUNet/nnunet/Task04_Hippocampus.

Step 2. Run python experiment_planning/plan_and_preprocess_task.py -t Task04_Hippocampus; the following error occurred:

Traceback (most recent call last):
  File "experiment_planning/plan_and_preprocess_task.py", line 18, in <module>
    from nnunet.paths import splitted_4d_output_dir, cropped_output_dir, preprocessing_output_dir, raw_dataset_dir
  File "/path/nnUNet/nnunet/paths.py", line 51, in <module>
    network_training_output_dir = os.path.join(os.environ['RESULTS_FOLDER'], my_output_identifier)
  File "/home/jma/anaconda3/envs/torch10/lib/python3.6/os.py", line 669, in __getitem__
    raise KeyError(key) from None
KeyError: 'RESULTS_FOLDER'
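A minimal way around this KeyError is to define the environment variable before nnunet.paths is imported - a sketch, assuming the variable name from the traceback above (the exact set of variables nnU-Net expects, and where you point them, may differ by version; the path here is a placeholder):

```python
import os

# Hypothetical workaround: define RESULTS_FOLDER before importing
# nnunet.paths. The variable name comes from the traceback above;
# the path itself is a placeholder, not a recommended location.
os.environ["RESULTS_FOLDER"] = "/path/nnUNet/results"

# nnunet.paths would now read the variable instead of raising KeyError:
results = os.environ["RESULTS_FOLDER"]
```

Alternatively, the variable can be exported in the shell before launching the script.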

At the same time, two folders (nnUNet_raw and nnUNet_raw_splitted) were generated in path/nnUNet/nnunet/Task04_Hippocampus. I modified network_training_output_dir as

network_training_output_dir = os.path.join(base, my_output_identifier)

Step 3. I then put the Task04_Hippocampus dataset into path/nnUNet/nnunet/Task04_Hippocampus/nnUNet_raw/ and path/nnUNet/nnunet/Task04_Hippocampus/nnUNet_raw_splitted/, but a new error occurred:

Traceback (most recent call last):
  File "experiment_planning/plan_and_preprocess_task.py", line 253, in <module>
    crop(task, override=override, num_threads=processes)
  File "experiment_planning/plan_and_preprocess_task.py", line 131, in crop
    imgcrop.run_cropping(lists, overwrite_existing=override)
  File "path/nnUNet/nnunet/preprocessing/cropping.py", line 203, in run_cropping
    p.map(self._load_crop_save_star, list_of_args)
  File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/jma/anaconda3/envs/torch10/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
RuntimeError: Exception thrown in SimpleITK ReadImage: /tmp/SimpleITK/Code/IO/src/sitkImageReaderBase.cxx:99:
sitk::ERROR: The file "path/nnUNet/nnunet/Task04_Hippocampus/nnUNet_raw_splitted/Task04_Hippocampus/imagesTr/hippocampus_367_0000.nii.gz" does not exist.

Question:

I also read the introduction in challenge_dataset_conversion. It describes well how to convert a personal dataset to make it compatible with nnU-Net, especially for multi-modality data. Since nnU-Net was initially developed for the MSD challenge, it would be good to provide an example for an MSD dataset, too. I recommend Task04_Hippocampus, because this dataset is very small.

FabianIsensee commented 5 years ago

Hi Jun Ma,

you are right - the documentation of how to run the Decathlon datasets is incomplete. I will fix this some time later today. Right now, the Decathlon preprocessing pipeline requires FSL for the splitting of modalities. You can, however, do the splitting manually to skip that step - then you don't need FSL.

base is the base folder for the raw data. In base, nnU-Net will create three subdirectories:

- nnUNet_raw: put the downloaded Decathlon datasets here.
- nnUNet_raw_splitted: nnU-Net saves the splitted data here. If you do the splitting yourself, put the splitted data in here.
- nnUNet_raw_cropped: don't touch this.

You also have to set preprocessing_output_dir, otherwise nnU-Net will not know where to save the preprocessed data. network_training_output_dir needs to be set as well.
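The layout described above can be sketched with a small helper (the folder names come from the reply; the function name and base path are made up for illustration):

```python
import os

def make_nnunet_raw_layout(base):
    """Create the three raw-data subdirectories described above under `base`.

    This is a hypothetical helper for illustration, not part of nnU-Net;
    nnU-Net itself creates these folders on first run.
    """
    for sub in ("nnUNet_raw", "nnUNet_raw_splitted", "nnUNet_raw_cropped"):
        os.makedirs(os.path.join(base, sub), exist_ok=True)

# Usage: make_nnunet_raw_layout("/path/nnUNet_base")
```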

Task04_Hippocampus is a 3D dataset, why does split_4d run?

Splitting this won't do anything, so why not? In the Decathlon I didn't know what I would be getting in phase II, so I just ran this for all datasets.

Do we need to convert the data patientID.nii.gz in Task04_Hippocampus to patientID_0000.nii.gz?

If you let nnU-Net do everything (including splitting) then you don't have to do that. It will do it for you. If you do the splitting manually, then you need to set the names with the _0000.nii.gz suffix.
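If you split manually, the renaming can be done with a short script - a sketch for a single-modality dataset like Task04_Hippocampus (the helper name is made up; for multi-modal data each modality would need its own index):

```python
import os
import re

def add_modality_suffix(folder, modality=0):
    """Rename patientID.nii.gz to patientID_0000.nii.gz (hypothetical helper).

    Only needed when you do the modality splitting yourself; otherwise
    nnU-Net renames the files for you.
    """
    for name in os.listdir(folder):
        if not name.endswith(".nii.gz"):
            continue
        stem = name[: -len(".nii.gz")]
        if re.search(r"_\d{4}$", stem):  # already has a modality suffix
            continue
        os.rename(os.path.join(folder, name),
                  os.path.join(folder, "%s_%04d.nii.gz" % (stem, modality)))

# Usage: add_modality_suffix("nnUNet_raw_splitted/Task04_Hippocampus/imagesTr")
```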

I hope this helps! Best, Fabian

JunMa11 commented 5 years ago

Hi Fabian,

Thanks for your help. Please give me a few days to run the experiments again. If it works well, I will close this issue.

Best, Jun