MIC-DKFZ / nnUNet

Apache License 2.0

FileNotFoundError when verifying dataset integrity #2144

Closed: yw7 closed this issue 6 months ago

yw7 commented 6 months ago

I encountered a FileNotFoundError when running nnUNetv2_plan_and_preprocess with --verify_dataset_integrity, in: https://github.com/MIC-DKFZ/nnUNet/blob/5db96042779fe720dc6cef7ba4b32d2f9d127d31/nnunetv2/experiment_planning/verify_dataset_integrity.py#L206C17-L206C66

It seems that the code there joins the dataset folder path with 'labelsTr' and each filename from labelfiles. This produces incorrect file paths and raises the FileNotFoundError. The issue is resolved by passing labelfiles directly instead of joining the paths:

zip(labelfiles, [reader_writer_class] * len(labelfiles), [expected_labels] * len(labelfiles))

anw1998 commented 6 months ago

I have a similar issue:

RuntimeError: Exception thrown in SimpleITK ImageFileReader_Execute: D:\a\1\sitk\Code\IO\src\sitkImageReaderBase.cxx:97:
sitk::ERROR: The file "nnUNet_raw\Dataset005_BH163test\labelsTr\nnUNet_raw\Dataset005_BH163test\labelsTr\XXXX_012.nii.gz" does not exist.

The prefix nnUNet_raw\Dataset005_BH163test\labelsTr appears twice in the path.
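The duplication in this error message can be reproduced without nnU-Net. A minimal sketch, assuming (as the message suggests) that nnUNet_raw was set to a relative path, so the stored label path is relative too. It uses the standard-library `ntpath` module, which is what `os.path` is on Windows, so it runs on any OS:

```python
import ntpath

# Folder and label path taken from the error message above. Both are
# relative, presumably because nnUNet_raw was set to a relative path.
folder = r"nnUNet_raw\Dataset005_BH163test"
label_file = r"nnUNet_raw\Dataset005_BH163test\labelsTr\XXXX_012.nii.gz"

# label_file already contains folder\labelsTr. Because it is relative,
# join() appends it instead of replacing the earlier components.
bad_path = ntpath.join(folder, "labelsTr", label_file)
print(bad_path)
# nnUNet_raw\Dataset005_BH163test\labelsTr\nnUNet_raw\Dataset005_BH163test\labelsTr\XXXX_012.nii.gz
```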

dojoh commented 6 months ago

Could you share the commands you used? Did you set up your environment variables correctly? See https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/documentation/setting_up_paths.md
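A quick way to answer the environment-variable question is to check that nnUNet_raw, nnUNet_preprocessed and nnUNet_results are set. The sketch below also flags relative values, since the duplicated path in the report above starts with a relative nnUNet_raw\ prefix. `check_nnunet_paths` is a hypothetical helper for illustration, not part of nnU-Net, and treating relative paths as a warning is an assumption, not a documented requirement:

```python
import os
from typing import List, Mapping, Optional

# The three path variables described in nnU-Net's setting_up_paths.md.
NNUNET_PATH_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def check_nnunet_paths(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: list problems with the nnU-Net path variables."""
    env = os.environ if env is None else env
    problems = []
    for name in NNUNET_PATH_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isabs(value):
            # A relative value is resolved against the current working directory.
            problems.append(f"{name} is a relative path: {value!r}")
    return problems


if __name__ == "__main__":
    for problem in check_nnunet_paths():
        print("WARNING:", problem)
```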

yw7 commented 6 months ago

I used this command, with the environment variables set as described:

nnUNetv2_plan_and_preprocess -d 101 -c 3d_fullres --verify_dataset_integrity

The problem comes from how the file paths are constructed. dataset[k]['label'] already contains the full path, including raw_dataset_folder:

https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/utilities/utils.py#L58

These values are then included in the labelfiles list.

https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/experiment_planning/verify_dataset_integrity.py#L186
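To make the data flow through the two linked lines concrete, here is a simplified, hypothetical reconstruction. The helper name `build_dataset_dict`, the default file ending, the dataset name and the dictionary layout are assumptions for illustration, not the actual nnU-Net code; the point is only that every 'label' entry is already a full path:

```python
import os
from typing import Dict, List


def build_dataset_dict(raw_dataset_folder: str, identifiers: List[str],
                       file_ending: str = ".nii.gz") -> Dict[str, dict]:
    """Hypothetical stand-in for the dataset dictionary built in utils.py."""
    return {
        k: {"label": os.path.join(raw_dataset_folder, "labelsTr", k + file_ending)}
        for k in identifiers
    }


raw_dataset_folder = os.path.join("nnUNet_raw", "Dataset101_Example")  # hypothetical
dataset = build_dataset_dict(raw_dataset_folder, ["case_000", "case_001"])

# Mirrors the labelfiles list: each entry already starts with
# raw_dataset_folder/labelsTr, so it must not be joined with them again.
labelfiles = [v["label"] for v in dataset.values()]
print(labelfiles)
```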

Subsequently, the code joins the folder path, the string 'labelsTr', and each entry of labelfiles:

https://github.com/MIC-DKFZ/nnUNet/blob/2d2e8ce2c0261dc88b53866dcc4c71e6972432ed/nnunetv2/experiment_planning/verify_dataset_integrity.py#L206

Because each entry is already a full path, this produces incorrect, duplicated file paths and causes the FileNotFoundError.
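The double join described above can be reproduced on any OS with real files, as long as the dataset folder is relative. A minimal sketch: the folder and file names are hypothetical, the label file is an empty placeholder, and `buggy` follows the join pattern described in this thread rather than copying nnU-Net's exact code:

```python
import os
import tempfile
from os.path import join

with tempfile.TemporaryDirectory() as tmp:
    old_cwd = os.getcwd()
    os.chdir(tmp)  # work in a scratch directory so relative paths resolve there
    try:
        folder = join("nnUNet_raw", "Dataset101_Example")  # hypothetical, relative
        os.makedirs(join(folder, "labelsTr"))
        # Empty placeholder standing in for a real label file.
        open(join(folder, "labelsTr", "case_000.nii.gz"), "wb").close()

        # As in the dataset dict: entries are already full (relative) paths.
        labelfiles = [join(folder, "labelsTr", "case_000.nii.gz")]

        buggy = [join(folder, "labelsTr", i) for i in labelfiles]  # joined twice
        fixed = labelfiles                                          # used as-is

        buggy_exists = [os.path.isfile(p) for p in buggy]
        fixed_exists = [os.path.isfile(p) for p in fixed]
    finally:
        os.chdir(old_cwd)  # restore before the temporary directory is removed

print(buggy_exists, fixed_exists)  # [False] [True]
```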

The issue can be resolved by passing the labelfiles list directly instead of joining the paths:

zip(labelfiles, [reader_writer_class] * len(labelfiles), [expected_labels] * len(labelfiles))
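The zip(...) expression builds one argument tuple per label file, which suggests it feeds a starmap-style call (multiprocessing.Pool.starmap consumes tuples the same way). A minimal sketch using `itertools.starmap` so it runs in a single process; `verify_labels_stub` is a hypothetical stand-in that only checks file existence, and the file names, reader/writer class and expected labels are placeholders:

```python
from itertools import starmap
from os.path import isfile


def verify_labels_stub(label_file, reader_writer_class, expected_labels):
    """Hypothetical stand-in for the real label check: only tests existence."""
    return isfile(label_file)


labelfiles = ["missing_case_000.nii.gz", "missing_case_001.nii.gz"]  # placeholders
reader_writer_class = None  # placeholder for the real reader/writer class
expected_labels = [0, 1]    # placeholder label values

# The fixed argument list: labelfiles are passed through unchanged.
args = list(zip(labelfiles, [reader_writer_class] * len(labelfiles),
                [expected_labels] * len(labelfiles)))
results = list(starmap(verify_labels_stub, args))
print(args[0])
print(results)
```

Swapping `itertools.starmap` for a process pool's `starmap` would not change the argument tuples, only where the calls run.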

After applying this change, the code ran without errors and the dataset integrity verification completed successfully.

FabianIsensee commented 6 months ago

This slipped past us because, at least on Linux, the duplicated path has no effect: the label paths are absolute, and join discards every component before an absolute one, so we never noticed. See this example:

In [9]: folder
Out[9]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus'

In [10]: i
Out[10]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz'

In [11]: join(folder, 'labelsTr', i)
Out[11]: '/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz'
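The session above follows the documented behavior of os.path.join: if a component is an absolute path, all previous components are thrown away. A minimal sketch reproducing it with `posixpath` (the Linux implementation of `os.path`), using the paths from the session:

```python
import posixpath

folder = "/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus"
i = "/media/isensee/raw_data/nnUNet_raw/Dataset004_Hippocampus/labelsTr/hippocampus_001.nii.gz"

# i is absolute, so join() discards folder and 'labelsTr' and returns i unchanged.
result = posixpath.join(folder, "labelsTr", i)
print(result == i)  # True
```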

Thanks for bringing this to our attention! I fixed the problem :-)