Project-MONAI / research-contributions

Implementations of recent research prototypes/demonstrations using MONAI.
https://monai.io/
Apache License 2.0

gzip.BadGzipFile: CRC check failed 0x8a39933f != 0x65ede34c #180

Open 15858805253yzl opened 1 year ago

15858805253yzl commented 1 year ago

Traceback (most recent call last):
  File "main.py", line 236, in <module>
    main()
  File "main.py", line 104, in main
    main_worker(gpu=0, args=args)
  File "main.py", line 211, in main_worker
    accuracy = run_training(
  File "/home/hf524/lz/SwinUNETR.../BRATS21/trainer.py", line 157, in run_training
    train_loss = train_epoch(
  File "/home/hf524/lz/SwinUNETR.../BRATS21/trainer.py", line 32, in train_epoch
    for idx, batch_data in enumerate(loader):
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 1.

Original Traceback (most recent call last):
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/transform.py", line 91, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/transform.py", line 55, in _apply_transform
    return transform(parameters)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/io/dictionary.py", line 154, in __call__
    data = self._loader(d[key], reader)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/io/array.py", line 266, in __call__
    img_array, meta_data = reader.get_data(img)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/data/image_reader.py", line 942, in get_data
    data = self._get_array_data(i)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/data/image_reader.py", line 1016, in _get_array_data
    _array = np.array(img.get_fdata(dtype=self.dtype))
  File "/home/hf524/.local/lib/python3.8/site-packages/nibabel/dataobj_images.py", line 355, in get_fdata
    data = np.asanyarray(self._dataobj, dtype=dtype)
  File "/home/hf524/.local/lib/python3.8/site-packages/nibabel/arrayproxy.py", line 391, in __array__
    arr = self._get_scaled(dtype=dtype, slicer=())
  File "/home/hf524/.local/lib/python3.8/site-packages/nibabel/arrayproxy.py", line 358, in _get_scaled
    scaled = apply_read_scaling(self._get_unscaled(slicer=slicer), scl_slope, scl_inter)
  File "/home/hf524/.local/lib/python3.8/site-packages/nibabel/arrayproxy.py", line 332, in _get_unscaled
    return array_from_file(self._shape,
  File "/home/hf524/.local/lib/python3.8/site-packages/nibabel/volumeutils.py", line 523, in array_from_file
    n_read = infile.readinto(data_bytes)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/gzip.py", line 470, in read
    self._read_eof()
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/gzip.py", line 516, in _read_eof
    raise BadGzipFile("CRC check failed %s != %s" % (hex(crc32),
gzip.BadGzipFile: CRC check failed 0x8a39933f != 0x65ede34c

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/transform.py", line 91, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/transform.py", line 55, in _apply_transform
    return transform(parameters)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/compose.py", line 173, in __call__
    input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/transform.py", line 118, in apply_transform
    raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.io.dictionary.LoadImaged object at 0x7fc05368a0d0>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/data/dataset.py", line 105, in __getitem__
    return self._transform(index)
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/data/dataset.py", line 91, in _transform
    return apply_transform(self.transform, data_i) if self.transform is not None else data_i
  File "/home/hf524/anaconda3/envs/yzl/lib/python3.8/site-packages/monai/transforms/transform.py", line 118, in apply_transform
    raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x7fc05368a490>

tangy5 commented 1 year ago

This error means some data are corrupted; you could check whether each gzipped NIfTI file can be loaded into a numpy array.
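One minimal way to run that check, using only the standard library (a sketch; the helper name and glob pattern are my own), is to force a full decompression of every file, which triggers the same trailing-CRC32 verification that raises BadGzipFile in the traceback above:

```python
import glob
import gzip
import zlib


def find_bad_gzip(paths):
    """Return (path, exception) pairs for files whose gzip stream is corrupt."""
    bad = []
    for path in paths:
        try:
            with gzip.open(path, "rb") as f:
                # Reading to EOF forces gzip to verify the trailing CRC32,
                # the same check that fails in the traceback above.
                while f.read(1 << 20):
                    pass
        except (OSError, EOFError, zlib.error) as exc:
            bad.append((path, exc))
    return bad


if __name__ == "__main__":
    # The glob pattern is an assumption -- point it at your own dataset root.
    for path, exc in find_bad_gzip(sorted(glob.glob("dataset/**/*.nii.gz", recursive=True))):
        print(f"{path}: {exc}")
```

A file that passes this check can still have NIfTI-level problems, but any file that fails it will reproduce the BadGzipFile error when nibabel decompresses it.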

ShanshanSoong commented 8 months ago

> This error means some data are corrupted; you could check whether each gzipped NIfTI file can be loaded into a numpy array.

Hello, I encountered the same issue. I have checked the data and there is no problem; it can be loaded and processed. What's more, it works normally with nnUNet and other frameworks, and using the UNETR network under the MONAI framework also functions without issues. However, when I switch the network to SwinUNETR, the above error occurs. Could you please give me some advice?

ericspod commented 7 months ago

This does appear to be an issue with the data; the exception (if yours is the same) originates deep inside the Nibabel library. If you are encountering an issue from the same place in Nibabel, I can only suggest that you check your data again to be sure it isn't corrupted, that is, find which image the error occurs on and try to load it manually with Nibabel.
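That search can be scripted as a loop over the training file list. The sketch below is my own (the helper name is hypothetical, and the nibabel loader in the comment is the suggested idea rather than verified code); it applies any loader to each file and reports the ones that raise:

```python
def find_failing_files(paths, load_fn):
    """Apply ``load_fn`` to every path; return (path, exception) for each failure."""
    failures = []
    for path in paths:
        try:
            load_fn(path)
        except Exception as exc:  # collect any loader error, including BadGzipFile
            failures.append((path, exc))
    return failures


# With nibabel installed, the manual load suggested above would look something like:
#   import nibabel as nib
#   import numpy as np
#   load_fn = lambda p: np.asanyarray(nib.load(p).dataobj)
```

Running this over the same file list the DataLoader uses pinpoints the exact image being read when the CRC check failed.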

EloySchultz commented 7 months ago

I have been having the same issue, but with a different model and a different private dataset. My dataset consists of .nii.gz files. When I load each individual file with nibabel I have no issues.

When I load all files with MONAI during training, it successfully loads all images for 181 epochs and then crashes with the CRC error. I have retried this experiment, and it consistently crashes at epoch 182.

What is strange to me is that the loading of the NIfTI files goes without issue the first 181 times, but the 182nd time it crashes. To me this smells like a memory leak, but I do not observe a significant increase in memory consumption during those 182 epochs.

I am truly puzzled by this CRC error. Any ideas would be appreciated, though I understand that for you this is extremely hard to debug (and for me it is too; those 181 epochs take 1.5 days to complete).

ericspod commented 7 months ago

This does sound like a fault somewhere else in your setup, that is, in Python, Nibabel, or PyTorch (when we convert loaded arrays to tensors) rather than in MONAI, or an actual memory or other hardware fault. Is this data being loaded with the Dataset class or one of the persistent/caching dataset classes? If it's just Dataset, then none of our data optimisation mechanisms are involved.

I can only suggest creating a test script which goes through your data creating batches as normal, but does no training or evaluation with them. This would be much faster, and if there is an issue with memory or something in MONAI it may help isolate it. If it consistently still produces the error, then eliminating other variables, such as transforms one by one, can narrow down the possible sources of error. I haven't seen this sort of error before, so I can only recommend this experimental approach.
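Such a dry run can be as small as the sketch below (the function name is mine; ``loader`` stands for whatever iterable of batches the real training uses, e.g. the same torch DataLoader with the same Dataset, transforms, and num_workers):

```python
def dry_run(loader, max_epochs):
    """Iterate the data pipeline exactly as training would, with no model attached.

    Any loading error propagates unchanged; the per-epoch progress printed below
    shows how far the run got, which is the key information for a failure that
    only appears after many epochs.
    """
    n_batches = 0
    for epoch in range(max_epochs):
        for _batch in loader:
            n_batches += 1
        print(f"epoch {epoch} finished, {n_batches} batches so far")
    return n_batches
```

Pointing this at the training loader for a couple of hundred epochs should take hours rather than days, and if the BadGzipFile error still appears it rules out the model and the training loop entirely.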