fepegar / torchio

Medical imaging toolkit for deep learning
https://torchio.org
Apache License 2.0

Cannot download torchio.datasets.rsna_miccai.RSNAMICCAI dataset #897

Closed kiristern closed 2 years ago

kiristern commented 2 years ago


Problem summary

Unable to download the dataset.

Code for reproduction

Running the code exactly as given on Kaggle:


import torchio as tio
root_dir = '/kaggle/input/rsna-miccai-brain-tumor-radiogenomic-classification'
dataset = tio.datasets.RSNAMICCAI(root_dir)
len(dataset)

Nor does it work when following the TorchIO example:

import torchio as tio
from subprocess import call
call('kaggle competitions download -c rsna-miccai-brain-tumor-radiogenomic-classification'.split())
root_dir = 'rsna-miccai-brain-tumor-radiogenomic-classification'
train_set = tio.datasets.RSNAMICCAI(root_dir, train=True)
test_set = tio.datasets.RSNAMICCAI(root_dir, train=False)
len(train_set), len(test_set)

Actual outcome

Nothing

Error messages

/usr/local/lib/python3.7/dist-packages/torchio/datasets/rsna_miccai.py:86: UserWarning: Labels CSV not found. Ignoring MGMT labels
  warnings.warn('Labels CSV not found. Ignoring MGMT labels')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-21-9d2ddfa68866> in <module>()
----> 1 tio.datasets.RSNAMICCAI(root_dir)

2 frames
/usr/lib/python3.7/pathlib.py in iterdir(self)
   1105         if self._closed:
   1106             self._raise_closed()
-> 1107         for name in self._accessor.listdir(self):
   1108             if name in {'.', '..'}:
   1109                 # Yielding a path object for these makes little sense

FileNotFoundError: [Errno 2] No such file or directory: '/content/rsna-miccai-brain-tumor-radiogenomic-classification/train'

Expected outcome

The dataset is loaded.

System info


import re
import sys
import platform
import torchio
import torch
import numpy
import SimpleITK as sitk

sitk_version = re.findall('SimpleITK Version: (.*?)\n', str(sitk.Version()))[0]

print('Platform:  ', platform.platform())
print('TorchIO:   ', torchio.__version__)
print('PyTorch:   ', torch.__version__)
print('SimpleITK: ', sitk_version)
print('NumPy:     ', numpy.__version__)
print('Python:    ', sys.version)

fepegar commented 2 years ago

Hi, @kiristern.

> nor does it work when following TorchIO example:

As explained in that link, you need to download the dataset first:

> This is a helper class for the dataset used in the RSNA-MICCAI Brain Tumor Radiogenomic Classification challenge hosted on kaggle. The dataset must be downloaded before instantiating this class (as opposed to, e.g., torchio.datasets.IXI).
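
For anyone landing here with the same traceback, a minimal sketch of a pre-flight check is shown below. It only verifies that the expected entries exist locally before the class is instantiated. The helper names `missing_entries` and `load_rsna_miccai` are hypothetical, and the entry names `train`, `test` and `train_labels.csv` are assumptions based on the error message above and the Kaggle layout, not something confirmed in this thread.

```python
from pathlib import Path


def missing_entries(root_dir, expected=('train', 'train_labels.csv')):
    """Return the names in `expected` that do not exist under root_dir.

    The default names are assumptions: 'train' is the directory the
    traceback fails on, 'train_labels.csv' is the assumed labels file.
    """
    root = Path(root_dir)
    return [name for name in expected if not (root / name).exists()]


def load_rsna_miccai(root_dir, train=True):
    """Hypothetical wrapper: fail early with a clearer message."""
    split = 'train' if train else 'test'  # assumed directory names
    missing = missing_entries(root_dir, expected=(split,))
    if missing:
        raise FileNotFoundError(
            f'{root_dir} has no {missing} entry; '
            'download the dataset before instantiating RSNAMICCAI'
        )
    # Imported lazily so the check itself works without TorchIO installed.
    import torchio as tio
    return tio.datasets.RSNAMICCAI(root_dir, train=train)
```

Calling `missing_entries(root_dir)` first also reports a missing labels CSV, which TorchIO itself only warns about.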

kiristern commented 2 years ago

My bad, I glossed over that. I found a workaround, since I don't want to download the full dataset!

fepegar commented 2 years ago

Good! Can you please share the workaround?

kiristern commented 2 years ago

Well, in the end I just used the FPG dataset, haha. But that was after I investigated the following ways to download files from Kaggle:

  1. Download an individual file with: !kaggle datasets download -f BraTS2020_TrainingData/MICCAI_BraTS2020_TrainingData/BraTS20_Training_001/BraTS20_Training_001_t2.nii awsaf49/brats20-dataset-training-validation

  2. Download all competition files whose path matches a pattern (here, the T2w folders): !kaggle competitions files -c rsna-miccai-brain-tumor-radiogenomic-classification | grep T2w | awk '{print $1}' | while read -r x ; do kaggle competitions download -f "$x" rsna-miccai-brain-tumor-radiogenomic-classification -p train/00000/T2w ; done
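
The second command can also be sketched in Python, which may be easier to adapt inside a notebook. This is only a rough equivalent of the `grep | awk | while read` pipeline. The function names are hypothetical, and it assumes the `kaggle` CLI is installed and authenticated and that the file name is the first whitespace-separated column of the listing, as the `awk '{print $1}'` step implies.

```python
import subprocess


def select_files(listing, pattern):
    """Mimic `grep pattern | awk '{print $1}'` on the listing text."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if pattern in line and fields:
            names.append(fields[0])
    return names


def download_command(competition, name, dest):
    """Build the argv for one file, using the flags from the command above."""
    return ['kaggle', 'competitions', 'download',
            '-c', competition, '-f', name, '-p', dest]


def download_matching(competition, pattern, dest):
    """List the competition files, then download each one that matches."""
    listing = subprocess.run(
        ['kaggle', 'competitions', 'files', '-c', competition],
        check=True, capture_output=True, text=True,
    ).stdout
    names = select_files(listing, pattern)
    for name in names:
        subprocess.run(download_command(competition, name, dest), check=True)
    return names
```

For example, `download_matching('rsna-miccai-brain-tumor-radiogenomic-classification', 'T2w', 'train/00000/T2w')` would mirror the shell loop. As with the `awk` version, file names containing spaces would be truncated.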

fepegar commented 2 years ago

Looks like you got it sorted then! Thanks for sharing those commands.