MIT-LCP / physionet

A collection of tools for working with the PhysioNet repository.
http://physionet.org/
MIT License

Cannot download MIMIC-CXR data #96

Closed: xarion closed this issue 5 years ago

xarion commented 5 years ago

Hello, I just got access to the MIMIC-II, MIMIC-III, eICU Collaborative Research Database, and MIMIC-CXR datasets. When I try to download the contents of the MIMIC-CXR dataset, I receive a 403 Forbidden error. I can download the contents of the other datasets. Thanks!

alistairewj commented 5 years ago

I just tested it and I am able to download, so it sounds like you haven't been given permission to access MIMIC-CXR. You may have to sign the DUA again. If you can access the files (i.e. you can see the https://physionet.org/works/MIMICCXR/files/ page) and you still have issues downloading, reach out to us at mimic-support@physionet.org, as we may need to check that your e-mail address is credentialed.

See below, as the access pathway has changed.

xarion commented 5 years ago

I have tried it again and now it works. Thanks! Closing this.

nnajeh commented 3 years ago

@xarion, how did you download the dataset?

alistairewj commented 3 years ago

The most common issue is not specifically requesting access to the dataset via PhysioNet.

  1. Go to https://physionet.org/content/mimic-cxr/
  2. Sign the DUA and request access to the dataset.
  3. Once approved for access, download the dataset. We highly recommend downloading it via Google Cloud Platform (GCP). This requires you to add a GCP (Google) account to your PhysioNet account. There is a tutorial here: https://mimic-cxr.mit.edu/about/download/

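Before starting a download of hundreds of gigabytes, it can help to confirm that GCP access actually works. The sketch below lists the top level of the MIMIC-CXR-JPG bucket (the bucket name used later in this thread) and maps the two errors reported in this thread to the fixes suggested here. It assumes gsutil is installed and authenticated and that `PROJECT_ID` is your own billing project; `check_access` and `diagnose` are hypothetical helper names, and the hint texts are only a summary of this discussion, not official error handling.

```shell
#!/bin/sh
# Sketch: check GCS access to MIMIC-CXR-JPG before a large download.
# Assumptions: gsutil is installed and authenticated; PROJECT_ID is your own
# billing project. Helper names (check_access, diagnose) are hypothetical.
set -eu

BUCKET="gs://mimic-cxr-jpg-2.0.0.physionet.org"

# Map a gsutil error message to a likely fix (hints taken from this thread).
diagnose() {
  case "$1" in
    *"requester pays"*) echo "pass your own project with: gsutil -u PROJECT_ID ..." ;;
    *403*)              echo "no access: request access on PhysioNet and link your Google account" ;;
    *)                  echo "unrecognized error" ;;
  esac
}

check_access() {
  project_id=${1:?usage: check_access PROJECT_ID}
  # Listing the top level is cheap compared with copying the whole bucket.
  if out=$(gsutil -u "$project_id" ls "$BUCKET/" 2>&1); then
    echo "access OK"
  else
    diagnose "$out"
    return 1
  fi
}
```

For example, `check_access my-project-id` prints `access OK` on success, or a short hint followed by a non-zero exit status otherwise.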
nnajeh commented 3 years ago

@alistairewj I have access to the dataset, but I don't have a Google Cloud account, and downloading the dataset directly is very hard because of its size. Any suggestions?

alistairewj commented 3 years ago

Unfortunately we don't have the dataset in any other cloud platform. If you want to use the dataset in the cloud you'll have to create a Google account.

MehreenTabassum commented 1 year ago

@alistairewj I am getting an access denied error while trying to complete the training required for the MIMIC-CXR-JPG dataset. Any help is highly appreciated.

tompollard commented 1 year ago

@MehreenTabassum As mentioned in our email conversation, this is not something we can help with. It looks like firewall software on the CITI Program website is blocking your IP address, and the CITI Program website is independent of PhysioNet.

I would suggest either:

  1. reaching out using the support ticket link, or contacting Sucuri (https://sucuri.net/) in some other way, and/or
  2. contacting your network administrators to let them know about the issue, and/or
  3. trying to connect from a different IP address (e.g. from a different location) or using a VPN.

Asaad-Pak commented 7 months ago

Hi @alistairewj, I have access to the MIMIC-CXR dataset, but I cannot find any tutorial or article on how to download it. Could you recommend a video tutorial or blog post that can help me? I am new to this dataset and don't know much about it.

tompollard commented 7 months ago

@Asaad-Pak please review the posts at https://github.com/MIT-LCP/mimic-code/discussions for MIMIC support, and add a new post if you don't find an answer.

You should be able to download the MIMIC-CXR files from the files section of the project. e.g. for the JPG project, see: https://physionet.org/content/mimic-cxr-jpg/2.0.0/#files
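For a direct HTTPS download (no Google Cloud), the Files section of a PhysioNet project page typically shows a recursive wget command. The sketch below wraps that pattern; it assumes your PhysioNet username is credentialed for the project and that the slug and version match the project page (`mimic-cxr-jpg`, `2.0.0` here). `physionet_url` and `download_project` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: mirror a PhysioNet project over HTTPS with wget.
# Assumptions: credentialed PhysioNet account; slug/version taken from the
# project page. Helper names are hypothetical.
set -eu

# Build the files URL for a project slug and version.
physionet_url() {
  printf 'https://physionet.org/files/%s/%s/\n' "$1" "$2"
}

download_project() {
  slug=${1:?usage: download_project SLUG VERSION USERNAME}
  version=${2:?usage: download_project SLUG VERSION USERNAME}
  user=${3:?usage: download_project SLUG VERSION USERNAME}
  # -r recurse, -N skip files that are not newer than local copies,
  # -c resume partial downloads, -np never ascend to the parent directory.
  wget -r -N -c -np --user "$user" --ask-password \
    "$(physionet_url "$slug" "$version")"
}
```

Running `download_project mimic-cxr-jpg 2.0.0 your_username` prompts for your password and mirrors the files under `./physionet.org/files/mimic-cxr-jpg/2.0.0/`; because of `-c` and `-N`, rerunning it after an interruption resumes rather than restarting.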

Asaad-Pak commented 7 months ago

Hi @tompollard, thanks for the answer; I really appreciate it. I have one more problem: there are two versions of the dataset. One is MIMIC-CXR, in which the X-rays are in DICOM format and the reports are in TXT format; the other is MIMIC-CXR-JPG, which has only the X-rays in JPG format and no reports. I want to work on a multimodal task, for which I need the images in JPG and the reports in TXT format. Is there any way to get reports and images in a single dataset, or a version that combines them? Any recommendation is welcome.

DICOM: https://physionet.org/content/mimic-cxr/2.0.0/
JPG version: https://www.physionet.org/content/mimic-cxr-jpg/2.0.0/

alistairewj commented 7 months ago

Easy - just download the reports ZIP file from mimic-cxr and download the JPGs from mimic-cxr-jpg. They have the same identifiers so you can match them.
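As a sketch of that matching step: in both projects, the study directory (`files/pXX/pSUBJECT/sSTUDY`) encodes the shared identifiers, so a report path can be derived from an image path. The layouts below are assumptions to check against your local copies: JPGs under `<jpg_root>/files/p10/p10000032/s50414267/<dicom_id>.jpg`, and the reports ZIP extracted into `<report_root>` so that the matching report is `<report_root>/files/p10/p10000032/s50414267.txt`. `report_for_jpg` and `pair_all` are hypothetical helpers.

```shell
#!/bin/sh
# Sketch: pair MIMIC-CXR-JPG images with MIMIC-CXR free-text reports.
# Assumed layouts (verify locally):
#   JPG:    <jpg_root>/files/p10/p10000032/s50414267/<dicom_id>.jpg
#   Report: <report_root>/files/p10/p10000032/s50414267.txt
set -eu

# Report path for one JPG, given as a path relative to <jpg_root>.
report_for_jpg() {
  study_dir=$(dirname "$1")              # files/pXX/pSUBJECT/sSTUDY
  printf '%s/%s.txt\n' "$2" "$study_dir"
}

# Print "<jpg><TAB><report>" for every JPG whose report file exists.
pair_all() {
  jpg_root=$1 report_root=$2
  find "$jpg_root/files" -name '*.jpg' | while IFS= read -r jpg; do
    rel=${jpg#"$jpg_root"/}
    report=$(report_for_jpg "$rel" "$report_root")
    if [ -f "$report" ]; then
      printf '%s\t%s\n' "$jpg" "$report"
    fi
  done
}
```

A study often contains several images (e.g. different views), so the same report can appear on several lines of the `pair_all` output; redirect it to a TSV file and load that in your training code.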

jwan9 commented 7 months ago

Hi @alistairewj @tompollard, I'm trying to download the mimic-cxr-jpg dataset from Google Cloud. I've been granted access to the dataset and configured GCP, but I still get an error when I try to download via gsutil. Can you suggest how to resolve this? Thanks for the help!

alistairewj commented 7 months ago

It looks like you haven't configured a project ID for billing. We've provided you access to the dataset (stored under our project ID, physionet-data), but you need to set up your own project ID and associate it with your billing ID. This is because you are responsible for the cost of downloading the dataset. It costs around $50 for the entire MIMIC-CXR dataset (~5 TB). MIMIC-CXR-JPG is less (500 GB, so around $10-$15).

You can read how to set up a project here: https://cloud.google.com/storage/docs/projects

You only need to create a project for yourself, and then use gcloud auth login to configure that as your default project: https://cloud.google.com/sdk/gcloud/reference/auth

From then on, all your usage will be billed against your default project.
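A minimal sketch of that setup, assuming the Google Cloud CLI is installed: `gcloud auth login` authenticates you, and `gcloud config set project` is the command that records the default project (`gcloud init` can also do both). It first checks the ID against Google Cloud's project ID rules (6 to 30 characters; lowercase letters, digits and hyphens; starts with a letter; does not end with a hyphen), which catches placeholders such as `my_project_id` that contain underscores. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: make your own project the gcloud default so you are billed.
# Assumption: the Google Cloud CLI (gcloud) is installed.
set -eu

# GCP project IDs: 6-30 chars of lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
is_valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

set_default_project() {
  project_id=${1:?usage: set_default_project PROJECT_ID}
  if ! is_valid_project_id "$project_id"; then
    echo "invalid project ID: $project_id" >&2
    return 1
  fi
  gcloud auth login                        # authenticate (opens a browser)
  gcloud config set project "$project_id"  # record the default project
}
```

Note that for requester-pays buckets gsutil may still need the project passed explicitly with `-u`, as found later in this thread.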

jwan9 commented 7 months ago

Thanks @alistairewj
Actually, I've already created a new project and added a billing method. After installing the Google Cloud CLI, I set it up with gcloud init. This is the command I used to download the dataset:

gsutil -m cp -r \
  "gs://mimic-cxr-jpg-2.0.0.physionet.org/files" \
  .  --billing-project= my_project_id

but I still got the error: BadRequestException: 400 Bucket is a requester pays bucket but no user project provided.

jwan9 commented 7 months ago

The error was resolved by using the -u option instead of --billing-project; I'm not familiar with gsutil...

alistairewj commented 7 months ago

I'm not sure you're completely at fault; I would have guessed --billing-project was a reasonable argument as well. Glad you figured it out!
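For reference, a sketch of the working form of the download: with gsutil, the requester-pays project is given by the top-level `-u` option, placed before the `cp` subcommand. It assumes gsutil is installed and authenticated and that `PROJECT_ID` is your own project with billing enabled. `download_jpg` and the `DRY_RUN` switch are hypothetical additions for previewing the command.

```shell
#!/bin/sh
# Sketch: download MIMIC-CXR-JPG from its requester-pays bucket with gsutil.
# Assumptions: gsutil installed and authenticated; PROJECT_ID is your own
# project with billing enabled. DRY_RUN and download_jpg are hypothetical.
set -eu

BUCKET="gs://mimic-cxr-jpg-2.0.0.physionet.org"

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

download_jpg() {
  project_id=${1:?usage: download_jpg PROJECT_ID [DEST]}
  dest=${2:-.}
  # -u names the project billed for requester-pays access and must come
  # before the cp subcommand; -m copies in parallel; -r recurses.
  run gsutil -u "$project_id" -m cp -r "$BUCKET/files" "$dest"
}
```

`DRY_RUN=1 download_jpg my-project-id ./mimic-cxr-jpg` prints the command for inspection; drop `DRY_RUN=1` to run it. If you use the newer `gcloud storage cp` instead of gsutil, the billing project is passed with the gcloud-wide `--billing-project=PROJECT_ID` flag (with no space after `=`), which may explain why that flag seemed reasonable; check `gcloud storage cp --help` for your installed version.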

tompollard commented 7 months ago

The following Colab notebook may also be helpful as a demo for working with the chest X-rays ("Training a Convolutional Neural Network to Classify Chest X-rays"): https://github.com/MIT-LCP/2019-hst-953/blob/master/tutorials/mimic-cxr/mimic-cxr-train.ipynb