baeseongsu / mimic-cxr-vqa

A new medical VQA dataset based on MIMIC-CXR. Part of the work 'EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images, NeurIPS 2023 D&B'.
MIT License

Which compressed folder is the test set of MIMIC-CXR dataset? #2

Closed FRENKIE-CHIANG closed 5 months ago

FRENKIE-CHIANG commented 5 months ago

Hello, since the MIMIC-CXR dataset is too large to download, I only need the test set. Which compressed folder is the test set of this dataset?

baeseongsu commented 5 months ago

Hi, @FRENKIE-CHIANG. Thank you for using the MIMIC-CXR-VQA dataset! To access the test set of the MIMIC-CXR-VQA dataset, please follow these steps:

  1. Construct the test.json file: Execute the script below. (For more details, refer to how to access.)
     bash download_and_build_dataset.sh
  2. Modify the download_images.sh script: After obtaining the test.json file, modify download_images.sh so that only the test set images are downloaded. Since the full MIMIC-CXR-JPG dataset is very large, as you mentioned, this change restricts the download to a smaller subset of images. Modify the script as follows:

(before)

https://github.com/baeseongsu/mimic-cxr-vqa/blob/b177ce81bc37826b0c44bef9950a13ec9151fac5/download_images.sh#L32-L38

(after)

# Gather image paths from the test JSON dataset file
image_paths_test=$(get_image_paths 'mimiccxrvqa/dataset/test.json')

# Set the script to use only the test set image paths
image_paths=$(echo -e "$image_paths_test")
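
To check how many images the test split refers to before downloading anything, the image paths can be listed directly from test.json. The following is only a rough sketch: `list_image_paths` is a hypothetical helper (not the repository's `get_image_paths`), the `image_path` key is an assumption about the JSON layout, and the sample file stands in for the real test.json.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the repository's get_image_paths): list the unique
# values of an assumed "image_path" key in a JSON file.
# This is a text-level match, so it only suits simple, unescaped path strings.
list_image_paths() {
    local json_file=$1
    grep -o '"image_path"[[:space:]]*:[[:space:]]*"[^"]*"' "$json_file" \
        | sed 's/^"image_path"[[:space:]]*:[[:space:]]*"//; s/"$//' \
        | sort -u
}

# Tiny stand-in for test.json; the real file's fields may differ.
sample_json=$(mktemp)
cat > "$sample_json" <<'EOF'
[
  {"question": "q1", "image_path": "p10/p100/s1/a.jpg"},
  {"question": "q2", "image_path": "p10/p100/s1/a.jpg"},
  {"question": "q3", "image_path": "p11/p110/s2/b.jpg"}
]
EOF

# Several questions can share one image, so duplicates are dropped by sort -u.
image_paths=$(list_image_paths "$sample_json")
image_count=$(printf '%s\n' "$image_paths" | wc -l | tr -d ' ')
echo "unique images: $image_count"
rm -f "$sample_json"
```

If the real test.json uses the assumed key, pointing the helper at mimiccxrvqa/dataset/test.json gives the number of unique images the modified script would try to fetch. A JSON-aware tool would be more robust than this text match.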

If you encounter any problems with these steps or experience any inconvenience, please let us know.

FRENKIE-CHIANG commented 5 months ago

Thanks for your reply. I have the image data (MIMIC-CXR-JPG), but that dataset contains no txt files with the captions or VQA test data for the images. I found that these txt files are in the MIMIC-CXR dataset (not in MIMIC-CXR-JPG), but for some reason I cannot get the MIMIC-CXR-JPG now. I'm sorry to ask, but I urgently need the txt files corresponding to the images. Could you please provide the txt files for the images (including the caption and VQA text), or a convenient download path so that I can download only the txt files and not the image files? I would be most grateful for your help. Thank you so much!

baeseongsu commented 5 months ago

Hi, @FRENKIE-CHIANG!

As these datasets are released under the PhysioNet Credentialed Health Data License 1.5.0, please be aware that we cannot freely upload or share the dataset ourselves.