cvlab-stonybrook / PromptMIL

Repository for "Prompt-MIL: Boosting Multi-Instance Learning Schemes via Task-specific Prompt Tuning" (MICCAI2023)

Dataset #1

Closed. SudaLiruiqi closed this issue 11 months ago.

SudaLiruiqi commented 11 months ago

Hello, could you please provide the dataset? Thanks.

jingweizhang-xyz commented 11 months ago

You can download them from their original websites:

TCGA-BRCA: https://portal.gdc.cancer.gov/projects/TCGA-BRCA
TCGA-CRC (a combination of COAD and READ): https://portal.gdc.cancer.gov/projects/TCGA-COAD and https://portal.gdc.cancer.gov/projects/TCGA-READ
Bright: https://www.synapse.org/#!Synapse:syn26480664/files/

SudaLiruiqi commented 11 months ago

Thank you, I noticed the CSV file with the "wsi_id." Have you renamed the images?

jingweizhang-xyz commented 11 months ago

The wsi_id values are the IDs listed in the TCGA dataset. Some slides have a longer file name, such as "TCGA-BH-A0DL-11A-01-BSA.265a63aa-9dd5-4d64-9e8b-29c534306433.svs". For simplicity, I renamed these to keep only the part before the first ".", giving "TCGA-BH-A0DL-11A-01-BSA.svs".
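The renaming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the comment, not the script actually used; the function name `simplify_wsi_name` is hypothetical.

```python
from pathlib import Path


def simplify_wsi_name(filename: str) -> str:
    """Keep the text before the first "." and re-attach the final extension.

    Hypothetical helper: it only mirrors the rule described in the thread.
    """
    path = Path(filename)
    slide_id = path.name.split(".", 1)[0]  # text before the first "."
    return slide_id + path.suffix          # suffix is the last extension, e.g. ".svs"


long_name = "TCGA-BH-A0DL-11A-01-BSA.265a63aa-9dd5-4d64-9e8b-29c534306433.svs"
print(simplify_wsi_name(long_name))
```

A name that is already short, such as "TCGA-BH-A0DL-11A-01-BSA.svs", passes through unchanged.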

jingweizhang-xyz commented 11 months ago

The patch files belonging to this WSI are renamed as, e.g.:

TCGA-BH-A0DL-11A-01-BSA_0.jpg
TCGA-BH-A0DL-11A-01-BSA_1.jpg
...
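As a rough sketch of this naming scheme, the two helpers below build a patch file name from a slide file name and recover the slide ID from a patch file name. Both function names are hypothetical, and the sketch assumes the patch index is simply appended after an underscore, as in the examples above.

```python
from pathlib import Path


def patch_name(wsi_file: str, index: int) -> str:
    """Build a patch file name such as "<slide id>_<index>.jpg" (assumed pattern)."""
    slide_id = Path(wsi_file).name.split(".", 1)[0]  # text before the first "."
    return f"{slide_id}_{index}.jpg"


def slide_id_of_patch(patch_file: str) -> str:
    """Recover the slide ID by dropping the extension and the trailing "_<index>"."""
    return Path(patch_file).stem.rsplit("_", 1)[0]


print(patch_name("TCGA-BH-A0DL-11A-01-BSA.svs", 0))
print(slide_id_of_patch("TCGA-BH-A0DL-11A-01-BSA_1.jpg"))
```

Splitting on the last underscore keeps the lookup safe even if a slide ID itself contained an underscore.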

SudaLiruiqi commented 11 months ago

Thank you.

thomascong121 commented 8 months ago

Hi Jingwei, thanks for the great work. I checked https://portal.gdc.cancer.gov/projects/TCGA-BRCA, and for cases involving "ductal and lobular neoplasms" there are 1054 cases in total, which differs from the 1034 you mention in the paper. Can you tell me if I am looking at something wrong?

jingweizhang-xyz commented 8 months ago

The TCGA dataset is updated every year, so it is common for the number on the website to be higher than the one in our paper. We also remove slides that have no magnification information.
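The thread does not say how the magnification check was done. One possible sketch, assuming slide metadata is available as a key/value mapping (for instance the `properties` of an `openslide.OpenSlide` object), looks for the standard OpenSlide key "openslide.objective-power" and the Aperio key "aperio.AppMag". The helper names are hypothetical.

```python
from typing import Dict, List, Mapping, Optional

# Property keys that commonly carry the scan magnification (an assumption about
# which keys to check; the thread does not specify them).
MAGNIFICATION_KEYS = ("openslide.objective-power", "aperio.AppMag")


def get_magnification(properties: Mapping[str, str]) -> Optional[float]:
    """Return the first parseable magnification value, or None if absent."""
    for key in MAGNIFICATION_KEYS:
        value = properties.get(key)
        if value is None:
            continue
        try:
            return float(value)
        except ValueError:
            continue  # present but not numeric: try the next key
    return None


def keep_slides_with_magnification(slides: Dict[str, Mapping[str, str]]) -> List[str]:
    """Keep only the slide names whose metadata carries a magnification."""
    return [name for name, props in slides.items()
            if get_magnification(props) is not None]


# Toy metadata; with OpenSlide installed, the mapping could instead come from
# openslide.OpenSlide(path).properties.
slides = {
    "slide_a.svs": {"openslide.objective-power": "40"},
    "slide_b.svs": {"aperio.AppMag": "20"},
    "slide_c.svs": {},
}
print(keep_slides_with_magnification(slides))
```

A filter of this kind is one reason a locally prepared slide count could be lower than the case count shown on the portal.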

thomascong121 commented 8 months ago

Thanks a lot for the explanation. I have a further question about downloading: to download the data from TCGA, did you use the official gdc-client with a manifest file?
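For reference, a manifest-based download with the GDC Data Transfer Tool can be sketched as below. It assumes `gdc-client` is installed and on the PATH and that a manifest file has been exported from the portal; the function names and the example paths are hypothetical.

```python
import subprocess
from typing import List


def build_gdc_download_command(manifest_path: str, output_dir: str) -> List[str]:
    """Assemble a `gdc-client download` call: -m is the manifest, -d the target directory."""
    return ["gdc-client", "download", "-m", manifest_path, "-d", output_dir]


def download_from_manifest(manifest_path: str, output_dir: str) -> None:
    """Run the download; raises CalledProcessError if gdc-client exits with an error."""
    subprocess.run(build_gdc_download_command(manifest_path, output_dir), check=True)


# Example paths are placeholders, not files referenced in this thread.
print(" ".join(build_gdc_download_command("gdc_manifest.txt", "tcga_brca_slides")))
```

Building the command separately from running it makes it easy to inspect the exact call before starting a large download.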

jingweizhang-xyz commented 8 months ago

No, I did not. We have the entire TCGA ready on our server and we used it directly.

thomascong121 commented 8 months ago

Wow, that is very lucky. Anyway, thanks a lot for your reply!

bryanwong17 commented 3 months ago

Hi @jingweizhang-xyz @thomascong121, could you give me some guidance on how to download the TCGA dataset from the portal?