Hanminghao / MSCPT

The official PyTorch implementation of MSCPT.

Manifest File for Dataset Download #1

Closed. bryanwong17 closed this issue 2 hours ago

bryanwong17 commented 4 hours ago

Hi, thank you for your great work! Could you share the manifest files for downloading TCGA BRCA and TCGA RCC automatically from the terminal? Alternatively, could you guide me on how to obtain them? I would also appreciate it if you could provide the label distribution. Below is an example of the manifest file format:

id  filename    md5 size    state
83fecc98-336d-49c1-92d3-c4db2d1acc0c    TCGA-64-5778-01Z-00-DX1.96C39819-8A65-4651-BE83-39959F6FAD05.svs    64a0a6c2972f76e6c58381520c255c40    326881061   released
4ea3a33b-d443-4b07-8e99-7874495d74d7    TCGA-55-8087-01Z-00-DX1.548f2800-8caf-4c0e-a7b5-6d3d28315d9c.svs    4c2f5862a9cd32bdf0e2bbceb3b36c26    189766883   released
e41e08ca-4b08-4786-a8c8-2ff3fb82b153    TCGA-MP-A4T9-01Z-00-DX1.F7B341C4-EBCD-455F-BE90-3B77AC6B76EC.svs    e7336fa6fb235ade271cc4e1a7bd3b54    793256747   released

Hanminghao commented 3 hours ago

You can download all diagnostic slides of TCGA-RCC and TCGA-BRCA from GDC, then look up the slides you need in the CSV files in the numshots folder.
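
For readers who want a manifest rather than a manual download, one possible route is to ask the public GDC files endpoint for the metadata of a list of slide file names and write the result in the tab-separated layout shown above. This is only a sketch under assumptions: it assumes you already have the `.svs` file names (for example, collected from the repository's CSV files, whose exact columns are not described in this thread), and the helper names `build_query`, `hits_to_manifest` and `fetch_hits` are hypothetical, not part of MSCPT.

```python
import json
import urllib.request

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"
# Column order of the manifest example quoted in this thread.
MANIFEST_COLUMNS = ("id", "filename", "md5", "size", "state")


def build_query(file_names, size=1000):
    """Build a GDC query payload that matches files by exact file name."""
    return {
        "filters": {
            "op": "in",
            "content": {"field": "file_name", "value": sorted(set(file_names))},
        },
        "fields": "file_id,file_name,md5sum,file_size,state",
        "format": "JSON",
        "size": size,  # must be at least the number of names requested
    }


def hits_to_manifest(hits):
    """Turn GDC hit records into tab-separated manifest text."""
    lines = ["\t".join(MANIFEST_COLUMNS)]
    for hit in hits:
        lines.append("\t".join([
            hit["file_id"],
            hit["file_name"],
            hit["md5sum"],
            str(hit["file_size"]),
            hit["state"],
        ]))
    return "\n".join(lines) + "\n"


def fetch_hits(file_names):
    """POST the query to GDC and return the matching records (needs network)."""
    body = json.dumps(build_query(file_names)).encode("utf-8")
    request = urllib.request.Request(
        GDC_FILES_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["hits"]
```

Writing `hits_to_manifest(fetch_hits(names))` to a file such as `manifest.txt` should give something that `gdc-client download -m manifest.txt` can consume. Note that TCGA-RCC is usually assembled from the TCGA-KIRC, TCGA-KIRP and TCGA-KICH projects, so the names may span several GDC projects.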

bryanwong17 commented 2 hours ago

Thanks! I was wondering if you have the complete list of all WSIs, including the labels for each dataset (not in the few-shot settings). Thank you!

Hanminghao commented 2 hours ago

Sorry, I don't have a complete manifest file because the data was downloaded a long time ago, but all WSI names and categories used in the paper are in the tcga_brca.csv, tcga_lung.csv and tcga_rcc.csv files.
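
Since those CSV files hold the WSI names and categories, the label distribution asked about earlier could be tallied directly from them. The sketch below assumes each file has a header row and a category column; the column name `label` is a guess and should be replaced with whatever the files actually use.

```python
import csv
from collections import Counter


def label_distribution(csv_lines, label_column="label"):
    """Count how many rows fall into each category.

    `csv_lines` is any iterable of CSV text lines, such as an open file.
    `label_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(csv_lines)
    return Counter(row[label_column] for row in reader)


# Example usage (file name taken from the thread, column name assumed):
# with open("tcga_brca.csv", newline="") as handle:
#     print(label_distribution(handle))
```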

bryanwong17 commented 2 hours ago

Thank you for your answer! I have one more question regarding the number of patches per WSI at 20x magnification. Is it correct that a single WSI can exceed 20k patches? I followed the same settings as the CLAM preprocessing code, but the number of patches seems much higher than reported in other papers.

Hanminghao commented 2 hours ago

Yes, some WSIs do indeed yield more than 20k patches at 20x magnification. I don't think there is anything wrong with your procedure.
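
A rough back-of-the-envelope check shows why counts above 20k are plausible. The sketch below assumes non-overlapping 256-pixel patches, a slide scanned at 40x (so one 20x patch covers 512 base-level pixels per side), and a tissue fraction chosen by hand; the slide dimensions are illustrative, not taken from any specific TCGA file.

```python
def estimate_patch_count(width_px, height_px, patch_size=256,
                         downsample=2, tissue_fraction=1.0):
    """Estimate the number of non-overlapping patches in a slide.

    width_px, height_px: slide size at the base (highest) magnification.
    downsample: base magnification divided by target magnification
                (2 for a 40x scan tiled at 20x).
    tissue_fraction: assumed share of the grid that contains tissue.
    """
    footprint = patch_size * downsample  # base-level pixels per patch side
    cols = width_px // footprint
    rows = height_px // footprint
    return round(cols * rows * tissue_fraction)


# Hypothetical 120,000 x 90,000 pixel slide scanned at 40x:
print(estimate_patch_count(120_000, 90_000))                       # 40950
print(estimate_patch_count(120_000, 90_000, tissue_fraction=0.5))  # 20475
```

Even with only half of the grid covered by tissue, a large slide passes 20k patches under these assumptions, which is consistent with the answer above.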

bryanwong17 commented 2 hours ago

Got it. Thank you for the confirmation!