binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License

TCGA Dataset Training and Testing Distributions #88

Status: Open (opened by bryanwong17 10 months ago)

bryanwong17 commented 10 months ago

Hi, could you please share the list of slides used for training and testing in the TCGA dataset, along with their labels?

The documentation states: "We randomly split the WSIs into 840 training slides and 210 testing slides (4 low-quality corrupted slides are discarded)". However, the linked TEST_ID.csv file lists 214 testing slides, not 210. Could you clarify which slides were discarded, and which slides were used for training? Thank you!
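For anyone checking the same discrepancy locally, here is a minimal sketch of the comparison. Note that 214 listed minus 210 reported is 4, which equals the number of discarded slides, but nothing in this thread confirms that the 4 extra IDs are the discarded ones. The helper names `read_ids` and `compare_split` are hypothetical, and the layout of TEST_ID.csv (one slide ID per row, first column, no header) is an assumption, so the example runs on toy data rather than the real file.

```python
import csv
import io


def read_ids(csv_text, column=0):
    """Return the unique, non-empty values from one CSV column, in file order.

    Assumes one slide ID per row and no header row (not verified against
    the real TEST_ID.csv).
    """
    seen = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > column and row[column].strip():
            value = row[column].strip()
            if value not in seen:
                seen.append(value)
    return seen


def compare_split(listed_ids, usable_ids, expected_count):
    """Compare the IDs listed for a split with the slides that are usable."""
    listed = set(listed_ids)
    usable = set(usable_ids)
    return {
        "listed": len(listed),
        # How many more IDs are listed than the reported split size.
        "surplus": len(listed) - expected_count,
        # Listed IDs with no usable slide; candidates for the discarded ones.
        "not_usable": sorted(listed - usable),
    }


# Toy data standing in for the real CSV and the locally readable slides.
toy_csv = "slide_a\nslide_b\nslide_c\n"
report = compare_split(read_ids(toy_csv), ["slide_a", "slide_c"], expected_count=2)
print(report)
```

With the real data, `listed_ids` would come from TEST_ID.csv, `usable_ids` from whichever slides open without error locally, and `expected_count` would be 210.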

GeorgeBatch commented 9 months ago

@bryanwong17, I went through this. See the results of my investigation in my README file for downloading TCGA.