binli123 / dsmil-wsi

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image
MIT License

TCGA-lung: please share the training-validation / test split #67

Closed GeorgeBatch closed 4 months ago

GeorgeBatch commented 1 year ago

Hi Bin,

I got the summary of what you kindly made available:

Could you please share which subset of the patches (i.e., which WSIs) you used to train the embedder? I would like to try a different aggregation model on the same TCGA multiscale features. That only makes sense if I train and validate the model on the same train-validation set as you did and test it on a separate test set. As I understand it, during SimCLR pre-training you exposed the embedder only to the train-validation portion of TCGA, so the test set stayed truly unseen, even by the embedder.
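To make the concern concrete, here is a minimal sketch of a patient-level leakage check between the slides used for embedder pre-training and a held-out test set. It assumes slide IDs follow the standard TCGA barcode layout, where the first three dash-separated fields identify the participant. The function names and the slide IDs are hypothetical and are not taken from the DSMIL repository.

```python
def patient_id(slide_id: str) -> str:
    """Return the participant part of a TCGA barcode, e.g. 'TCGA-AA-0001'."""
    # Assumption: IDs look like TCGA-<site>-<participant>-<sample...>.
    return "-".join(slide_id.split("-")[:3])


def leaked_patients(embedder_slides, test_slides):
    """Patients present both in the embedder's training slides and in the test set."""
    seen = {patient_id(s) for s in embedder_slides}
    held_out = {patient_id(s) for s in test_slides}
    return sorted(seen & held_out)


# Hypothetical slide IDs, for illustration only.
embedder_slides = ["TCGA-AA-0001-01Z-00-DX1", "TCGA-BB-0002-01Z-00-DX1"]
test_slides = ["TCGA-BB-0002-01A-01-TS1", "TCGA-CC-0003-01Z-00-DX1"]
print(leaked_patients(embedder_slides, test_slides))  # → ['TCGA-BB-0002']
```

Comparing at the patient level rather than the slide level matters because one patient can contribute several slides, and a different slide from the same patient in the test set would still count as leakage.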

Many thanks, George

GeorgeBatch commented 1 year ago

Downloading the zip files:

When downloading the TCGA multiscale features, I clicked "download" in Chrome. The download, however, arrived as 13 separate zip archives because it was too big to fetch in one piece.
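If the parts are independent, valid zip archives (an assumption, since the hosting service is not named here), they can be merged by extracting each one into the same output directory. A minimal sketch using only the standard library follows. The function name, directory names, and glob pattern are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_parts(parts_dir, out_dir, pattern="*.zip"):
    """Extract every zip archive in `parts_dir` matching `pattern` into `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    for part in sorted(Path(parts_dir).glob(pattern)):
        with zipfile.ZipFile(part) as zf:
            # testzip() returns the name of the first corrupt member, or None.
            bad = zf.testzip()
            if bad is not None:
                raise zipfile.BadZipFile(f"{part.name}: corrupt member {bad}")
            zf.extractall(out)
        extracted.append(part.name)
    return extracted
```

Usage would look like `extract_parts("downloads", "tcga_features")`. The integrity check runs before extraction so that a truncated part is reported by name instead of silently producing an incomplete feature set.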

Downloading the TCGA multiscale patches failed for me. The archive is 60 GB, and the download was interrupted every time I tried.

da-nial commented 7 months ago

Hi @GeorgeBatch, I hope you're doing well. Did you by any chance find out which train/validation/test split was used to train the TCGA embedder? I contacted Bin, but unfortunately they don't remember, as it was a long time ago.

Thanks in advance.

GeorgeBatch commented 4 months ago

Hi @da-nial, unfortunately, I got the same reply. I am working under the assumption that the test set specified here was not used in the self-supervised training. So, I used the same split in my experiments when using the self-supervised feature extractor.
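A minimal sketch of reusing a published test list in this way is given below: every slide named in the list is held out, and the rest form the train-validation pool. The function name and slide IDs are hypothetical, and the sketch says nothing about which slides the embedder was actually trained on.

```python
def split_by_test_list(all_slides, test_slides):
    """Partition slides into (train_val, held_out) using a fixed test list.

    Also returns the test IDs that are absent from `all_slides`, which
    usually signals a naming mismatch or an incomplete download.
    """
    test = set(test_slides)
    train_val = [s for s in all_slides if s not in test]
    held_out = [s for s in all_slides if s in test]
    missing = sorted(test - set(all_slides))
    return train_val, held_out, missing


# Hypothetical slide IDs, for illustration only.
all_slides = ["TCGA-AA-0001", "TCGA-BB-0002", "TCGA-CC-0003"]
train_val, held_out, missing = split_by_test_list(
    all_slides, ["TCGA-BB-0002", "TCGA-DD-0004"]
)
print(train_val, held_out, missing)
```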