HKU-MedAI / WSI-HGNN

[CVPR'23] Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning

Asking about labeling WSIs in TCGA-BRCA #10

Closed: simonhfut closed this issue 5 months ago

simonhfut commented 6 months ago

I'd like to ask about labeling WSIs in TCGA-BRCA. There are 11765 .svs files in TCGA-BRCA. Are their labels taken from the 'sample_type' column in nationwidechildrens.org_biospecimen_sample_brca.txt? If so, this file contains only 1150 labels. Could you provide some guidance? Thank you very much!

howardchanth commented 6 months ago

Hi, thank you for your interest in our work. Yes, for cancer classification we use the sample_type column from nationwidechildrens.org_biospecimen_sample_brca.txt. An updated version of this file is available from the TCGA repository as biospecimen.project-tcga-brca.{year}-{month}-{day}.tar.gz; the sample information is in the sample.tsv file inside that archive. We include only the 1150 labelled samples and discard the unlabelled samples during training. It would be possible to impose some assumptions so that an unsupervised likelihood can be calculated for the unlabelled samples, but that was beyond the scope of our work. Please let me know if you run into any further problems. Thank you!
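A minimal sketch of the labelling step described above (joining slides to sample_type and discarding unlabelled slides) might look like the following. This is not the authors' code. It assumes that slide filenames start with the TCGA slide barcode (e.g. TCGA-A1-A0SB-01Z-00-DX1.<uuid>.svs), that the first 15 characters of a barcode identify the sample, and that the barcode column is `bcr_sample_barcode` in the nationwidechildrens file or `sample_submitter_id` in sample.tsv. The helper names `read_sample_tsv`, `load_sample_types` and `label_slides` are hypothetical; check the column names against the files you actually download.

```python
import csv
import io
import os
import tarfile
from typing import Dict, Iterable, List, Tuple


def read_sample_tsv(tar_path: str, member_suffix: str = "sample.tsv") -> str:
    """Return the text of the first archive member whose name ends with member_suffix.

    Hypothetical helper: assumes the biospecimen .tar.gz contains a sample.tsv file.
    """
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(member_suffix):
                handle = tar.extractfile(member)
                if handle is not None:
                    return handle.read().decode("utf-8")
    raise FileNotFoundError(f"no member ending with {member_suffix!r} in {tar_path}")


def load_sample_types(tsv_text: str, barcode_col: str = "bcr_sample_barcode") -> Dict[str, str]:
    """Map 15-character sample IDs (e.g. 'TCGA-A1-A0SB-01') to their sample_type.

    Assumed columns: barcode_col and 'sample_type'. Rows whose barcode does not
    start with 'TCGA-' (such as extra descriptor rows under the header) are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    labels: Dict[str, str] = {}
    for row in reader:
        barcode = (row.get(barcode_col) or "").strip()
        sample_type = (row.get("sample_type") or "").strip()
        if barcode.startswith("TCGA-") and sample_type:
            labels[barcode[:15]] = sample_type
    return labels


def label_slides(
    svs_files: Iterable[str], labels: Dict[str, str]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split slide paths into (path, label) pairs and unlabelled paths."""
    labelled: List[Tuple[str, str]] = []
    unlabelled: List[str] = []
    for path in svs_files:
        # 'TCGA-A1-A0SB-01Z-00-DX1.<uuid>.svs' -> 'TCGA-A1-A0SB-01'
        sample_id = os.path.basename(path)[:15]
        if sample_id in labels:
            labelled.append((path, labels[sample_id]))
        else:
            unlabelled.append(path)  # no label: dropped from supervised training
    return labelled, unlabelled


if __name__ == "__main__":
    demo = (
        "bcr_sample_barcode\tsample_type\n"
        "TCGA-A1-A0SB-01A\tPrimary Tumor\n"
        "TCGA-A1-A0SB-11A\tSolid Tissue Normal\n"
    )
    labels = load_sample_types(demo)
    labelled, unlabelled = label_slides(
        ["TCGA-A1-A0SB-01Z-00-DX1.example.svs", "TCGA-XX-0000-01Z-00-DX1.example.svs"],
        labels,
    )
    print(labelled)    # [('TCGA-A1-A0SB-01Z-00-DX1.example.svs', 'Primary Tumor')]
    print(unlabelled)  # ['TCGA-XX-0000-01Z-00-DX1.example.svs']
```

Matching on the 15-character sample ID rather than the full barcode is a choice made for this sketch: slide barcodes often carry a different vial suffix (e.g. 01Z) than the sample rows (e.g. 01A), so an exact-string join would likely miss them. Slides that end up in `unlabelled` correspond to the samples the answer says were discarded during training.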