DearCaat / RRT-MIL

[CVPR 2024] Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

Labels for cancer subtyping #8

Closed JackyC666 closed 7 months ago

JackyC666 commented 7 months ago

Hi, how did you label the BRCA dataset? Your paper says:

TCGA-BRCA includes two sub-types of cancers, Invasive Ductal Carcinoma (IDC) and Invasive Lobular Carcinoma (ILC). There are 779 IDC slides and 198 ILC slides.

So how can I reproduce this? I can't find anything about these subtype labels in the TCGA dataset. Thanks!

DearCaat commented 7 months ago

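I got these labels from the clinical.json file obtained from the GDC website, specifically:

  • I first downloaded the clinical.json file under Diagnostic Slide of the TCGA-BRCA project from the official GDC website.
  • Second, I obtained the case_id and primary_diagnosis fields for each case from that file.
  • Finally, I labeled each case as IDC or ILC according to which subtype keyword its primary_diagnosis field contains. That is how I got the BRCA labels I am using now.

I haven't found any other way to derive these labels, so I'm not sure this approach is entirely appropriate. If you have a better idea, feel free to leave a comment.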

JackyC666 commented 7 months ago

I got these labels from the clinical.json file obtained from the GDC website, specifically:

  • I first downloaded the clinical.json file under Diagnostic Slide of the TCGA-BRCA project from the official GDC website.
  • Second, I obtained the case_id and primary_diagnosis fields for each case from that file.
  • Finally, I labeled each case as IDC or ILC according to which subtype keyword its primary_diagnosis field contains. That is how I got the BRCA labels I am using now.

I haven't found any other way to derive these labels, so I'm not sure this approach is entirely appropriate. If you have a better idea, feel free to leave a comment.

Thanks for your reply! Where can I find the primary_diagnosis field? Could you share a link or a screenshot to illustrate? Thank you very much for your help!

DearCaat commented 7 months ago

Sorry for the confusion. It's in a file named clinical.json, which is also downloaded from the GDC site. primary_diagnosis is one of the fields in this JSON file, like this: [image]
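
For anyone trying to reproduce the labels, the steps described in this thread could be sketched in Python roughly as follows. This is only a sketch under several assumptions that are not confirmed here: that clinical.json is a top-level list of cases, that each case has a "diagnoses" list whose entries carry primary_diagnosis, and that the subtype keywords are "duct" and "lobular". The function names are hypothetical, and skipping cases that match both keywords or neither is my own choice, not necessarily what was done for the paper.

```python
import json

# Assumed keyword per subtype; the thread does not state the exact keywords used.
SUBTYPE_KEYWORDS = {"IDC": "duct", "ILC": "lobular"}


def classify_diagnosis(primary_diagnosis):
    """Return "IDC" or "ILC", or None if the text matches neither or both keywords."""
    text = (primary_diagnosis or "").lower()
    matches = [label for label, kw in SUBTYPE_KEYWORDS.items() if kw in text]
    return matches[0] if len(matches) == 1 else None


def label_cases(cases):
    """Map case_id -> subtype for every case with an unambiguous diagnosis."""
    labels = {}
    for case in cases:
        # Assumed layout: each case holds a "diagnoses" list of dicts.
        for diagnosis in case.get("diagnoses", []):
            label = classify_diagnosis(diagnosis.get("primary_diagnosis"))
            if label is not None:
                labels[case["case_id"]] = label
                break  # use the first unambiguous diagnosis for this case
    return labels


def load_labels(path="clinical.json"):
    """Read the clinical.json file downloaded from GDC and label its cases."""
    with open(path, encoding="utf-8") as f:
        return label_cases(json.load(f))
```

Calling load_labels("clinical.json") would then return a dict keyed by case_id. It is worth comparing the resulting counts with the 779 IDC and 198 ILC slides reported in the paper before relying on these keywords.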