AdalbertoCq / Histomorphological-Phenotype-Learning

Corresponding code of 'Quiros A.C.+, Coudray N.+, Yeaton A., Yang X., Chiriboga L., Karimkhan A., Narula N., Pass H., Moreira A.L., Le Quesne J.*, Tsirigos A.*, and Yuan K.* Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unlabeled, unannotated pathology slides. 2024'

External cohort Pipeline Problem. #2

Closed: wuyu-z closed this issue 2 years ago

wuyu-z commented 2 years ago

Hello @AdalbertoCq, I am trying to run through the pipeline for mapping an external cohort to existing clusters. For simplicity, I am using just one WSI as the external cohort. In Step 4 (including metadata in the H5 file), I noticed a CSV file containing the columns "luad", "os_event_ind", and "os_event_data". I don't know how to create this CSV file for my own data. Where do these data come from? I searched around and found nothing.

Might be a silly question :)

Thank you

AdalbertoCq commented 2 years ago

Hey @wuyu-z, those are the fields related to each WSI or patient:

  1. luad: Binary variable, one for lung adenocarcinoma and zero for lung squamous cell carcinoma.
  2. os_event_ind: Binary variable, one if the patient died and zero if it was censored.
  3. os_event_data: Continuous variable in months. This is the time of the event, either the time of death or the censoring time.
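For an external cohort, a CSV with these three fields could be written with Python's standard `csv` module, as in the minimal sketch below. The slide-identifier column name `slides` and the example slide name are hypothetical; check the CSV referenced in Step 4 of the README for the exact header the pipeline expects.

```python
import csv
import io

# Hypothetical header: "slides" is an assumed name for the slide-identifier column.
# The other three columns follow the field definitions listed above.
FIELDS = ["slides", "luad", "os_event_ind", "os_event_data"]


def make_row(slide, is_luad, died, months):
    """Build one metadata row, validating the binary and continuous fields."""
    if months < 0:
        raise ValueError("os_event_data must be a non-negative number of months")
    return {
        "slides": slide,
        "luad": 1 if is_luad else 0,        # 1 = LUAD, 0 = LUSC
        "os_event_ind": 1 if died else 0,   # 1 = death observed, 0 = censored
        "os_event_data": float(months),     # time of death or censoring, in months
    }


def write_metadata(rows, handle):
    """Write metadata rows as CSV (with a header line) to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_metadata([make_row("my_slide_01", True, False, 24.5)], buf)
    print(buf.getvalue())
```

If survival data are not available for an external slide, the values would have to come from whatever clinical records accompany that cohort; the sketch does not fill them in.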

The values in the TCGA file come from the GDC website. This notebook shows how the survival data was processed from the raw TCGA values.
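As a rough illustration of that kind of processing, the sketch below derives `os_event_ind` and `os_event_data` from GDC-style clinical values. It assumes the GDC fields `vital_status`, `days_to_death`, and `days_to_last_follow_up`, and converts days to months with an average month of 365.25/12 days; both are assumptions, and this is not necessarily what the linked notebook does.

```python
# Assumed conversion factor: average month length in days (365.25 / 12 = 30.4375).
DAYS_PER_MONTH = 365.25 / 12


def survival_fields(vital_status, days_to_death, days_to_last_follow_up):
    """Return (os_event_ind, os_event_data) from GDC-style clinical values.

    Assumes vital_status "Dead" marks an observed death; any other status is
    treated as censored. Missing day counts should be passed as None.
    """
    if vital_status == "Dead":
        if days_to_death is None:
            raise ValueError("death recorded without days_to_death")
        return 1, days_to_death / DAYS_PER_MONTH
    if days_to_last_follow_up is None:
        raise ValueError("censored case without days_to_last_follow_up")
    return 0, days_to_last_follow_up / DAYS_PER_MONTH


if __name__ == "__main__":
    print(survival_fields("Dead", 365.25, None))   # one year to death
    print(survival_fields("Alive", None, 730.5))   # censored at two years
```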

Hope this helps, Adal

wuyu-z commented 2 years ago

Thanks for your help, the question is solved, and apologies for bothering you again. I have another question. In Step 5 (background and artifact removal) of the external cohort pipeline, you mention the file hdf5_TCGAFFPE_LUADLUSC_5x_60pc_he_complete_lungsubtype_survival.h5. I understand this file is the output of Step 5 (include metadata) of the regular HPL pipeline, but the link you give only has the post-filtered version. Also, the link to the TCGA tile vector representations in Step 5 leads nowhere. Can you upload a pre-filtered version?

Thank you Wuyu

AdalbertoCq commented 2 years ago

No problem @wuyu-z, the link should now be fixed; it points to the TCGA tile vector representations in the Readme.md.

As for the unfiltered version, it will take a couple of days to upload that file, but you should have it by the end of the day on Friday.

Thanks, Adal

AdalbertoCq commented 2 years ago

Hey @wuyu-z ,

You should now be able to find the file hdf5_TCGAFFPE_LUADLUSC_5x_60pc_he_complete_lungsubtype_survival.h5, without background and artifact filtering, in the Readme.md.

Thanks, Adal

wuyu-z commented 2 years ago

Hi @AdalbertoCq, my goal now is to create an HPC cluster overlay for a single given WSI, like the one you showed in the paper.


I have now completed up to Step 7 for the external cohort and obtained CSV files that assign a number to each tile. I am assuming that number is the tile's cluster ID.
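Assuming, as above, that the number is the tile's cluster ID, an overlay boils down to laying those IDs out on the slide's tile grid and coloring each cell by cluster. The sketch below builds that grid from `(row, col, cluster_id)` tuples; this input layout is hypothetical, since the pipeline may encode tile positions differently (for example in the tile names), and it is not the repository's overlay code.

```python
def cluster_grid(tiles, n_rows, n_cols, empty=-1):
    """Lay out per-tile cluster IDs on a 2D grid for one slide.

    tiles: iterable of (row, col, cluster_id) tuples (hypothetical layout).
    Cells without a tile, e.g. background or filtered artifacts, keep `empty`.
    """
    grid = [[empty] * n_cols for _ in range(n_rows)]
    for row, col, cluster_id in tiles:
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise IndexError(f"tile ({row}, {col}) is outside a {n_rows}x{n_cols} grid")
        grid[row][col] = cluster_id
    return grid


if __name__ == "__main__":
    # Two tiles on a 2x3 grid; every other cell stays -1.
    for line in cluster_grid([(0, 0, 3), (1, 2, 5)], 2, 3):
        print(line)
```

Each grid cell could then be mapped to a color per cluster and drawn over a downsampled thumbnail of the WSI at the same tile scale.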


Now I am lost on how to produce the HPC cluster overlay image. What are the next steps to produce these images?

Thank you very much Wuyu

AdalbertoCq commented 2 years ago

Hey @wuyu-z ,

I have added a "Get tiles and WSI samples for HPCs" section to README_additional_cohort.md.

Right now, this step returns WSIs selected either by cluster contribution (%) or at random from the total. If you want to create overlays for a specific slide, you will have to tinker with the code a bit: at line 391 of clusters.py, you should be able to set the slide variable to the name of your slide.
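To illustrate what selecting WSIs "by cluster contribution (%)" means, the sketch below computes, for each slide, the percentage of its tiles assigned to each cluster, then ranks slides for a given cluster. The `(slide, cluster_id)` input format is an assumption, and this is not the code in clusters.py.

```python
from collections import Counter, defaultdict


def cluster_contributions(rows):
    """Percent of each slide's tiles assigned to each cluster.

    rows: iterable of (slide, cluster_id) pairs, one per tile (assumed format).
    Returns {slide: {cluster_id: percent}}.
    """
    counts = defaultdict(Counter)
    for slide, cluster_id in rows:
        counts[slide][cluster_id] += 1
    result = {}
    for slide, counter in counts.items():
        total = sum(counter.values())
        result[slide] = {c: 100.0 * n / total for c, n in counter.items()}
    return result


def top_slides(contributions, cluster_id, k=1):
    """Slides ranked by the share of their tiles in `cluster_id`, highest first."""
    ranked = sorted(
        contributions,
        key=lambda s: contributions[s].get(cluster_id, 0.0),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    rows = [("a", 1), ("a", 2), ("b", 2), ("b", 2)]
    contrib = cluster_contributions(rows)
    print(contrib)
    print(top_slides(contrib, 2))
```

Targeting one slide instead of ranking them is conceptually the same as filtering these rows to a single slide name before building the overlay.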

Thanks, Adal