Biooptics2021 / PathFinder

GNU General Public License v3.0

Doubt in "Generate_prognostic_patches.py" file #5

Closed: Rukhmini closed this issue 1 year ago.

Rukhmini commented 1 year ago

Hello! In Generate_prognostic_patches.py, is the heatmap.npy file the one generated by cut_heatmap.py? Which heatmap file is the input in this line: "WSI_heatmap_path = 'xx'+WSI_path.split('/')[-1][:-4]+'.npy'"? Thanks!

LiangJunhao-THU commented 1 year ago

Generate_prognostic_patches.py generates 512×512-pixel patches for training MicroNet (a prognostic CNN based on tumor patches; see the paper for a detailed description). To localize the patches correctly, the heatmap.npy used here is the raw one, not cut into a square: it is a rectangle, like the corresponding WSI.
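To illustrate why the raw, rectangular heatmap matters for localization, here is a minimal sketch. It assumes (hypothetically) that each heatmap cell corresponds to one non-overlapping 512×512 tile of the WSI and that the tumor score sits in a single channel; `tumor_patch_coords`, `threshold`, `tile_size` and `channel` are illustrative names, not code from Generate_prognostic_patches.py.

```python
import numpy as np


def tumor_patch_coords(heatmap, threshold=0.5, tile_size=512, channel=None):
    """Map high-scoring heatmap cells back to WSI pixel coordinates.

    Hypothetical assumption: cell (row, col) covers the tile whose top-left
    corner is (col * tile_size, row * tile_size) in the slide.
    """
    scores = heatmap if channel is None else heatmap[..., channel]
    if scores.ndim != 2:
        raise ValueError("expected a 2-D score map; pass `channel` for multi-channel heatmaps")
    rows, cols = np.nonzero(scores > threshold)
    # Return (x, y) top-left corners of the 512x512 patches to extract
    return [(int(c) * tile_size, int(r) * tile_size) for r, c in zip(rows, cols)]


if __name__ == "__main__":
    demo = np.zeros((3, 5))  # rectangular, like the WSI
    demo[1, 4] = 0.9
    print(tumor_patch_coords(demo))  # [(2048, 512)]
```

If the heatmap had first been cropped to a square, the row/column offsets would shift and these coordinates would no longer line up with the slide, which is consistent with using the raw heatmap here.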

cut_heatmap.py is used for training MacroNet: it cuts the raw heatmap.npy into a square. We only use the square heatmap.npy when training MacroNet.
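As a rough illustration of "cutting into a square", the sketch below center-crops the first two axes of a raw heatmap to the shorter side. This is an assumption: the actual cut_heatmap.py may pad, resize, or crop differently, and `cut_to_square` is a hypothetical name.

```python
import numpy as np


def cut_to_square(heatmap):
    """Center-crop the first two axes of a raw heatmap to a square.

    Hypothetical stand-in for cut_heatmap.py; any trailing channel axes
    are kept unchanged.
    """
    h, w = heatmap.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return heatmap[top:top + side, left:left + side, ...]


if __name__ == "__main__":
    raw = np.arange(15).reshape(3, 5)  # 3 x 5 rectangle
    print(cut_to_square(raw).shape)  # (3, 3)
```

Cropping discards the margins of the longer side, so the square array is suitable as a fixed-shape input for MacroNet but no longer maps cell-for-cell onto the slide.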

The heatmap.npy you are asking about here is the raw numpy array generated by decoupling.py.
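For the path line in the question, 'xx' appears to be a placeholder for the directory holding the raw decoupling.py outputs, and `[:-4]` strips a four-character extension such as `.svs`. A sketch of the same mapping with pathlib, where `heatmap_path_for` and `heatmap_dir` are hypothetical names:

```python
from pathlib import Path


def heatmap_path_for(wsi_path, heatmap_dir):
    """Build the raw heatmap path for a WSI.

    Mirrors 'xx' + WSI_path.split('/')[-1][:-4] + '.npy' with 'xx'
    replaced by heatmap_dir. Path.stem drops only the final suffix, so it
    also handles extensions that are not exactly four characters long.
    """
    return str(Path(heatmap_dir) / (Path(wsi_path).stem + ".npy"))


if __name__ == "__main__":
    print(heatmap_path_for("/data/TCGA/slide-01.svs", "/heatmaps"))  # /heatmaps/slide-01.npy
```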

Hope this will be helpful :)

Rukhmini commented 1 year ago

Thank you again for your response. So, after the raw numpy array is generated by decoupling.py, if I want to train MacroNet for survival analysis, can I run train_TCGA_CV.py directly, which will generate the Kaplan-Meier analysis?
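For reference, one common way to turn per-patient risk scores into a Kaplan-Meier comparison is to split patients at the median risk and estimate a survival curve per group. The sketch below is a generic, stdlib-only illustration of that approach; it is not taken from train_TCGA_CV.py, and `kaplan_meier` and `split_by_median_risk` are hypothetical names.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    events[i] is 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored subjects leave the risk set after t
    return curve


def split_by_median_risk(risks, times, events):
    """Split patients into high- and low-risk groups at the median risk score."""
    cut = median(risks)

    def pick(indices):
        return [times[i] for i in indices], [events[i] for i in indices]

    high = [i for i, r in enumerate(risks) if r > cut]
    low = [i for i, r in enumerate(risks) if r <= cut]
    return pick(high), pick(low)
```

Usage: `(ht, he), (lt, le) = split_by_median_risk(risks, times, events)`, then compare `kaplan_meier(ht, he)` with `kaplan_meier(lt, le)`.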

LiangJunhao-THU commented 1 year ago

You need to cut heatmap.npy into a square when you train MacroNet.

To train MacroNet: decoupling.py -> raw heatmap.npy -> cut_heatmap.py -> square heatmap.npy -> train_TCGA_CV.py (select MacroNet) -> risk score

To train MicroNet: decoupling.py -> raw heatmap.npy -> Generate_prognostic_patches.py -> patches -> train_TCGA_CV.py (select MicroNet) -> risk score
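The two pipelines above share the first step and then diverge. As a hypothetical sketch, they can be written down as data so a small driver prints (or runs) the scripts in the right order. Only the script names come from this thread; `PIPELINES` and `plan` are illustrative, and since how train_TCGA_CV.py selects MacroNet or MicroNet is not specified here, no command-line flag is assumed.

```python
import sys

# Script order for each model, as listed in the maintainer's reply.
PIPELINES = {
    "MacroNet": ["decoupling.py", "cut_heatmap.py", "train_TCGA_CV.py"],
    "MicroNet": ["decoupling.py", "Generate_prognostic_patches.py", "train_TCGA_CV.py"],
}


def plan(model):
    """Return the commands to run, in order, for the chosen model."""
    if model not in PIPELINES:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(PIPELINES)}")
    return [[sys.executable, script] for script in PIPELINES[model]]


if __name__ == "__main__":
    for cmd in plan("MacroNet"):
        # Dry run only; replace with subprocess.run(cmd, check=True) to execute.
        print(" ".join(cmd))
```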

If you have any other questions, please let me know ;)

Rukhmini commented 1 year ago

Thank you. Another question: in train_TCGA_CV.py, line 26 reads "TCGA_HCC_list = list(pd.read_csv(TCGA_HCC_path)['WSIs'])". Should the input be the WSI IDs listed in the "WSI_name_x" column of the TCGA.csv file? If so, I will change "pd.read_csv(TCGA_HCC_path)['WSIs'])" to "pd.read_csv(TCGA_HCC_path)['WSI_name_x'])". Also, what is the difference between TCGA_HCC_list and clinical_list?
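Since the column name depends on the clinical sheet you have, a small helper can try 'WSIs' first and fall back to 'WSI_name_x'. This is a hypothetical sketch (`read_wsi_list` and `candidates` are illustrative names) built on the same `pd.read_csv` call used in train_TCGA_CV.py.

```python
import io

import pandas as pd


def read_wsi_list(csv_source, candidates=("WSIs", "WSI_name_x")):
    """Return WSI identifiers from the first matching column of a CSV.

    'WSIs' is the column train_TCGA_CV.py reads; 'WSI_name_x' is the column
    in the TCGA.csv mentioned above. Adjust to your own sheet.
    """
    table = pd.read_csv(csv_source)
    for column in candidates:
        if column in table.columns:
            return list(table[column])
    raise KeyError(f"none of {candidates} found in columns {list(table.columns)}")


if __name__ == "__main__":
    demo = io.StringIO("WSI_name_x,OS\nslide-01,120\nslide-02,340\n")
    print(read_wsi_list(demo))  # ['slide-01', 'slide-02']
```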

LiangJunhao-THU commented 1 year ago

Thank you for your question. That is just a data preparation step to make sure each heatmap.npy has corresponding survival information. The purpose of the path_cleaning function is to produce the final list (cleaned_path) used for cross-validation. You can build cleaned_path your own way, as long as every entry in the list has both a heatmap.npy and clinical information. You can also use 'WSI_name_x' directly; it depends on the clinical information spreadsheet you have.
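One way to build cleaned_path "your own way" is to keep only heatmaps whose WSI ID also appears in the clinical table. The sketch below assumes, hypothetically, that the WSI ID is the heatmap file name without `.npy`; `clean_paths` is an illustrative name, not the repository's path_cleaning.

```python
from pathlib import Path


def clean_paths(heatmap_paths, clinical_ids):
    """Keep only heatmaps whose WSI ID also has clinical (survival) data.

    Assumes the WSI ID is the heatmap file name without its '.npy' suffix.
    The result is sorted so cross-validation splits are reproducible.
    """
    known = set(clinical_ids)
    return sorted(p for p in heatmap_paths if Path(p).stem in known)


if __name__ == "__main__":
    paths = ["/h/b.npy", "/h/a.npy", "/h/c.npy"]
    print(clean_paths(paths, ["a", "b", "z"]))  # ['/h/a.npy', '/h/b.npy']
```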

I wrote this function because the raw data initially included other cancer types as well, so the information had to be cleaned; that is also the difference between TCGA_HCC_list and clinical_list. If your data is already clean, you can just skip this function :)
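If your clinical table still contains other cancer types, a minimal pandas sketch is to keep only the rows whose WSI appears in TCGA_HCC_list. `restrict_to` is a hypothetical helper, and 'WSI_name_x' is assumed to be the ID column, as in the TCGA.csv discussed above.

```python
import pandas as pd


def restrict_to(clinical, wsi_ids, id_col="WSI_name_x"):
    """Keep only clinical rows whose WSI appears in wsi_ids (e.g. TCGA_HCC_list)."""
    mask = clinical[id_col].isin(list(wsi_ids))
    return clinical[mask].reset_index(drop=True)


if __name__ == "__main__":
    clinical = pd.DataFrame({"WSI_name_x": ["a", "b", "c"], "OS": [1, 2, 3]})
    print(list(restrict_to(clinical, ["c", "a"])["WSI_name_x"]))  # ['a', 'c']
```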