Regarding missing data in the repository

BostonGene / MFP

Molecular Functional Portraits

Regarding missing data in the repository #1

Open hmkim opened 3 years ago

hmkim commented 3 years ago

Hi!

Could you provide the following files, which are missing from the repository?

pan_ann.tsv
TCGA_signatures.tsv
TCGA_annotation.tsv

Thank you in advance!

geneprophet commented 2 years ago

I would like the 'expression.tsv' file referenced at 'https://github.com/BostonGene/MFP/blob/master/clustering_example.py#L13', because I cannot run the pipeline successfully with my own expression matrix file.

avkitex commented 2 years ago

You may find the data at https://science.bostongene.com/tumor-portrait/ in the downloads section

geneprophet commented 2 years ago

Thank you very much! I have downloaded the file 'signatures-panmi.tsv' via the link.

However, I still cannot run 'clustering_example.py' when I use 'signatures-panmi.tsv' in place of 'signature_scores_scaled' at 'https://github.com/BostonGene/MFP/blob/master/clustering_example.py#L16'. I have confirmed that Python 3.7 and the packages listed in 'requirements.txt' are installed correctly. The errors occur in the functions clustering_profile_metrics_plot() and detect_type(): https://github.com/BostonGene/MFP/blob/master/clustering_example.py#L26 https://github.com/BostonGene/MFP/blob/master/clustering_example.py#L61

Could you please test the code with 'signatures-panmi.tsv' as signature_scores or signature_scores_scaled?

geneprophet commented 2 years ago

I ran 'clustering_example.py' successfully after changing https://github.com/BostonGene/MFP/blob/master/clustering_example.py#L61 to:

final_clusters = detect_type(clustering_metrics.T[0.51].perc, signature_scores_scaled)
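For readers comparing this workaround with the original line, here is a minimal pandas sketch of the two access patterns. It assumes a toy frame indexed by threshold with a 'perc' column; the values and the extra 'score' column are hypothetical, and the real clustering_metrics object in MFP may be structured differently. It only shows that the two expressions pick the same cell when the threshold label is 0.51; it does not explain why the original line failed.

```python
import pandas as pd

# Toy stand-in for clustering_metrics (hypothetical values and columns).
clustering_metrics = pd.DataFrame(
    {
        "perc": ["labels@0.50", "labels@0.51", "labels@0.52"],
        "score": [0.40, 0.45, 0.43],
    },
    index=[0.50, 0.51, 0.52],  # one row per threshold
)

best_threshold = 0.51  # assumed to be an exact label in the index

# Row lookup by threshold, then the 'perc' entry.
via_loc = clustering_metrics.loc[best_threshold]["perc"]

# Transpose so thresholds become columns, then attribute access on 'perc'.
via_transpose = clustering_metrics.T[0.51].perc

print(via_loc == via_transpose)
```

If the threshold labels come from a floating-point range, a value computed elsewhere may not match an index label exactly, so it can be worth checking `best_threshold in clustering_metrics.index` before the lookup.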

AGorthee commented 2 years ago

Hello! I am running into a few errors when I execute clustering_example.py:

final_clusters = detect_type(clustering_metrics.loc[best_threshold]['perc'], signature_scores_scaled)

The error I get is:

KeyError: "None of [Index(['Angiogenesis', 'Endothelium', 'CAF', 'Matrix', 'Matrix_remodeling'], dtype='object')] are in the [index]"

I tried to debug by stepping through the detect_type function and found that the cmeans data frame has TCGA codes as rows and cluster numbers as columns, so it loses the Fges information. Hence the error above when the code tries to calculate deltas over subsets of the Fges.

Could you please check the code and tell me how to fix it? Thank you very much!
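The orientation problem described above can be reproduced with a small pandas sketch. Everything here is an assumption for illustration: the sample IDs, the scores, and the way per-cluster means are computed are toy stand-ins, not the actual detect_type implementation. It shows that selecting signature names with `.loc` raises a KeyError when the rows are samples, and works when the rows are signatures.

```python
import pandas as pd

# Hypothetical sample IDs; signature names are taken from the error message.
samples = ["TCGA-01", "TCGA-02", "TCGA-03", "TCGA-04"]
signatures = ["Angiogenesis", "Endothelium", "CAF", "Matrix", "Matrix_remodeling"]

# Toy score matrix with samples as rows and signatures as columns.
scores = pd.DataFrame(
    [[0.1 * (i + j) for j in range(len(signatures))] for i in range(len(samples))],
    index=samples,
    columns=signatures,
)
clusters = pd.Series([1, 1, 2, 2], index=samples)  # toy cluster labels

# Per-cluster means, transposed so that signatures are the row labels.
cmeans = scores.groupby(clusters).mean().T
subset_ok = cmeans.loc[signatures]  # works: signature names are in the index

# A frame shaped like the one described: samples as rows, clusters as columns.
sample_by_cluster = pd.DataFrame(0.0, index=samples, columns=[1, 2])

raised = False
try:
    sample_by_cluster.loc[signatures]  # signature names are not row labels
except KeyError:
    raised = True

# Quick orientation check before calling code that expects signatures as rows.
has_signature_rows = set(signatures) <= set(sample_by_cluster.index)
print(raised, has_signature_rows)
```

A check like `has_signature_rows` on the input file can tell you whether the matrix needs to be transposed after loading, which is one possible cause of the reported error.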