CityUHK-CompBio / DeepCC

DeepCC: a novel deep learning-based framework for cancer molecular subtype classification
https://CityUHK-CompBio.github.io/DeepCC/
MIT License

Calculation of enrichment score #18

Open kate-simonova opened 1 year ago

kate-simonova commented 1 year ago

Dear all,

according to the description in the paper, there is a step where the log2 fold change (log2 FC) is calculated. I am curious how exactly the log FC is calculated when the method can process each sample separately (the input is the sample's TPM, not a list of differentially expressed genes). Does it compare my sample against all other samples to find differentially expressed genes? As far as I can tell, it doesn't require group labels when a TPM matrix is given as input.

Thanks in advance for the response.

Kate

jpfry327 commented 4 months ago

The paper's description of the multi-sample functional spectra calculation appears to be wrong: the code does not calculate log FCs. In standard GSEA, genes are ranked by their log FC between the phenotypes of interest, and enrichment is then calculated on that ranked list. This gives an enrichment for each phenotype.
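For contrast, here is a minimal sketch of the conventional phenotype-level ranking described above (this is not DeepCC code). It assumes a genes × samples TPM matrix and a 0/1 phenotype label per sample; the `pseudocount` of 1 and the function name `rank_by_log2fc` are illustrative choices, not taken from the paper.

```python
import numpy as np

def rank_by_log2fc(tpm, groups, pseudocount=1.0):
    """Hypothetical helper: rank genes by log2 fold change of group 1 over group 0.

    tpm: (n_genes, n_samples) TPM values.
    groups: length-n_samples array of 0/1 phenotype labels (required here,
            which is exactly what a per-sample method does not have).
    Returns gene indices from most up- to most down-regulated, plus the log2 FCs.
    """
    tpm = np.asarray(tpm, dtype=float)
    groups = np.asarray(groups)
    mean_case = tpm[:, groups == 1].mean(axis=1)
    mean_ctrl = tpm[:, groups == 0].mean(axis=1)
    # Pseudocount (an assumption) avoids log2(0) for unexpressed genes
    log2fc = np.log2(mean_case + pseudocount) - np.log2(mean_ctrl + pseudocount)
    order = np.argsort(-log2fc)
    return order, log2fc
```

Note that this needs group labels, so it produces one ranked list per comparison rather than one per sample.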

However, the method in this package is different in that it calculates an enrichment per sample. It does this by ranking genes by their z-score relative to other samples in your training set. It then calculates enrichments (in the usual GSEA way) based on this ranking, per sample.
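A minimal sketch of the per-sample idea, under stated assumptions: each gene is z-scored using the mean and SD from a reference (training) set, each sample's genes are ranked by z-score, and a GSEA-style weighted running-sum enrichment score is computed per gene set. The weighting exponent `p=1` and all function names here are hypothetical and are not confirmed to match DeepCC's implementation.

```python
import numpy as np

def zscore_genes(expr, ref=None):
    """Z-score each gene (row) using mean/SD from a reference set of samples.
    expr: (n_genes, n_samples). ref defaults to expr (e.g. the training set)."""
    expr = np.asarray(expr, dtype=float)
    ref = expr if ref is None else np.asarray(ref, dtype=float)
    mu = ref.mean(axis=1, keepdims=True)
    sd = ref.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0  # constant genes get z = 0 instead of division by zero
    return (expr - mu) / sd

def enrichment_score(scores, in_set, p=1.0):
    """GSEA-style running-sum enrichment score for one sample's ranked genes.
    scores: per-gene z-scores for one sample; in_set: boolean gene-set mask."""
    scores = np.asarray(scores, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    n, n_hit = len(scores), int(in_set.sum())
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must be a non-empty proper subset of genes")
    order = np.argsort(-scores)          # rank genes from highest to lowest z
    s, hit = scores[order], in_set[order]
    weights = np.abs(s) ** p * hit       # hits weighted by |z|^p (assumed p=1)
    if weights.sum() == 0:
        weights = hit.astype(float)      # fall back to unweighted hits
    p_hit = np.cumsum(weights) / weights.sum()
    p_miss = np.cumsum(~hit) / (n - n_hit)
    running = p_hit - p_miss
    # ES = running-sum value with the largest deviation from zero
    return running[np.argmax(np.abs(running))]

def functional_spectrum(expr, gene_sets, ref=None):
    """Return an (n_samples, n_gene_sets) matrix of per-sample enrichment scores."""
    z = zscore_genes(expr, ref)
    return np.array([[enrichment_score(z[:, j], m) for m in gene_sets]
                     for j in range(z.shape[1])])
```

Because the ranking statistic is a z-score against the reference samples, no group labels are needed, which is consistent with the observation in the question that only a TPM matrix is required.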

There are other ways of calculating functional spectra; for example, VIPER does something similar. You can also simply sum the z-scores of the genes in a gene set and then z-score that sum across samples. This is how, e.g., AR activity is calculated. I also use these other methods to train the DeepCC model, to see which works best.
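A minimal sketch of the sum-of-z-scores approach mentioned above, assuming a genes × samples expression matrix and a list of row indices for the gene set. The name `zsum_score` is hypothetical, and using sample SD with `ddof=1` is an illustrative choice.

```python
import numpy as np

def zsum_score(expr, gene_idx):
    """Hypothetical gene-set activity score per sample.

    1. z-score each gene across samples,
    2. sum the z-scores of the genes in the set,
    3. z-score that sum across samples.
    """
    expr = np.asarray(expr, dtype=float)
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0                      # guard against constant genes
    z = (expr - mu) / sd
    total = z[gene_idx, :].sum(axis=0)     # one summed score per sample
    total_sd = total.std(ddof=1)
    return (total - total.mean()) / (total_sd if total_sd > 0 else 1.0)
```

Unlike the running-sum enrichment score, this ignores gene ranks entirely and depends only on the magnitude of each gene's deviation, so the two can be compared as alternative inputs when training the classifier.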