LiQian-XC / sctour

A deep learning architecture for robust inference and accurate prediction of cellular dynamics
https://sctour.readthedocs.io
MIT License
51 stars 4 forks source link

calculating expression dynamics genes in Fig6E #7

Closed yao50985098 closed 9 months ago

yao50985098 commented 9 months ago

Hi, Dr Li, Thank you for your nice work which brought so much convenience. After calculating the ptime, I want to further explore the genes changing along with the ptime, is there a function and visualization for this? Thank you!

LiQian-XC commented 9 months ago

Hi, thanks for using scTour. Unfortunately, there is no such function in the current version of scTour to detect and visualise genes with dynamic changes along pseudotime. For Fig. 6E in the paper, I used external tool as described in the method 'Analysis of human skeletal muscle development'. I also attach my script to do this job which used the R package VGAM (please see below). For the visualisation, I used heatmap (please see attached script).

Hope this will be helpful. I hope to add these functions in the next release of scTour.

library(VGAM)
library(parallel)
#mat is the expression matrix with rows for genes and columns for cells
pvals <- mclapply(1:nrow(mat), function(i) {
    x <- mat[i, ]
    data <- data.frame(X=x, Y=ptime)
    full_model_fit <- vglm(X~sm.ns(Y, df=3), family='uninormal', epsilon=1e-1, data=data)
    reduced_model_fit <- vglm(X~1, family='uninormal', epsilon=1e-1, data=data)
    if(is.null(full_model_fit) == FALSE && is.null(reduced_model_fit) == FALSE){
        lrt <- lrtest(full_model_fit, reduced_model_fit)
        pval <- lrt@Body["Pr(>Chisq)"][2,]
    } else {
        pval <- 1
    }
    pval
}, mc.cores=8)
pvals <- unlist(pvals)
padj <- p.adjust(pvals, method='bonferroni')

human_skeletal_muscle_part2_heatmap.py.txt

yao50985098 commented 9 months ago

Thank you for the quick response and share. This helps a lot!

pandaqiuqiu commented 2 months ago

@LiQian-XC

Hi, Qian, To calculate genes with dynamic changes along pseudotime, should the expression matrix (mat) be the raw data matrix or the normalized expression matrix as below? mat = adata[:, ggs].X.copy() mat -= mat.min(0) mat /= mat.max(0) mat = mat.T

Could you give me some insights here? thanks.

LiQian-XC commented 2 months ago

Hi, Here the expression matrix (for example in adata.X) should be normalised expression.

pandaqiuqiu commented 2 months ago

@LiQian-XC

Thank you so much, Qian. So, the normalized expression comes from the normalized expression of highly variable genes, and the genes with dynamic changes along pseudotime come from the highly variable genes. Is this correct?

LiQian-XC commented 2 months ago

The normalised expression is usually log-transformed total-UMI normalised expression. For example, in Scanpy, you can get it from raw UMI counts of all genes by using the following functions:

sc.pp.normalize_total(adata, target_sum=1e4) sc.pp.log1p(adata)

Based on this, you can check the dynamic changes along pseudotime for any genes of your interest.

pandaqiuqiu commented 2 months ago

@LiQian-XC Thanks, Qian - that last example is really helpful.