dylkot / cNMF

Code and example data for running Consensus Non-negative Matrix Factorization on single-cell RNA-Seq data
MIT License

How to assess the similarity between programs across samples #52

Closed Dragonlongzhilin closed 1 month ago

Dragonlongzhilin commented 1 year ago

Hi cNMF team, thanks for developing a nice tool! I extracted programs from several samples and want to assess how similar the programs are across samples. I noticed there are two gene expression program matrices (Z-score and TPM). Which of the two is suitable for calculating the correlation between programs from different samples?

michelle-curtis commented 1 year ago

Hi, thanks for your question. Either the Z-score or the TPM spectra matrix can be used to correlate programs across samples. However, if you use the TPM matrix, you will want to variance-normalize it first so that genes expressed at different scales contribute equally to the program correlations; the gene variances have already been output in the tpm_stats file. We also suggest restricting these comparisons to either the union or the intersection of the HVGs across samples, especially when using the variance-normalized TPM spectra. The TPM spectra matrix tends to have more baseline correlation across the non-differentially expressed genes (whereas these genes will tend to have zero values in the Z-score matrix).
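The variance-normalization and HVG-subsetting steps described above can be sketched roughly as follows. This is a minimal illustration on toy data, not code from the cNMF package: the function names `var_normalize` and `cross_sample_correlation` are hypothetical, the spectra are assumed to be pandas DataFrames with programs as rows and genes as columns, and the per-gene standard deviations are assumed to have been read from each sample's tpm_stats file into a Series indexed by gene.

```python
import numpy as np
import pandas as pd


def var_normalize(tpm_spectra, gene_std):
    """Divide each gene (column) by its standard deviation.

    Hypothetical helper: `gene_std` is assumed to come from tpm_stats.
    Genes with a missing or non-positive standard deviation are dropped.
    """
    std = gene_std.reindex(tpm_spectra.columns)
    keep = std[std > 0].index
    return tpm_spectra[keep] / std[keep]


def cross_sample_correlation(spectra_a, spectra_b, genes):
    """Pearson correlation of every program in A with every program in B."""
    a = spectra_a[genes].to_numpy(dtype=float)
    b = spectra_b[genes].to_numpy(dtype=float)
    # Center and scale each program (row) so the dot product is Pearson r.
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return pd.DataFrame(a @ b.T, index=spectra_a.index, columns=spectra_b.index)


# Toy stand-ins for two samples' TPM spectra and gene standard deviations.
rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(50)]
tpm_a = pd.DataFrame(rng.gamma(2.0, 1.0, size=(3, 50)),
                     index=["A1", "A2", "A3"], columns=genes)
tpm_b = pd.DataFrame(rng.gamma(2.0, 1.0, size=(2, 50)),
                     index=["B1", "B2"], columns=genes)
std_a = pd.Series(rng.uniform(0.5, 5.0, 50), index=genes)
std_b = pd.Series(rng.uniform(0.5, 5.0, 50), index=genes)

# Toy HVG lists per sample; here the intersection is used.
hvg_a = genes[:30]
hvg_b = genes[20:45]
shared = [g for g in hvg_a if g in set(hvg_b)]

norm_a = var_normalize(tpm_a, std_a)
norm_b = var_normalize(tpm_b, std_b)
corr = cross_sample_correlation(norm_a, norm_b, shared)
print(corr.round(2))
```

Swapping the intersection for the union only changes how `shared` is built, provided every gene in the list is present in both matrices.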

dylkot commented 1 month ago

Yes, like @michelle-curtis said, either can work. In my experience, using the Z-score output by default seems to work best. And, as Michelle also said, it is good to subset to the high-variance genes.
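For the Z-score route, one possible way to pair up programs between two samples is to take, for each program in one sample, the most correlated program in the other. The sketch below uses toy data in which sample B's programs are noisy, reordered copies of sample A's; `best_matches` is a hypothetical helper, not part of cNMF, and the matrices are assumed to have programs as rows and genes as columns.

```python
import numpy as np
import pandas as pd


def best_matches(z_a, z_b, genes):
    """For each program in A, report the most correlated program in B."""
    a = z_a[genes].to_numpy(dtype=float)
    b = z_b[genes].to_numpy(dtype=float)
    n = a.shape[0]
    # np.corrcoef stacks the rows of a and b; the off-diagonal block
    # holds the A-versus-B Pearson correlations.
    full = np.corrcoef(a, b)
    corr = pd.DataFrame(full[:n, n:], index=z_a.index, columns=z_b.index)
    return pd.DataFrame({"best_match": corr.idxmax(axis=1),
                         "pearson_r": corr.max(axis=1)})


# Toy Z-score spectra: B1, B2, B3 are noisy copies of A2, A3, A1.
rng = np.random.default_rng(1)
genes = [f"gene{i}" for i in range(40)]
z_a = pd.DataFrame(rng.normal(size=(3, 40)),
                   index=["A1", "A2", "A3"], columns=genes)
z_b = pd.DataFrame(z_a.to_numpy()[[1, 2, 0]] + rng.normal(scale=0.1, size=(3, 40)),
                   index=["B1", "B2", "B3"], columns=genes)

# Toy HVG lists; the union is used, restricted to genes present in both.
hvg_a, hvg_b = genes[:25], genes[15:]
hvg_union = sorted(set(hvg_a) | set(hvg_b))
hvg_union = [g for g in hvg_union if g in z_a.columns and g in z_b.columns]

matches = best_matches(z_a, z_b, hvg_union)
print(matches)
```

Taking the row-wise maximum does not force a one-to-one pairing, so two programs in A may map to the same program in B; whether that is acceptable depends on the analysis.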