dylkot / cNMF

Code and example data for running Consensus Non-negative Matrix Factorization on single-cell RNA-Seq data
MIT License

Log Transformation #44

Closed: Sayyam-Shah closed this issue 1 year ago

Sayyam-Shah commented 1 year ago

Hello,

Thank you for the amazing tool!

Will I see substantially worse results if I input log-transformed TPM data?

dylkot commented 1 year ago

Apologies for the slow reply! I haven't looked into this in detail, but in general I would expect so. The method expects count data as the main input (used for cNMF itself) and, optionally, TPM data as a second input. With log-transformed TPM data, I think the high-variance gene detection would be off, so you might also want to provide your own list of HVGs. Also, the resulting programs' TPM spectra would not be in the expected units, although I'm not sure whether that would make them less interpretable. Let me know how it looks, and sorry again for the slow response.
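To make the two concerns above concrete, here is a minimal NumPy sketch on simulated Poisson counts. It assumes TPM-style normalization means scaling each cell's counts to sum to 1e6, and it uses a plain Fano factor (variance / mean) ranking as a hypothetical stand-in for cNMF's own high-variance gene selection, which is more involved. The helper names and the simulated "program" are illustrative only. It checks that log-transformed TPM rows no longer sum to 1e6 and lets you compare how many top-ranked genes survive the log transform.

```python
import numpy as np

def tpm_normalize(counts):
    """Scale each cell (row) so its values sum to 1e6 (TPM-style normalization)."""
    totals = counts.sum(axis=1, keepdims=True)
    return counts / totals * 1e6

def fano_factor(x):
    """Per-gene variance / mean; a simple stand-in for cNMF's HVG scoring (assumption)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Genes with zero mean get a score of 0 instead of a division error
    return np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)

def top_genes(score, n):
    """Set of column indices of the n highest-scoring genes."""
    return set(np.argsort(score)[::-1][:n].tolist())

rng = np.random.default_rng(0)
n_cells, n_genes = 500, 200
# Simulated baseline expression levels spanning several orders of magnitude
gene_means = rng.lognormal(mean=0.0, sigma=1.5, size=n_genes)
rates = np.tile(gene_means, (n_cells, 1))
# Hypothetical "program": the first 20 genes are 5x higher in half of the cells
rates[: n_cells // 2, :20] *= 5
counts = rng.poisson(rates).astype(float)

tpm = tpm_normalize(counts)
log_tpm = np.log1p(tpm)

# TPM rows sum to 1e6; log-TPM rows do not, so spectra fit on log-TPM are not in TPM units
print(np.allclose(tpm.sum(axis=1), 1e6))      # True
print(np.allclose(log_tpm.sum(axis=1), 1e6))  # False

# How many of the top 20 genes survive the log transform?
hvg_tpm = top_genes(fano_factor(tpm), 20)
hvg_log = top_genes(fano_factor(log_tpm), 20)
print(len(hvg_tpm & hvg_log), "of the top 20 genes are shared")
```

The overlap number depends on the simulation and the scoring rule, so it is only a quick way to see whether a log transform reshuffles which genes look most variable in your own data.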

Sayyam-Shah commented 1 year ago

Hello @dylkot,

Thank you for getting back to me! That makes sense. I have been inputting the values from the data slot of sctransform version 2 and have observed amazing results. The data slot contains the log-transformed corrected counts and is analogous to log-normalized RNA counts. I'm not exactly sure what the distribution looks like, but my general understanding is that the corrected counts are derived from the Pearson residuals. I also tested the counts slot from sctransform and got poor results judging by the top genes (many ribosomal and mitochondrial genes).
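For anyone comparing these slots, the following is a simplified NumPy sketch of how they relate, assuming the usual description of sctransform and Seurat's SCT assay: `scale.data` holds Pearson residuals under a negative binomial model, the `counts` slot holds corrected counts obtained by plugging each residual back into the model at a common (median) sequencing depth, then rounding and clipping at 0, and the `data` slot holds `log1p` of those corrected counts. The model means `mu`, `mu_median`, and the dispersion `theta` below are made-up inputs; the real package fits them per gene by regression and also clips residuals, so this is an illustration, not sctransform's implementation.

```python
import numpy as np

def nb_variance(mu, theta):
    """Negative binomial variance for mean mu and inverse-dispersion theta."""
    return mu + mu**2 / theta

def pearson_residuals(x, mu, theta):
    """Pearson residuals of observed counts x under an NB(mu, theta) model."""
    return (x - mu) / np.sqrt(nb_variance(mu, theta))

def corrected_counts(resid, mu_median, theta):
    """Map residuals back to counts at a common (median) depth, then round and clip at 0.
    Simplified: assumes this mirrors sctransform's correction step."""
    y = mu_median + resid * np.sqrt(nb_variance(mu_median, theta))
    return np.clip(np.round(y), 0, None)

# Hypothetical inputs: 3 cells x 2 genes
x = np.array([[0.0, 12.0], [3.0, 40.0], [1.0, 25.0]])   # observed UMI counts
mu = np.array([[0.5, 10.0], [2.5, 35.0], [1.2, 22.0]])  # model means at each cell's own depth
mu_median = np.array([[1.0, 20.0]] * 3)                  # model means at the median depth
theta = np.array([10.0, 50.0])                           # per-gene dispersion (broadcast over cells)

resid = pearson_residuals(x, mu, theta)                   # analogous to scale.data
counts_slot = corrected_counts(resid, mu_median, theta)   # analogous to counts
data_slot = np.log1p(counts_slot)                         # analogous to data

# The data slot is log1p of the corrected counts, so expm1 recovers them
print(np.allclose(np.expm1(data_slot), counts_slot))  # True
```

Under these assumptions, the data slot is a log-scale version of count-like values rather than raw counts, which is worth keeping in mind given that cNMF expects count data as its main input.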

Thank you so much for the tool. I have not input my own list of HVGs yet, but I'll consider it. So far, I have clustered using the HVGs from sctransform rather than those selected by cNMF. Would inputting my own list of HVGs potentially improve the signatures?
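If you do try your own HVGs, one low-effort route is to export the sctransform variable features and give cNMF that list instead of letting it select genes. The sketch below assumes cNMF's prepare step accepts a plain-text gene list with one gene name per line (check the cNMF README for the exact option in your version). The helper `write_hvg_file`, the file name, and the example gene lists are hypothetical. It keeps only genes present in the count matrix and drops duplicates while preserving order, since genes missing from the matrix cannot be used for the factorization.

```python
from pathlib import Path

def write_hvg_file(hvgs, matrix_genes, path):
    """Write a newline-separated HVG list, keeping only genes in the count matrix.

    Hypothetical helper: the order of `hvgs` is preserved and duplicates are dropped.
    Returns the list of genes written.
    """
    available = set(matrix_genes)
    kept, seen = [], set()
    for gene in hvgs:
        if gene in available and gene not in seen:
            kept.append(gene)
            seen.add(gene)
    Path(path).write_text("\n".join(kept) + "\n")
    return kept

# Example gene symbols: sctransform HVGs vs. genes present in the cNMF count matrix
sct_hvgs = ["CD3E", "MS4A1", "LYZ", "CD3E", "NOT_IN_MATRIX"]
matrix_genes = ["CD3E", "MS4A1", "LYZ", "GAPDH"]
written = write_hvg_file(sct_hvgs, matrix_genes, "custom_hvgs.txt")
print(written)  # ['CD3E', 'MS4A1', 'LYZ']
```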