alecbarrett / Prop2Count

An R package that uses a logit model to denoise aggregated single-cell counts (pseudobulk) using the proportion of non-zero cells per gene
MIT License

Turn scRNA-seq pseudobulk with logCPM values back into a Seurat object with count values? #2

Open jiangzh-coder opened 1 year ago

jiangzh-coder commented 1 year ago

May I ask a question?

If my data is already a pseudobulk object derived from scRNA-seq data and holds logCPM values, how can I convert it back to a Seurat object with count values? Can I still use the method above to turn the data back into a Seurat object?

My data were normalized into pseudobulk as follows:

"Normalizing count data

After excluding poor quality cells, we normalized the sequencing depth of each cell by dividing each cell’s counts by the total counts in that cell, resulting in a matrix where the entries represent the proportion of a cell’s reads allocated to each gene (i.e. values in the range [0,1]). To estimate a library size for each dataset, we summed the total counts in each cell, and then we took the median as the library size for the dataset. Next, we multiplied the proportions by the library size to get a count matrix that was normalized for sequencing depth. Finally, we transformed the normalized count matrix with log2(1 + count). We referred to this log-transformed quantity in the figures as log2CPM.

We created pseudobulk expression (L. Lun, Bach, and Marioni 2016) for the cells in a cluster for each donor such that the pseudobulk matrix had one row for each gene and one column for each cluster from each patient. We normalized the pseudobulk counts to log2CPM as described in the previous section. Then we use limma::lmFit() to test for differential gene expression with the log2CPM pseudobulk matrix (Ritchie et al. 2015). We also use presto::wilcoxauc() to compute the area under receiver operator curve (AUROC or AUC) for the log2CPM value of each gene as a predictor of the cluster membership for each cluster (Korsunsky, Nathan, et al. 2019)."

Thanks, best wishes,
J.
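For reference, the normalization quoted above can be sketched in a few lines. This is a minimal illustration in plain Python, not code from Prop2Count or from the quoted study; the function names and the toy matrix are hypothetical, and it assumes every column (cell or pseudobulk sample) has a non-zero total. It shows that log2(1 + x) can be inverted exactly to the depth-normalized values, while the original counts can only be recovered if the per-column totals were kept.

```python
import math
from statistics import median


def normalize_log2(columns):
    """Apply the quoted normalization to a list of columns of raw counts.

    Each column is one cell (or one pseudobulk sample); each entry is one gene.
    Assumes no column has a total of zero.
    """
    totals = [sum(col) for col in columns]
    library_size = median(totals)  # median total count across columns
    logged = [
        [math.log2(1 + library_size * count / total) for count in col]
        for col, total in zip(columns, totals)
    ]
    return logged, totals, library_size


def invert_log2(logged):
    """Undo log2(1 + x); this yields depth-normalized values, not raw counts."""
    return [[2 ** value - 1 for value in col] for col in logged]


def recover_counts(normalized, totals, library_size):
    """Rescale normalized values to raw counts; needs the original column totals."""
    return [
        [round(value * total / library_size) for value in col]
        for col, total in zip(normalized, totals)
    ]


# Hypothetical toy data: three columns, three genes each.
raw = [[10, 0, 90], [5, 5, 40], [0, 20, 180]]
logged, totals, library_size = normalize_log2(raw)
normalized = invert_log2(logged)
recovered = recover_counts(normalized, totals, library_size)
```

After inversion, every column of `normalized` sums to the shared library size, so the per-column depth information is gone from the log2CPM matrix itself. If the column totals are not available, only the normalized values can be reconstructed, not integer counts.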

jiangzh-coder commented 1 year ago

How can I assign cell subtypes to the clusters defined by this pseudobulk matrix of logCPM values, after subsetting the major cell types defined by it? Could I use marker expression profiles across clusters to redefine cell subtypes?

Thanks a lot! Best wishes,
Jiang