Closed: crazyhottommy closed this issue 5 years ago
I read the Mol Cell paper and see that the co-accessibility score ranges from 0 to 1, since you are calculating a correlation. I am not sure where the Seurat group got their data.
I went to http://krishna.gs.washington.edu/content/members/ajh24/mouse_atlas_data_release/activity_score_matrices/ and saw that there are both binarized and quantitative activity scores.
In the tutorial, you binarized the matrix:
# read in matrix data using the Matrix package
indata <- Matrix::readMM("filtered_peak_bc_matrix/matrix.mtx")
# binarize the matrix
indata@x[indata@x > 0] <- 1
If I want to get the quantitative gene activity score, should I not binarize it?
Thanks.
No, the default output explained in the tutorial will be the quantitative scores. We generally binarize the input matrix because, given the expected sparsity of the data and the fact that a diploid genome should generally yield at most two reads from a given site, we expect most values greater than 1 to be missed duplicates.
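To make the effect of binarization concrete, here is a minimal sketch on a toy sparse matrix. The counts are made up for illustration and do not come from any real dataset; the only real API used is `sparseMatrix()` from the Matrix package and the `@x` slot that holds the nonzero values.

```r
library(Matrix)

# Toy peak-by-cell count matrix (made-up numbers, not real data)
counts <- sparseMatrix(
  i = c(1, 2, 3, 3, 4),
  j = c(1, 1, 2, 3, 3),
  x = c(1, 3, 1, 2, 5),
  dims = c(4, 3)
)

# Nonzero entries greater than 1 are the only ones binarization changes
n_above_one <- sum(counts@x > 1)

# Binarize: every nonzero entry becomes 1, zeros stay implicit
binarized <- counts
binarized@x[binarized@x > 0] <- 1

# Tabulate the stored values before and after
table(counts@x)
table(binarized@x)
```

Only the input is binarized in this sketch; it says nothing about the scale of the gene activity scores computed downstream.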
Thanks. I followed the tutorial exactly and used the 10x PBMC 10k data. I then checked the final normalized gene activity score matrix, and it ranges from 0 to 1.
Could you please confirm the range or distribution of the gene activity score, as shown in the histogram in my previous message?
Seurat V3 uses the counts in the gene body + 2 kb upstream as a proxy for gene activity. I want to compare their method with Cicero.
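For readers comparing the two approaches, the "gene body + 2 kb upstream" idea can be sketched roughly as follows. This is not Seurat's implementation; it is a base R illustration with hypothetical gene coordinates and peak counts for a single cell, and all names (`genes`, `peaks`, `upstream`, `activity`) are assumptions made for the example.

```r
# Hypothetical gene annotations (coordinates are made up)
genes <- data.frame(
  gene   = c("geneA", "geneB"),
  chr    = c("chr1", "chr1"),
  start  = c(10000, 50000),
  end    = c(12000, 53000),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical peaks with read counts for one cell
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1"),
  start = c(8500, 11000, 54000, 70000),
  end   = c(9000, 11500, 54500, 70500),
  count = c(2, 1, 5, 4),
  stringsAsFactors = FALSE
)

upstream <- 2000

# Extend each gene body upstream of its TSS, respecting strand:
# "+" genes extend toward lower coordinates, "-" genes toward higher ones
extended <- genes
plus <- genes$strand == "+"
extended$start[plus] <- pmax(genes$start[plus] - upstream, 1)
extended$end[!plus]  <- genes$end[!plus] + upstream

# Sum the counts of all peaks overlapping each extended region
activity <- sapply(seq_len(nrow(extended)), function(k) {
  hit <- peaks$chr == extended$chr[k] &
    peaks$start <= extended$end[k] &
    peaks$end >= extended$start[k]
  sum(peaks$count[hit])
})
names(activity) <- extended$gene
activity
```

The result here is a raw count per gene, so it is on a different scale from a normalized score bounded between 0 and 1.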
Thanks very much.
Apologies for the very long delay in replying. The output gene activity scores from Cicero are normalized and so will be quantitative values from 0 to 1. For the mouse atlas project, we did a post processing step to convert values to a more 'fpkm-like' scale, which is described in the methods of that paper (https://www.cell.com/cell/fulltext/S0092-8674(18)30855-9#secsectitle0085) in the section titled 'Computing Gene Activity Scores'.
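One way to check the scale of a gene activity matrix yourself is to look at the range and distribution of its stored values. The sketch below uses a randomly generated stand-in matrix, not actual Cicero output, so the object name `activity` and its dimensions are assumptions for illustration only.

```r
library(Matrix)

# Stand-in for a normalized gene activity matrix (genes x cells);
# values are random draws from [0, 1], not real scores
set.seed(1)
activity <- rsparsematrix(
  nrow = 50, ncol = 20, density = 0.2,
  rand.x = function(n) runif(n, min = 0, max = 1)
)

# Range and summary of the nonzero entries (implicit zeros are not stored in @x)
nonzero_range <- range(activity@x)
nonzero_summary <- summary(activity@x)

nonzero_range
nonzero_summary

# Uncomment to view the distribution of nonzero scores
# hist(activity@x, breaks = 30, main = "Nonzero gene activity scores")
```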
No problem. Many thanks for the clarification.
Hi,
In general, what's the scale of the matrix?
I asked the Seurat V3 author because I am using it for label transferring from scRNAseq data.
Does Cicero normalize the values somehow, or should I use the un-normalized values?
Thanks, Tommy