cole-trapnell-lab / cicero-release

https://cole-trapnell-lab.github.io/cicero-release/
MIT License

How to prepare input data for atac-seq trajectory as "cicero_data"? #19

Closed x811zou closed 5 years ago

x811zou commented 5 years ago

input_cds <- make_atac_cds(cicero_data, binarize = TRUE)

I noticed that the example data "cicero_data" has three columns:

                       Peak                                 Cell Count
140 chr18_30209631_30210783 AGCGATAGGCGCTATGGTGGAATTCAGTCAGGACGT     4
150 chr18_45820294_45821666 AGCGATAGGTAGCAGCTATGGTAATCCTAGGCGAAG     2
185 chr18_32820116_32820994 TAATGCGCCGCTTATCGTTGGCAGCTCGGTACTGAC     2
266 chr18_41888433_41890138 AGCGATAGGCGCTATGGTGGAATTCAGTCAGGACGT     2

I'd like to know how you prepared this input data. I know I can get Peak from peaks.bed, Cell from barcodes.tsv, and Count from the count matrix, but the three have different dimensions. How do you combine them to generate this input? I'm confused.

Thanks!

hpliner commented 5 years ago

Sounds like you're using 10x data. See https://cole-trapnell-lab.github.io/cicero-release/docs/#loading-10x-scatac-seq-data for information on loading data from the 10x outs.

x811zou commented 5 years ago

@hpliner I did read that part, but the input_cds produced by the step below is a dgMatrix, not a data frame, and it does not contain the (Peak, Cell, Count) columns, whereas "cicero_data" is a data frame used in the ATAC-seq trajectory workflow.

# Ensure there are no peaks included with zero reads
input_cds <- input_cds[Matrix::rowSums(exprs(input_cds)) != 0,]

hpliner commented 5 years ago

Most of Cicero's functions only use a CDS object like input_cds. The part of the tutorial using cicero_data is only there to show how to make your input_cds to begin with. Is there some function that you need the sparse matrix format (peak, cell, count) for? Or is there a function that's erroring out when you use the input_cds generated from your 10x output?
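For anyone unsure how the three-column (peak, cell, count) layout relates to a peak-by-cell sparse matrix, here is a minimal sketch using only the Matrix package. It is not Cicero's internal code; the peak names, cell names, and counts are made-up toy values, and the variable names (trip, mat, back) are just for illustration.

```r
library(Matrix)

# Toy (Peak, Cell, Count) table shaped like cicero_data; values are made up.
trip <- data.frame(
  Peak  = c("chr18_100_200", "chr18_300_400", "chr18_100_200"),
  Cell  = c("cellA", "cellA", "cellB"),
  Count = c(4, 2, 2),
  stringsAsFactors = FALSE
)

# Triplets -> sparse peak-by-cell matrix: each row of trip is one non-zero entry.
peaks <- unique(trip$Peak)
cells <- unique(trip$Cell)
mat <- sparseMatrix(
  i = match(trip$Peak, peaks),
  j = match(trip$Cell, cells),
  x = trip$Count,
  dims = c(length(peaks), length(cells)),
  dimnames = list(peaks, cells)
)

# Sparse matrix -> triplets: summary() lists the non-zero entries as (i, j, x).
nz <- summary(mat)
back <- data.frame(
  Peak  = rownames(mat)[nz$i],
  Cell  = colnames(mat)[nz$j],
  Count = nz$x,
  stringsAsFactors = FALSE
)
print(back)
```

The same summary() step should work on a matrix read with Matrix::readMM, once its row and column names have been set from the peak and barcode files.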

mairamirza commented 1 year ago

Hi, I want to load 10X scATAC-seq data (matrix.mtx, features.tsv, and barcode.tsv) into Cicero in order to calculate co-accessibility. Is there any way to do that? I am following this link: https://cole-trapnell-lab.github.io/cicero-release/docs/#loading-10x-scatac-seq-data , but I get an error saying the dimnames are not equal for peakinfo and indata.

Thanks!
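One possible cause of a dimnames error like this, offered as an assumption rather than a diagnosis, is that the feature or barcode table does not have the same number of rows as the matrix has rows or columns (for example, because a header line was read as data, or the reverse). A minimal sketch of checking this before building a CDS, using toy files written to a temporary directory; the three-column chr/start/end layout assumed for features.tsv is hypothetical and may not match your file:

```r
library(Matrix)

# Write toy stand-ins for the three files (made-up values) to a temp directory.
tmp <- tempfile("toy10x_")
dir.create(tmp)
toy <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 2), x = c(2, 1, 5), dims = c(3, 2))
writeMM(toy, file.path(tmp, "matrix.mtx"))
writeLines(c("chr1\t100\t200", "chr1\t300\t400", "chr2\t50\t150"),
           file.path(tmp, "features.tsv"))
writeLines(c("AAAC-1", "AAAG-1"), file.path(tmp, "barcodes.tsv"))

# Read them back the way one would read real files.
indata   <- readMM(file.path(tmp, "matrix.mtx"))
peakinfo <- read.table(file.path(tmp, "features.tsv"), sep = "\t", header = FALSE)
cellinfo <- read.table(file.path(tmp, "barcodes.tsv"), sep = "\t", header = FALSE)

# The matrix must have one row per peak and one column per barcode.
stopifnot(nrow(indata) == nrow(peakinfo), ncol(indata) == nrow(cellinfo))

# Give the matrix and both annotation tables identical names.
peak_names <- paste(peakinfo[[1]], peakinfo[[2]], peakinfo[[3]], sep = "_")
cell_names <- as.character(cellinfo[[1]])
rownames(peakinfo) <- peak_names
rownames(cellinfo) <- cell_names
rownames(indata)   <- peak_names
colnames(indata)   <- cell_names
```

If the first stopifnot() fails on real data, comparing dim(indata) with nrow(peakinfo) and nrow(cellinfo) shows which file is out of step.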

hpliner commented 1 year ago

Hello, you should be able to load your data using the monocle3 function designed for that:

cds <- load_mm_data(mat_path = "~/Downloads/matrix.mtx", 
                    feature_anno_path = "~/Downloads/features.tsv", 
                    cell_anno_path = "~/Downloads/barcodes.tsv")

Just be sure you're also using the monocle3 version of cicero (install instructions here: https://cole-trapnell-lab.github.io/cicero-release/docs_m3/#installing-cicero)