jiangpuxuan opened this issue 1 year ago
Hi! First, I'd like to warn you that I used this script to process the SHARE-seq data, so it may need some adaptations for other datasets. For instance, I remove the last 3 digits from the barcodes to match the RNA-seq and ATAC-seq barcodes; this probably doesn't work with other datasets. Another thing is that we used binary data (expression and ATAC-seq signal coded as 0 or 1). To answer your questions:
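The two SHARE-seq-specific steps mentioned above (trimming barcodes so RNA and ATAC barcodes can be matched, and coding values as 0 or 1) can be sketched roughly as follows. This is not the code in ShareSeqCoex.py: the helper names, the assumption that barcodes are plain strings whose last 3 characters differ between libraries, and the rule that any positive value becomes 1 are all illustrative.

```python
# Hypothetical helpers, NOT taken from ShareSeqCoex.py.
# Assumption: barcodes are plain strings and the last 3 characters are the part
# that differs between the RNA-seq and ATAC-seq libraries (SHARE-seq specific).

SUFFIX_LEN = 3  # number of trailing characters to drop


def trim_barcode(barcode: str, suffix_len: int = SUFFIX_LEN) -> str:
    """Drop the last `suffix_len` characters so RNA and ATAC barcodes can match."""
    if suffix_len <= 0:
        return barcode  # barcode[:-0] would return "", so guard explicitly
    return barcode[:-suffix_len]


def binarize(values: dict[str, float]) -> dict[str, int]:
    """Code each value as 1 if it is positive (expressed / accessible), else 0."""
    return {key: int(value > 0) for key, value in values.items()}


def shared_cells(rna_barcodes, atac_barcodes) -> list[str]:
    """Return the trimmed barcodes present in both modalities, sorted."""
    rna = {trim_barcode(b) for b in rna_barcodes}
    atac = {trim_barcode(b) for b in atac_barcodes}
    return sorted(rna & atac)


if __name__ == "__main__":
    print(trim_barcode("R1.02,R2.10,R3.20,P1.51"))  # R1.02,R2.10,R3.20,P1
    print(binarize({"GENE_A": 12.0, "GENE_B": 0.0}))  # {'GENE_A': 1, 'GENE_B': 0}
```

For another protocol, SUFFIX_LEN (or the whole trimming rule) would need to change, which is the adaptation warned about above.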
It is going very well with the sc_cop package. When I came to use ShareSeqCoex.py to analyze my scATAC and scRNA data, I ran into some problems with the format of the gene expression matrix, gencode_v19.bed and peak_matrix.tsv.
Gene expression matrix
Does R1.02,R2.10,R3.20,P1.51 mean the barcode of a single cell? Does the value mean the expression of each gene in every cell?
gencode_v19.bed
For gencode_v19.bed (read_gene_models): the example format does not seem to have "cell_name" and "donor" columns.
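For comparison, a standard BED file has no "cell_name" or "donor" column: its first six columns are chrom, chromStart (0-based), chromEnd (exclusive), name, score and strand. The sketch below parses those six columns from a gene file such as gencode_v19.bed, which can help check what the file actually contains. Whether read_gene_models expects exactly this layout is an assumption, and read_bed6 and GeneModel are hypothetical names.

```python
import io
from typing import NamedTuple, TextIO


class GeneModel(NamedTuple):
    """One BED6 record; field names follow the UCSC BED convention."""
    chrom: str
    start: int   # chromStart, 0-based
    end: int     # chromEnd, exclusive
    name: str
    score: str   # kept as text so "." placeholders do not break parsing
    strand: str  # "+", "-" or "."


def read_bed6(handle: TextIO) -> list[GeneModel]:
    """Parse the first six tab-separated BED columns, skipping comment/header lines.

    Columns beyond the sixth are ignored.
    """
    models = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 BED columns, got {len(fields)}: {line!r}")
        chrom, start, end, name, score, strand = fields[:6]
        models.append(GeneModel(chrom, int(start), int(end), name, score, strand))
    return models


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = io.StringIO("chr1\t100\t200\tGENE_A\t0\t+\nchr2\t50\t80\tGENE_B\t0\t-\n")
    for model in read_bed6(example):
        print(model.name, model.chrom, model.start, model.end, model.strand)
```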
peak_matrix.tsv
My fragments.tsv looks like this: [screenshot] but the example looks like this: [screenshot]
What do columns 4 to 6 mean? How could I reshape my data?
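Assuming the fragments.tsv follows the common 10x Genomics layout (chrom, start, end, cell barcode, read count, tab-separated), columns 4 and 5 are the barcode and the number of reads supporting the fragment; what columns 4 to 6 of the example peak_matrix.tsv hold is not stated here. One possible way to reshape fragments into the binary (0/1) peak-by-cell data described above is to mark a peak as 1 for a cell when at least one of that cell's fragments overlaps it. The function names, the "chrom:start-end" peak IDs, the long "peak, barcode, 1" output and the non-overlapping-peaks assumption are all hypothetical, not the format ShareSeqCoex.py requires.

```python
import bisect
import io
from collections import defaultdict
from typing import TextIO


def read_peaks(handle: TextIO) -> dict[str, list[tuple[int, int]]]:
    """Read BED-like peaks (chrom, start, end) into sorted per-chromosome lists."""
    peaks = defaultdict(list)
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        peaks[chrom].append((int(start), int(end)))
    for chrom_peaks in peaks.values():
        chrom_peaks.sort()
    return dict(peaks)


def binary_peak_cell_pairs(fragments: TextIO,
                           peaks: dict[str, list[tuple[int, int]]]) -> set[tuple[str, str]]:
    """Return (peak_id, barcode) pairs where at least one fragment overlaps the peak.

    Assumes a 10x-style fragments file and non-overlapping peaks, so peak ends are
    sorted like their starts. Intervals are half-open, as in BED. The read-count
    column is ignored because the output is binary.
    """
    starts = {chrom: [s for s, _ in plist] for chrom, plist in peaks.items()}
    hits = set()
    for line in fragments:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and "#" header lines
        chrom, f_start, f_end, barcode = line.rstrip("\n").split("\t")[:4]
        if chrom not in peaks:
            continue
        f_start, f_end = int(f_start), int(f_end)
        # Peaks at indices < i start before the fragment ends.
        i = bisect.bisect_left(starts[chrom], f_end)
        # Walk left while the peak still ends after the fragment starts (overlap).
        while i > 0 and peaks[chrom][i - 1][1] > f_start:
            p_start, p_end = peaks[chrom][i - 1]
            hits.add((f"{chrom}:{p_start}-{p_end}", barcode))
            i -= 1
    return hits


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    peaks = read_peaks(io.StringIO("chr1\t100\t200\nchr1\t300\t400\n"))
    frags = io.StringIO("chr1\t150\t160\tCELL_A\t1\nchr1\t190\t310\tCELL_B\t2\n")
    for peak_id, barcode in sorted(binary_peak_cell_pairs(frags, peaks)):
        print(peak_id, barcode, 1, sep="\t")
```

The long format lists only the 1 entries, so missing (peak, cell) pairs are implicitly 0; pivoting it into a wide peak-by-cell table is straightforward if the script expects that instead.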
Thank you for your help!