davidaknowles closed this issue 1 year ago
Thanks Avi. @rob-p enjoy RECOMB, no rush.
Hi @davidaknowles,
Sorry for taking so long to get back here. @k3yavi's interpretation is correct. What is written in the MatrixMarket file for each triplet is the row id (i.e. cell barcode), column id (equivalence class id), and UMI count.
The equivalence class id is the unique, distinct id associated with each pattern of gene occurrences (or gene + splicing-status occurrences). For each equivalence class label in `gene_eqclass.txt.gz`, the equivalence class id is the last number on the line.
Best, Rob
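For anyone landing here later, the following is a minimal sketch of how the layout described in this thread could be parsed, so that a column id from `geqc_counts.mtx` can be mapped back to its gene set. It is not taken from `quant.rs`. The function name `read_gene_eqclasses` is hypothetical, and the sketch assumes the fields on each line are whitespace-separated integers, which the thread does not state explicitly.

```python
import gzip


def read_gene_eqclasses(path):
    """Parse gene_eqclass.txt.gz into (num_genes, {ec_id: [gene_ids]}).

    Assumed layout, as described in this thread:
      line 1: number of genes (or USA targets)
      line 2: number of equivalence classes (ECs)
      then one line per EC: gene ids followed by the EC id as the last entry.
    Assumption: fields are whitespace-separated integers.
    """
    with gzip.open(path, "rt") as fh:
        num_genes = int(fh.readline())
        num_ecs = int(fh.readline())
        ec_to_genes = {}
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # tolerate a trailing blank line
            # The last number is the EC id; everything before it is a gene id.
            *gene_ids, ec_id = (int(x) for x in fields)
            ec_to_genes[ec_id] = gene_ids
    if len(ec_to_genes) != num_ecs:
        raise ValueError(
            f"header declares {num_ecs} ECs but {len(ec_to_genes)} were read"
        )
    return num_genes, ec_to_genes
```

Keying the dictionary by the EC id rather than by line position reflects the answer above: the column of `geqc_counts.mtx` refers to the last number on each line, not to the line's position in the file. Note that MatrixMarket files store 1-based indices and readers such as `scipy.io.mmread` return 0-based coordinates, so check which base the EC ids in your own file use before looking up columns.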
Perfect, thank you!
I'm trying to understand `gene_eqclass.txt.gz` and `geqc_counts.mtx`. I mostly get it from skimming `quant.rs` and the comments therein: `gene_eqclass.txt.gz` has a line giving the number of genes (or the number of USA targets), then a line giving the number of ECs. Then there is one line per EC, where the first n-1 entries are gene IDs, and the last entry is the EC idx (which is not the ordering in the file).

Then `geqc_counts.mtx` is cells x ECs, presumably with the row labels (cell barcodes) being given by `quants_mat_rows.txt`. But what is the indexing for the columns, i.e. the ECs? Is that the EC idx (the last entry of each line in `gene_eqclass.txt.gz`) or the line number (-2) from `gene_eqclass.txt.gz`?

Thanks!