Hoohm / CITE-seq-Count

A tool for getting UMI counts from a single-cell protein assay
https://hoohm.github.io/CITE-seq-Count/
MIT License

barcodes with 0s in all protein tags and unmapped row #180

Open Hanxi-002 opened 1 year ago

Hanxi-002 commented 1 year ago

Hi,

I have CITE-seq data with only one protein. After running CITE-seq-Count and using Read10X() to read the umi_count matrices, I noticed that my matrix is very sparse. Since I only have one protein, my matrix has only two rows (protein, unmapped). My question is: what does it mean when barcodes (columns) have 0s in both rows? I assumed that if no protein of interest is mapped, the reads would be counted in the unmapped row, but that is obviously a false assumption.
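To see how many such barcodes there are, one can count the columns that are zero in every row. This is a minimal sketch assuming the matrix has already been loaded into a plain list of rows (one per feature) aligned with a list of barcodes; the function name `all_zero_barcodes` and the toy data are hypothetical.

```python
def all_zero_barcodes(matrix, barcodes):
    """Return the barcodes whose column is zero in every row.

    matrix: list of rows (one per feature, e.g. [protein, unmapped]),
            each a list of counts aligned with `barcodes`.
    """
    return [
        bc
        for j, bc in enumerate(barcodes)
        if all(row[j] == 0 for row in matrix)
    ]


# Toy example (made-up data): 2 features (protein, unmapped) x 4 barcodes.
barcodes = ["AAAC", "AAAG", "AAAT", "AACA"]
matrix = [
    [5, 0, 0, 2],  # protein
    [1, 0, 3, 0],  # unmapped
]
print(all_zero_barcodes(matrix, barcodes))  # ['AAAG']
```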

Thank you so much for your time!

cpflueger2016 commented 1 year ago

Hey Hanxi, the unmapped row counts reads whose DNA barcode from the CITE-seq data does not match the antibody barcode that labels your protein. This can be due to PCR issues (most likely), sequencing errors, or incorrect demultiplexing (bcl2fastq), etc.

In my experience, it is quite common to see unmapped reads, and you'll just filter them out, since they carry no information other than a hint that you might have used too many PCR cycles in your library prep.
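A simplified picture of how a read can end up in the unmapped row: its tag sequence is compared with the known antibody barcode(s) and, if none is close enough, it is counted as unmapped. The sketch below uses Hamming distance with a hypothetical `max_mismatches` threshold; it illustrates the idea only and is not CITE-seq-Count's actual matching code. The tag name and sequences are made up.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_tag(read_tag, tags, max_mismatches=1):
    """Return the name of the closest tag, or 'unmapped' if none is close enough.

    tags: dict mapping tag name -> barcode sequence (hypothetical).
    On equal distances, the first tag in dict order wins.
    """
    best_name, best_dist = "unmapped", max_mismatches + 1
    for name, seq in tags.items():
        if len(seq) != len(read_tag):
            continue  # Hamming distance needs equal lengths
        d = hamming(read_tag, seq)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name


tags = {"CD4": "ACGTACGTAC"}  # made-up tag name and sequence
print(assign_tag("ACGTACGTAC", tags))  # exact match -> 'CD4'
print(assign_tag("ACGTACGTAA", tags))  # 1 mismatch -> 'CD4'
print(assign_tag("TTTTACGTAC", tags))  # 3 mismatches -> 'unmapped'
```

A read damaged by PCR or sequencing errors beyond the tolerance falls through to `unmapped`, which is why that row is mostly noise.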

Hoohm commented 1 year ago

Another potential cause is that you provided a list of cell barcodes to extract. The output will then be a full matrix containing all of those cell barcodes, and if no read with a given barcode was found in R1, that barcode gets a zero count in every row.
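The effect of a cell barcode list can be illustrated with a toy counter: every listed barcode gets a column, so barcodes that never appear in the reads end up with zeros in every row. This is a simplified sketch with made-up barcodes and reads; it ignores UMI collapsing and barcode error correction, and the function name `count_matrix` is hypothetical.

```python
from collections import Counter


def count_matrix(reads, whitelist, features):
    """Build a features x barcodes count table restricted to `whitelist`.

    reads: iterable of (cell_barcode, feature) pairs, one per read.
    Barcodes not in the whitelist are dropped; whitelisted barcodes with
    no reads still get a column of zeros.
    """
    allowed = set(whitelist)
    counts = Counter((bc, feat) for bc, feat in reads if bc in allowed)
    return {
        feat: [counts[(bc, feat)] for bc in whitelist]
        for feat in features
    }


whitelist = ["AAAC", "AAAG", "AAAT"]
features = ["protein", "unmapped"]
reads = [("AAAC", "protein"), ("AAAC", "unmapped"), ("AAAT", "protein"),
         ("GGGG", "protein")]  # GGGG is not whitelisted, so it is ignored
print(count_matrix(reads, whitelist, features))
# {'protein': [1, 0, 1], 'unmapped': [1, 0, 0]}
```

Here `AAAG` was requested but never seen in the reads, so its column is all zeros, matching the pattern described in the question.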