omicsclass closed this issue 2 years ago
In normal operation, UMI-tools uses two pieces of information to decide whether reads are duplicates of each other: their alignment "position" and the sequence of the UMI. In --per-gene mode, the alignment position is the gene to which a read is aligned. The base-pair position within the gene is not considered relevant: most relevant techniques fragment after amplification, so duplicates can have different base-pair positions but will always come from the same gene. This is basically what --per-gene does.
When you use --ignore-umi (which is really only a debugging option), UMI-tools uses only the position. Since for --per-gene the position is the identity of the gene, all reads aligned to the same gene are regarded as duplicates of each other (as they have the same "position", and the UMI is ignored), and are thus all collapsed onto a single read, as long as there is at least one.
Thus every gene will have a count of either 1 or 0.
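To make the reasoning above concrete, here is a toy sketch of the grouping logic. It is not UMI-tools code: the function name `count_per_gene` and the `(cell, gene, umi)` tuples are made up for illustration, and it matches UMIs exactly, whereas UMI-tools by default also merges UMIs that look like sequencing errors of each other.

```python
from collections import defaultdict


def count_per_gene(reads, ignore_umi=False):
    """Toy model of per-gene, per-cell deduplication.

    `reads` is an iterable of hypothetical (cell, gene, umi) tuples.
    The "position" of a read is its (cell, gene) pair; reads that share
    a position and a UMI are treated as duplicates of each other.
    """
    umis_at_position = defaultdict(set)
    for cell, gene, umi in reads:
        # When the UMI is ignored, every read at a position gets the same
        # key, so the set for that position can never hold more than one item.
        key = None if ignore_umi else umi
        umis_at_position[(cell, gene)].add(key)
    # The count for a position is the number of distinct keys seen there.
    return {pos: len(keys) for pos, keys in umis_at_position.items()}


reads = [
    ("cellA", "gene1", "AAAA"),
    ("cellA", "gene1", "AAAA"),  # duplicate of the read above
    ("cellA", "gene1", "CCCC"),
    ("cellA", "gene2", "GGGG"),
    ("cellB", "gene1", "TTTT"),
]

with_umi = count_per_gene(reads)
without_umi = count_per_gene(reads, ignore_umi=True)
```

In this sketch `with_umi` gives 2 for gene1 in cellA, while `without_umi` gives 1 for every cell/gene pair that has at least one read. Pairs with no reads never appear, which corresponds to the 0 entries in the wide-format matrix.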
Hi, our library has barcodes but no UMIs. Can we get read counts by using --ignore-umi and dropping --per-gene?
When we add the --ignore-umi parameter, why does the count matrix contain only 1s and 0s?
umi_tools count --per-gene --gene-tag=XT --assigned-status-tag=XS --per-cell --wide-format-cell-counts -I assigned_sortedProcessed.sorted.bam -S counts.tsv.gz --ignore-umi