CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

umi_tools dedup with gene-tag on, how are those reads with "GN:Z:-" tag deduplicated? #614

Closed MengjunWu closed 12 months ago

MengjunWu commented 1 year ago

Hi,

I am using umi_tools to deduplicate a 10x scRNA-seq BAM file with the parameters: --cell-tag=CB --umi-tag=UB --per-cell --per-gene --gene-tag=GN

I noticed that the deduplicated output BAM file still contains many reads with "GN:Z:-". How are reads that do not overlap the transcriptome, i.e. those with the "GN:Z:-" tag, collapsed in this case?

Many thanks, Mengjun

IanSudbery commented 12 months ago

Sorry for the delay replying. These reads would all be assigned to a gene called "-". Any two reads with the same UMI assigned to "-" would be collapsed.
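
To make the grouping concrete, here is a minimal sketch of the idea rather than the actual UMI-tools code. It assumes reads are plain dicts carrying the CB, GN and UB tag values, keys each read on (cell, gene, UMI) because --per-cell was used in the question, and collapses exact UMI matches only, so any error-aware merging of similar UMIs is left out. The function name and the example reads are hypothetical.

```python
from collections import defaultdict


def collapse_by_cell_gene_umi(reads):
    """Keep one read per (cell barcode, gene tag, UMI) combination.

    Simplified illustration only: exact UMI matching, first read wins.
    The gene value "-" is treated like any other gene name.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["CB"], read["GN"], read["UB"])
        groups[key].append(read)
    # Keep the first read seen in each group as its representative.
    return [members[0] for members in groups.values()]


# Hypothetical reads; barcodes and UMIs are shortened for readability.
reads = [
    {"name": "r1", "CB": "AAAC", "GN": "-", "UB": "TTGCA"},
    {"name": "r2", "CB": "AAAC", "GN": "-", "UB": "TTGCA"},     # same cell, gene "-", same UMI as r1
    {"name": "r3", "CB": "AAAC", "GN": "-", "UB": "GGATC"},     # different UMI
    {"name": "r4", "CB": "AAAC", "GN": "ACTB", "UB": "TTGCA"},  # different gene
    {"name": "r5", "CB": "CCGT", "GN": "-", "UB": "TTGCA"},     # different cell
]

kept = collapse_by_cell_gene_umi(reads)
print([r["name"] for r in kept])
```

In this sketch only r2 is dropped, since it shares its cell, the "-" gene and its UMI with r1. Note that the mapping position plays no role in the key, so two "-" reads from unrelated genomic locations would still be collapsed if their cell and UMI match.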

MengjunWu commented 12 months ago

Ah, that's a new gene! Many thanks for the clarification :)