Are UMI reads not sequenced more than once removed during dedup step?

CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets

MIT License


Are UMI reads not sequenced more than once removed during dedup step? #634

Closed: goyastephanie closed this issue 7 months ago

goyastephanie commented 8 months ago

Hi! I would like to confirm a detail about the UMI deduplication process: if only a single read carries a given UMI, with no copies of it (for example, because of insufficient sequencing depth), is this read kept in the deduplicated output after the UMI dedup step? Or is the read filtered out because no deduplication/correction could be performed on this UMI?

IanSudbery commented 8 months ago

UMI-tools' approach is that it removes things it definitely thinks are duplicates. As such, where there is only a single read at a given coordinate, that read cannot be a duplicate, and so the read is retained.
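
To illustrate the point, here is a minimal sketch of the idea, not the actual UMI-tools implementation. It assumes reads are plain dicts with hypothetical keys (`name`, `contig`, `pos`, `strand`, `umi`) and uses exact UMI matching only, which is a simplification: UMI-tools can also group UMIs that differ by sequencing errors and chooses its representative read more carefully. The function `dedup_exact` is made up for this example.

```python
from collections import defaultdict


def dedup_exact(reads):
    """Keep one read per (contig, position, strand, UMI) group.

    Hypothetical helper for illustration only; it is not part of UMI-tools.
    """
    groups = defaultdict(list)
    for read in reads:
        key = (read["contig"], read["pos"], read["strand"], read["umi"])
        groups[key].append(read)
    # One representative per group. A group of size 1 has nothing to
    # deduplicate against, so its only read is kept unchanged.
    return [members[0] for members in groups.values()]


reads = [
    {"name": "r1", "contig": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},
    {"name": "r2", "contig": "chr1", "pos": 100, "strand": "+", "umi": "ACGT"},
    # r3 is the only read at its coordinate with its UMI.
    {"name": "r3", "contig": "chr1", "pos": 250, "strand": "+", "umi": "TTAG"},
]

kept = [r["name"] for r in dedup_exact(reads)]
print(kept)
```

In this toy input, `r2` is dropped as a duplicate of `r1`, while the singleton `r3` passes through untouched.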