CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

How is mapping coordinates for UMI group defined? #492

Closed: YichaoOU closed this issue 2 years ago

YichaoOU commented 2 years ago

Hello,

For single-end data, does "same mapping coordinates" mean the same read start, regardless of where the read ends?

For paired-end data, what does "same mapping coordinates" mean?

Thanks, Yichao

IanSudbery commented 2 years ago

For single-end data, "same mapping coordinate" is almost the same as "same read start", but we also account for soft-clipping. So if a read has a start coordinate of 100 but two bases are soft-clipped from its 5' end, the coordinate we use is 98.
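A minimal Python sketch of the soft-clip adjustment described above, assuming each read is given as its leftmost aligned reference position (0-based, which excludes soft-clipped bases, as in pysam's `reference_start`) plus a CIGAR string. The helpers `parse_cigar` and `five_prime_position` are hypothetical and not part of UMI-tools. The reverse-strand branch is an assumption following from the fact that a reverse-strand read's 5' end is the right end of its alignment.

```python
import re

# CIGAR operations that consume reference bases (SAM specification)
REF_CONSUMING = set("MDN=X")


def parse_cigar(cigar):
    """Split a CIGAR string such as '2S48M' into [(2, 'S'), (48, 'M')]."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def five_prime_position(pos, cigar, is_reverse):
    """Hypothetical helper: 5' coordinate of a read with soft clips added back.

    pos is the leftmost aligned reference position, which excludes soft clips.
    """
    # Hard-clipped bases are absent from the read, so ignore them here.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not is_reverse:
        # Forward strand: 5' end is the left end; extend left by a leading soft clip.
        first_len, first_op = ops[0]
        return pos - first_len if first_op == "S" else pos
    # Reverse strand: 5' end is the right end (exclusive, like pysam's reference_end).
    end = pos + sum(n for n, op in ops if op in REF_CONSUMING)
    last_len, last_op = ops[-1]
    return end + last_len if last_op == "S" else end


print(five_prime_position(100, "2S48M", is_reverse=False))  # 98
```

With this sketch, two forward-strand reads reported at 100 with CIGAR `2S48M` and at 98 with CIGAR `50M` get the same coordinate (98), so they would be treated as sharing a position.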

For paired-end data, "same mapping coordinate" means the two read pairs have the same start coordinate, as defined above, and also the same insert length (which we use as a proxy for the position of read two).
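A minimal sketch of the paired-end rule, assuming each pair has already been reduced to read 1's soft-clip-adjusted start and the pair's insert (template) length. `ReadPair`, `position_key` and `group_pairs` are hypothetical names, not UMI-tools API. Keying on chromosome and strand as well is an assumption beyond what is stated above, and UMIs are matched exactly here for simplicity.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    """Hypothetical minimal record for one aligned read pair."""
    chrom: str
    is_reverse: bool  # strand of read 1 (assumed to be part of the key)
    start: int        # read 1 5' coordinate, soft clips already added back
    tlen: int         # insert (template) length, a proxy for read 2's position
    umi: str


def position_key(pair):
    # Pairs share a position only if read 1 starts at the same place
    # AND the insert length is the same.
    return (pair.chrom, pair.is_reverse, pair.start, pair.tlen)


def group_pairs(pairs):
    """Bucket pairs by mapping position, then by exact UMI within each position."""
    groups = defaultdict(lambda: defaultdict(list))
    for pair in pairs:
        groups[position_key(pair)][pair.umi].append(pair)
    return groups


pairs = [
    ReadPair("chr1", False, 98, 250, "ACGT"),
    ReadPair("chr1", False, 98, 250, "ACGT"),  # same start and insert: same position
    ReadPair("chr1", False, 98, 310, "ACGT"),  # same start, different insert: new position
]
groups = group_pairs(pairs)
print(len(groups))  # 2
```

The third pair starts at the same place as the first two but has a different insert length, so under this rule it lands at a separate position even though its UMI is identical.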

YichaoOU commented 2 years ago

Thank you so much for the quick reply!