CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

Use of '--unmapped/unpaired-reads=use' with read pairs: unmapped read of pair discarded, mapped kept. #519

alexander-e-f-smith closed 2 years ago

alexander-e-f-smith commented 2 years ago

UMI-tools dedup: It seems that when using '--unmapped/unpaired-reads=use', the unmapped read of a pair is discarded and its mapped counterpart is retained. This leaves apparent singleton/orphan reads in the dedup output (no paired read at all), even though each of them had a partner in the input file, just with an unaligned status. Is this the intended behaviour of the tool?

IanSudbery commented 2 years ago

Hi Alexander, this is expected (if not quite ideal) behaviour. Unfortunately, if the mate of a read is unmapped, we have no good way of finding it. We could, I guess, hold a buffer of reads whose mates are unmapped, and then go looking for those mates at the end. I suspect this buffer might get quite large.

@TomSmithCGAT ?

alexander-e-f-smith commented 2 years ago

Hi Ian. Thanks for the confirmation. I'll extract the names of these singletons from the dedup output and use them to retrieve their unmapped mates from the input BAM.
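
For anyone landing here later, a rough sketch of that workaround follows. It is not part of UMI-tools: the `Read` tuple and both function names are made up for illustration, and the reads are plain in-memory records so the logic is easy to follow. With real data they would come from a BAM reader (for example pysam, whose records expose `query_name` and `is_unmapped`). It also assumes paired-end input without secondary or supplementary alignments, since those would inflate the per-name counts.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set


class Read(NamedTuple):
    """Hypothetical stand-in for a BAM record (illustration only)."""
    name: str
    is_unmapped: bool


def singleton_names(dedup_reads: Iterable[Read]) -> Set[str]:
    """Names that occur exactly once in the dedup output, i.e. reads without a mate."""
    counts = Counter(read.name for read in dedup_reads)
    return {name for name, n in counts.items() if n == 1}


def recover_unmapped_mates(input_reads: Iterable[Read],
                           dedup_reads: Iterable[Read]) -> List[Read]:
    """Scan the input for unmapped reads whose name matches an output singleton."""
    wanted = singleton_names(dedup_reads)
    return [read for read in input_reads
            if read.is_unmapped and read.name in wanted]


# Toy input: pairA is fully mapped; pairB and pairC each have one unmapped mate.
input_reads = [
    Read("pairA", False), Read("pairA", False),
    Read("pairB", False), Read("pairB", True),
    Read("pairC", False), Read("pairC", True),
]

# Toy dedup output: pairC was removed as a duplicate, and the unmapped
# mate of pairB was dropped, leaving pairB as a singleton.
dedup_reads = [Read("pairA", False), Read("pairA", False), Read("pairB", False)]

recovered = recover_unmapped_mates(input_reads, dedup_reads)
print([read.name for read in recovered])  # ['pairB']
```

Only the mate of `pairB` is recovered; the unmapped mate of `pairC` is left out because its mapped partner did not survive deduplication. The recovered reads would then need to be merged back into the dedup output and the file re-sorted.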