Closed alexander-e-f-smith closed 6 months ago
Thanks for your help. Following on from previous questions: when using '--unmapped/unpaired-reads=use', can you please confirm how singleton reads are deduplicated when encountered? Are these singletons assessed for duplication (grouped) against all reads, or just against other singleton reads (using whichever read of the pair is mapped)? If the former, would there be cases where a single UMI/duplicate group contains a mixture of singletons and proper pairs, either of which could be selected based on (default) mapping quality? On a related matter, is there a recommended running procedure when dealing with or requiring unmapped/unpaired reads, e.g. selection of something other than the --directional grouping method? This would in part be to counter performance issues.
Under normal circumstances, UMI-tools uses the read1 pos and template fields to group reads to be considered for UMI clustering. This continues to be true when --unmapped/unpaired-reads=use is set. What this means is that singleton read1s will have their position recorded as (pos, ""), and thus will only be clustered with other reads that have their position information as (pos, "").
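To make the grouping rule above concrete, here is a minimal sketch in Python. It is not UMI-tools code: the read dictionaries, the `mate_mapped` field and the `grouping_key` and `bundle_reads` helpers are all hypothetical, and the only thing taken from the explanation above is that a singleton gets the key (pos, "") while a read with a mapped mate keeps its template information.

```python
from collections import defaultdict


def grouping_key(read):
    """Return the (position, template) key for a hypothetical read dict.

    Assumption: reads whose mate is not mapped get "" as the template
    part of the key, mirroring the (pos, "") description above.
    """
    if read["mate_mapped"]:
        return (read["pos"], read["tlen"])
    return (read["pos"], "")


def bundle_reads(reads):
    """Collect read names under their grouping key, before any UMI clustering."""
    bundles = defaultdict(list)
    for read in reads:
        bundles[grouping_key(read)].append(read["name"])
    return dict(bundles)


# Illustrative data only: two proper pairs and two singletons at one position.
reads = [
    {"name": "pair_a", "pos": 100, "tlen": 250, "mate_mapped": True},
    {"name": "pair_b", "pos": 100, "tlen": 250, "mate_mapped": True},
    {"name": "single_a", "pos": 100, "tlen": 0, "mate_mapped": False},
    {"name": "single_b", "pos": 100, "tlen": 0, "mate_mapped": False},
]

bundles = bundle_reads(reads)
for key, names in bundles.items():
    print(key, names)
```

Under these assumptions the singletons and the proper pairs end up under different keys even though they share a start position, so they would never be compared with each other during UMI clustering.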
On a related matter, is there a recommended running procedure when dealing with or requiring unmapped/unpaired reads, e.g. selection of something other than the --directional grouping method?
My personal instinct is to filter out unmapped/unpaired reads unless there is a specific reason to keep them. Unfortunately, I don't think there are any parameters that can be tweaked that would improve performance.
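As a rough illustration of what filtering out unmapped/unpaired reads means at the level of SAM flags, the sketch below keeps only records that are paired and have both the read and its mate mapped. It assumes paired-end data in plain-text SAM; the `keep_record` and `filter_sam_lines` helpers are hypothetical names, and in practice a dedicated tool such as samtools would normally do this on the BAM file before running dedup.

```python
# Flag bits as defined in the SAM specification.
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def keep_record(flag):
    """True if the read is paired and neither it nor its mate is unmapped."""
    return bool(flag & PAIRED) and not (flag & (UNMAPPED | MATE_UNMAPPED))


def filter_sam_lines(lines):
    """Keep header lines and alignment lines that pass keep_record."""
    kept = []
    for line in lines:
        if line.startswith("@"):
            kept.append(line)  # header lines are passed through unchanged
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if keep_record(flag):
            kept.append(line)
    return kept


# Illustrative records only (truncated to the first few SAM columns).
example = [
    "@HD\tVN:1.6",
    "read1\t99\tchr1\t100",   # paired, read and mate both mapped
    "read2\t73\tchr1\t200",   # paired, mate unmapped
    "read2\t133\tchr1\t200",  # paired, read itself unmapped
    "read3\t0\tchr1\t300",    # not paired
]

filtered = filter_sam_lines(example)
print("\n".join(filtered))
```

Filtering by flag alone does not check that both mates are actually present in the file, so this is only a starting point.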
Closing due to inactivity
UMI-tools dedup: It seems that when using '--unmapped/unpaired-reads=use', the unmapped read of a pair is discarded and the mapped counterpart is retained. This leads to genuine singletons/orphan reads (no paired read at all in the dedup output) when in fact the read would have had a partner in the input file with an unaligned status. Is this the intended behaviour of the tool?
Originally posted by @alexander-e-f-smith in https://github.com/CGATOxford/UMI-tools/issues/519#issuecomment-1060982986