Closed alexander-e-f-smith closed 1 year ago
@TomSmithCGAT Do you remember why we don't allow the output of all unmapped reads in dedup?
It's essentially a design choice on our part. dedup is explicitly designed to return a deduplicated BAM, which requires the reads to be mapped, so there's no scope to return unmapped reads. group is designed to group reads by their UMI, but we've allowed a slightly wider scope so that ungrouped reads can also be returned.
From the bundle_iterator, group can handle reads which are not 'grouped':
Whereas dedup cannot:
https://github.com/CGATOxford/UMI-tools/blob/5c2dd0fd208df3a8f93399c99b1d164aef8094be/umi_tools/dedup.py#L309-L311
@alexander-e-f-smith, if you do want to retain a mixture of deduplicated mapped reads and undeduplicated unmapped reads, your approach of extracting these and then adding back post-deduplication seems the best route.
If you need a very bespoke deduplication process, you might need to implement this yourself. You can use the UMI-tools API to apply the same underlying deduplication algorithm (https://umi-tools.readthedocs.io/en/latest/API.html).
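The extract-then-add-back approach described above can be sketched in plain Python on SAM text records. This is a minimal illustration, not part of UMI-tools: the function names `split_by_mapping` and `merge_back` and the example records are hypothetical, and the only fact relied on is that bit 0x4 of the SAM FLAG field marks an unmapped read. In practice the same split would more likely be done with samtools or pysam on BAM files, with `umi_tools dedup` run on the mapped portion in between.

```python
UNMAPPED = 0x4  # SAM FLAG bit: this read is unmapped


def sam_flag(line):
    """Return the integer FLAG (second column) of a SAM alignment line."""
    return int(line.split("\t")[1])


def split_by_mapping(sam_lines):
    """Split SAM lines into (header, mapped, unmapped) lists."""
    header, mapped, unmapped = [], [], []
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
        elif sam_flag(line) & UNMAPPED:
            unmapped.append(line)
        else:
            mapped.append(line)
    return header, mapped, unmapped


def merge_back(header, deduplicated, unmapped):
    """Concatenate header, deduplicated mapped reads, and untouched unmapped reads."""
    return header + deduplicated + unmapped


# Hypothetical records: two mapped reads sharing a UMI and position, one unmapped read.
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1_AAAA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2_AAAA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3_CCCC\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]

header, mapped, unmapped = split_by_mapping(records)

# Stand-in for the real deduplication step: keep only the first mapped read.
deduplicated = mapped[:1]

merged = merge_back(header, deduplicated, unmapped)
```

Appending the unmapped reads after the mapped ones matches where they usually sit in a coordinate-sorted file, but the merged output should still be re-sorted and re-indexed before downstream use.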
Oh yeah, it didn't cross my mind to just add back the unmapped reads at the end.
Thanks both. Can you confirm that umi_tools dedup will attempt to dedup the one read of a pair that is aligned (and throw the unaligned mate away)? If so, it makes sense to recover the unmapped mate after dedup (via the outputted singleton read names). However, will the dedup module only work on such read pairs if READ1 specifically is aligned (and the READ2 mate is unaligned)?
That is correct, yes.
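Recovering unmapped mates by read name, as raised in the question above, could look roughly like the sketch below. It assumes SAM text records and uses only the standard FLAG bits (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped); the function names and example records are hypothetical, and nothing here reflects how UMI-tools itself handles these pairs.

```python
PAIRED = 0x1         # read is paired in sequencing
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def parse(line):
    """Return (read name, integer FLAG) for a SAM alignment line."""
    fields = line.split("\t")
    return fields[0], int(fields[1])


def names_with_unmapped_mate(dedup_lines):
    """Names of mapped, paired reads in the dedup output whose mate is unmapped."""
    names = set()
    for line in dedup_lines:
        if line.startswith("@"):
            continue  # skip header lines
        name, flag = parse(line)
        is_mapped = not (flag & UNMAPPED)
        if (flag & PAIRED) and is_mapped and (flag & MATE_UNMAPPED):
            names.add(name)
    return names


def recover_mates(unmapped_lines, names):
    """Select unmapped records whose read name survived deduplication."""
    recovered = []
    for line in unmapped_lines:
        name, flag = parse(line)
        if name in names and (flag & UNMAPPED):
            recovered.append(line)
    return recovered


# Hypothetical dedup output: pairA has a mapped read with an unmapped mate (flag 73),
# pairB is a properly mapped pair (flags 99 and 147).
dedup_out = [
    "pairA\t73\tchr1\t100\t60\t4M\t=\t100\t0\tACGT\tIIII",
    "pairB\t99\tchr1\t200\t60\t4M\t=\t250\t54\tACGT\tIIII",
    "pairB\t147\tchr1\t250\t60\t4M\t=\t200\t-54\tACGT\tIIII",
]

# Hypothetical pool of unmapped reads extracted before dedup.
unmapped_pool = [
    "pairA\t133\tchr1\t100\t0\t*\t=\t100\t0\tTTTT\tIIII",
    "pairC\t77\t*\t0\t0\t*\t*\t0\t0\tGGGG\tIIII",
    "pairC\t141\t*\t0\t0\t*\t*\t0\t0\tCCCC\tIIII",
]

kept_names = names_with_unmapped_mate(dedup_out)
recovered = recover_mates(unmapped_pool, kept_names)
```

Because the check tests the mapped read's own flags rather than whether it is first or second in the pair, the same lookup works whichever mate is the aligned one.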
Hi,
Is it possible to output unaligned reads in the umi_tools dedup module, and not just the group module? I'm trying to maximize the use of non-aligned reads (STAR) for fusion calling, through UMI deduplication. My best solution so far is to extract unaligned reads prior to UMI dedup and then merge them back into the data afterwards. I have tried '--unmapped-reads=use' and '--unpaired-reads=use', but these seem to: A. cause memory use to balloon; B. not actually make use of many relevant partially unaligned/improperly aligned reads, as seen in downstream data supporting fusion reads, compared to 'discarding' these reads through that dedup module option. It would be nice to output the unaligned reads that the dedup module hasn't successfully used (when running '--unmapped-reads=use', for example), rather than not attempting to dedup these reads at all.
Best,
A