seboyden opened this issue 6 years ago
I agree that PCR-free WGS has become the norm, so I think this is a good suggestion. However, it would require samblaster to parse read IDs, which it does not currently do. I will strongly consider this feature for any upcoming major release of samblaster.
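Parsing read IDs here would mean pulling the flow-cell coordinates out of each read name. Below is a minimal sketch, assuming reads carry CASAVA 1.8+ style Illumina names (`instrument:run:flowcell:lane:tile:x:y`, optionally followed by one extra colon-separated field). `parse_read_id` and `FlowCellLocation` are hypothetical names for illustration, not part of samblaster.

```python
from typing import NamedTuple, Optional


class FlowCellLocation(NamedTuple):
    """Illumina flow-cell coordinates taken from a read ID (hypothetical type)."""
    lane: int
    tile: int
    x: int
    y: int


def parse_read_id(qname: str) -> Optional[FlowCellLocation]:
    """Parse <instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>[:<extra>].

    Returns None when the name does not follow that layout, so reads with
    non-Illumina names can simply skip optical-duplicate checks.
    """
    # A FASTQ-style header may have a space-separated comment; keep only the ID.
    fields = qname.split(" ", 1)[0].split(":")
    if len(fields) not in (7, 8):
        return None
    try:
        lane, tile, x, y = (int(f) for f in fields[3:7])
    except ValueError:
        return None
    return FlowCellLocation(lane, tile, x, y)
```

Returning `None` instead of raising keeps a streaming tool usable on data whose read names were rewritten upstream.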
Thanks! I (and others) will appreciate it.
Any further consideration of adding optical duplicate marking?
Yes, I have been thinking about how to do this, but it is difficult within the one-pass algorithm that samblaster must use to support its primary usage scenario of running in a pipe. In particular, I have yet to imagine a solution that does not approximately double the amount of memory samblaster uses, because it would need to keep track of the Illumina flow-cell location of reads.
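To illustrate where the extra memory goes, here is a minimal one-pass sketch: for each duplicate signature it keeps the flow-cell location of every read already seen, and a later read with the same signature counts as optical if it lies on the same lane and tile within a pixel window of an earlier one. The class name, the signature tuple, and the square-window test with a default of 100 pixels are assumptions for illustration (the distance is modeled on Picard's optical-duplicate pixel distance), not samblaster's actual data structures.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Location = Tuple[int, int, int, int]  # (lane, tile, x, y)


class OpticalDupTracker:
    """Hypothetical one-pass tracker for optical vs. PCR duplicates.

    The per-signature list of locations is the extra state that a
    signature-only duplicate marker would not need to keep.
    """

    def __init__(self, pixel_distance: int = 100) -> None:
        self.pixel_distance = pixel_distance
        self.seen: Dict[Tuple, List[Location]] = defaultdict(list)

    def classify(self, signature: Tuple, loc: Location) -> str:
        """Return 'unique', 'optical', or 'pcr' for a read in stream order."""
        previous = self.seen[signature]
        if not previous:
            result = "unique"
        elif any(self._is_near(loc, p) for p in previous):
            result = "optical"
        else:
            result = "pcr"
        previous.append(loc)  # remember this read's location for later reads
        return result

    def _is_near(self, a: Location, b: Location) -> bool:
        """Same lane and tile, and within pixel_distance on both axes."""
        lane_a, tile_a, xa, ya = a
        lane_b, tile_b, xb, yb = b
        return (lane_a == lane_b and tile_a == tile_b
                and abs(xa - xb) <= self.pixel_distance
                and abs(ya - yb) <= self.pixel_distance)
```

Because this is single-pass, the first read seen for a signature is always kept as the original; a multi-pass tool could instead choose the best representative before classifying the rest.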
Thanks. I think 2X memory usage might be acceptable given that this would be optional, especially if the documentation/help warns about the increased memory.
I've submitted a pull request with changes I made that would allow this. You should be able to add UMI support on top of that in just a few minutes.
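Once read IDs are parsed, UMI support could amount to adding the UMI to the duplicate key, so reads count as duplicates only when both position signature and UMI match. This sketch assumes the UMI is appended as an eighth colon-separated field of the read ID, as some Illumina demultiplexing setups do; `umi_from_read_id` and `duplicate_key` are hypothetical helpers, and the pull request's actual changes may look different.

```python
from typing import Optional, Tuple


def umi_from_read_id(qname: str) -> Optional[str]:
    """Return the 8th colon-separated field of the read ID, or None if absent."""
    fields = qname.split(" ", 1)[0].split(":")
    return fields[7] if len(fields) == 8 else None


def duplicate_key(signature: Tuple, qname: str) -> Tuple:
    """Extend a (hypothetical) position signature with the read's UMI."""
    return signature + (umi_from_read_id(qname),)
```

This compares UMIs exactly; tolerating sequencing errors in the UMI would require clustering similar UMIs, which is beyond this sketch.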
I'd like to request a feature analogous to the Picard MarkDuplicates TAGGING_POLICY option, where setting All would record the duplicate type (PCR or optical) in the optional DT tag, and OpticalOnly would mark only optical duplicates. Marking only optical duplicates is often recommended for data from PCR-free library preps, which includes most WGS. Thanks!
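A minimal sketch of the requested behavior, applied to one read once its duplicate type is known. It records the type using Picard's DT values (`LB` for library/PCR duplicates, `SQ` for sequencing/optical duplicates) and sets SAM FLAG bit 0x400. The `TaggingPolicy` enum and `apply_policy` function are hypothetical, and the OpticalOnly branch follows the request as written above; Picard's own option may behave differently, so check its documentation before relying on the analogy.

```python
from enum import Enum
from typing import List, Optional, Tuple

DUP_FLAG = 0x400  # SAM FLAG bit: read is a PCR or optical duplicate


class TaggingPolicy(Enum):
    """Hypothetical option values, named after Picard's TAGGING_POLICY."""
    DONT_TAG = "DontTag"
    OPTICAL_ONLY = "OpticalOnly"
    ALL = "All"


def apply_policy(flag: int, tags: List[str], dup_type: Optional[str],
                 policy: TaggingPolicy) -> Tuple[int, List[str]]:
    """dup_type is None (not a duplicate), 'pcr', or 'optical'.

    DontTag:     mark every duplicate, add no DT tag.
    OpticalOnly: mark only optical duplicates and tag them DT:Z:SQ.
    All:         mark every duplicate and tag DT:Z:LB or DT:Z:SQ.
    """
    if dup_type is None:
        return flag, tags
    if policy is TaggingPolicy.OPTICAL_ONLY and dup_type != "optical":
        return flag, tags  # PCR duplicates stay unmarked under OpticalOnly
    flag |= DUP_FLAG
    if policy is not TaggingPolicy.DONT_TAG:
        tags = tags + ["DT:Z:" + ("SQ" if dup_type == "optical" else "LB")]
    return flag, tags
```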