Hi all,
Using the HiSeqX sequencer we occasionally observe a peak in duplicate rates that can be attributed to fragments being amplified across adjacent wells on the flowcell. To assess the rate at which this happens we use the Picard MarkDuplicates command while supplying the following extra parameters to allow Picard to parse our read names...
...OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 READ_NAME_REGEX="[a-zA-Z0-9]+:[0-9]+:([0-9]+):([0-9]+):([0-9]+).*"
This allows us to get a report in the metrics file from which we can calculate the fraction of duplicates that are adjacent on the flowcell.
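For reference, here is a rough sketch of how that fraction could be computed from the metrics file. It assumes the standard tab-separated Picard DuplicationMetrics layout (a "## METRICS CLASS" line, then a header row, then one row per library) and takes the fraction to be READ_PAIR_OPTICAL_DUPLICATES / READ_PAIR_DUPLICATES. The function name is made up for illustration and is not part of Picard or sambamba.

```python
def optical_duplicate_fraction(metrics_text):
    """Return {library: optical pair duplicates / all pair duplicates}.

    Assumes the tab-separated layout of a Picard MarkDuplicates metrics file.
    """
    lines = metrics_text.splitlines()

    # The column header is the line right after "## METRICS CLASS ...".
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            first_row = i + 2
            break
    else:
        raise ValueError("no METRICS CLASS section found")

    fractions = {}
    for line in lines[first_row:]:
        if not line.strip():
            break  # a blank line ends the metrics table (the histogram follows)
        row = dict(zip(header, line.split("\t")))
        dups = int(row["READ_PAIR_DUPLICATES"])
        optical = int(row["READ_PAIR_OPTICAL_DUPLICATES"])
        # Guard against libraries with no duplicate pairs at all.
        fractions[row["LIBRARY"]] = optical / dups if dups else 0.0
    return fractions
```

In practice one would read the metrics file into a string (for example with open(path).read()) and pass it in. With the pixel distance set to 2500, the "optical" count is what we treat as duplicates in adjacent wells.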
Our current workflow is to mark duplicates with sambamba, but when we suspect a peak in "proximal" duplicates we have to return to Picard to get the estimate.
Thanks, Richard