
YeoLab / eCLIP


Remove PCR duplicates using barcode_collapse_pe.py or picard? #28

Closed (amdreamer closed this issue 2 years ago)

amdreamer commented 3 years ago

Hi,

When processing eCLIP-seq data, we got different results when removing PCR duplicates with two different tools. The first is barcode_collapse_pe.py (https://github.com/YeoLab/gscripts/blob/master/gscripts/clipseq/barcode_collapse_pe.py), which removed ~66% of the reads, so only about 1/3 were kept. The second is picard, which removed only ~20% of the reads and therefore kept far more than barcode_collapse_pe.py did. Since barcode_collapse_pe.py is so much stricter than picard here, which tool is more suitable in this scenario?

Thanks for your time~ Best wishes

byee4 commented 2 years ago

Please use either:

The script provided in the repo (https://github.com/YeoLab/eclip/blob/master/bin/barcodecollapsepe.py) for paired-end eCLIP, as described in this method (https://www.nature.com/articles/nmeth.3810)

or

umi_tools (https://github.com/CGATOxford/UMI-tools) for single-end eCLIP, as described in this method (https://pubmed.ncbi.nlm.nih.gov/28766298/).

For the barcodecollapsepe.py script, the reads need to be sorted by read name (name-sorted) before it is run.
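In practice, name-sorted means both mates of a pair sit next to each other in the file; `samtools sort -n` is a common way to get there. As a rough sanity check, here is a minimal sketch that tests whether a list of read names is grouped that way. The function name `mates_are_grouped` is made up for illustration, it is not part of the eCLIP repo, and getting the names out of the BAM is left to you.

```python
def mates_are_grouped(read_names):
    """Return True if every read name forms one contiguous run.

    In a name-sorted paired-end file both mates share a name and are
    adjacent, so a name that shows up again after a different name has
    been seen means the input is not grouped by name.
    """
    seen = set()
    previous = None
    for name in read_names:
        if name != previous:
            if name in seen:
                # This name already appeared earlier in a separate run.
                return False
            seen.add(name)
            previous = name
    return True


# Hypothetical read names: mates adjacent vs. interleaved by coordinate.
grouped = ["readA", "readA", "readB", "readB"]
interleaved = ["readA", "readB", "readA", "readB"]
print(mates_are_grouped(grouped))
print(mates_are_grouped(interleaved))
```

This only checks grouping, not the exact sort order that samtools produces.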

eCLIP protocols use unique molecular identifier (UMI, or randomer) barcode tags, and both of these tools rely on those tags to deduplicate reads.
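To illustrate why a UMI-aware tool and a position-only tool can report different duplicate counts on the same data, here is a toy sketch. It is not the actual logic of barcodecollapsepe.py, umi_tools, or picard; the `Read` record and both function names are assumptions made up for this example, and real tools also deal with things like mate positions that are ignored here.

```python
from collections import namedtuple

# Hypothetical minimal read record, for illustration only.
Read = namedtuple("Read", ["name", "chrom", "start", "strand", "umi"])


def dedup_by_position(reads):
    """Keep the first read seen at each (chrom, start, strand)."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.start, r.strand)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def dedup_by_position_and_umi(reads):
    """Keep the first read seen for each (chrom, start, strand, UMI)."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.start, r.strand, r.umi)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


reads = [
    Read("r1", "chr1", 100, "+", "ACGTA"),
    Read("r2", "chr1", 100, "+", "ACGTA"),  # same position and UMI as r1
    Read("r3", "chr1", 100, "+", "TTGCA"),  # same position, different UMI
    Read("r4", "chr1", 250, "-", "ACGTA"),  # different position
]

print([r.name for r in dedup_by_position(reads)])          # ['r1', 'r4']
print([r.name for r in dedup_by_position_and_umi(reads)])  # ['r1', 'r3', 'r4']
```

In this toy input, r3 shares a position with r1 but carries a different UMI, so the UMI-aware version keeps it as a separate molecule while the position-only version drops it.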