YeoLab / clipper

A tool to identify CLIP-seq peaks

Remove PCR duplicates using barcode_collapse_pe.py or picard? #100

Closed amdreamer closed 2 years ago

amdreamer commented 3 years ago

Hi,

When processing eCLIP-seq data, we got different results after removing PCR duplicates with different tools. One tool is barcode_collapse_pe.py (https://github.com/YeoLab/gscripts/blob/master/gscripts/clipseq/barcode_collapse_pe.py). It removed ~66% of the reads, so only about 1/3 were kept after removing PCR duplicates. The other tool is Picard, which removed ~20% of the reads and so kept far more reads than barcode_collapse_pe.py did. Since barcode_collapse_pe.py is stricter than Picard, I wonder which tool is more suitable in this scenario?

Thanks for your time~

byee4 commented 2 years ago

Hi,

For paired-end eCLIP, please use the script in the eclip repo: https://github.com/YeoLab/eclip/blob/master/bin/barcodecollapsepe.py

For single-end eCLIP, use umi_tools (umi_tools dedup) as described in the SOP: https://github.com/YeoLab/eclip/blob/master/documentation/eCLIP_analysisSOP_v2.2.docx
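
For anyone unsure what barcode-aware duplicate removal means in general, here is a minimal Python sketch of the idea. It is not the code of barcodecollapsepe.py or umi_tools. The `Read` record, the function names, and the toy reads are all hypothetical, and the sketch assumes a duplicate is a read that shares chromosome, strand, start position, and barcode (UMI) with a read already kept. The real tools may define the key differently (for example by also using the mate position), so they can remove different fractions of reads.

```python
from collections import namedtuple

# Hypothetical, simplified read record (not the format used by the real tools).
Read = namedtuple("Read", ["name", "chrom", "strand", "start", "umi"])


def collapse_by_umi(reads):
    """Keep the first read seen for each (chrom, strand, start, umi) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.strand, read.start, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def fraction_removed(reads, kept):
    """Fraction of input reads discarded as duplicates."""
    return 1.0 - len(kept) / len(reads)


# Toy input, made up for illustration only.
reads = [
    Read("r1", "chr1", "+", 100, "AAAC"),
    Read("r2", "chr1", "+", 100, "AAAC"),  # same position and UMI as r1
    Read("r3", "chr1", "+", 100, "GGTT"),  # same position, different UMI
    Read("r4", "chr1", "+", 250, "AAAC"),  # same UMI, different position
]

kept = collapse_by_umi(reads)
print([r.name for r in kept])
print(fraction_removed(reads, kept))
```

In this toy input only r2 is treated as a duplicate, because r3 carries a different barcode and r4 starts at a different position.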