Closed amdreamer closed 2 years ago
Please use either
the script provided in the repo (https://github.com/YeoLab/eclip/blob/master/bin/barcodecollapsepe.py) for paired-end eCLIP, as described in this method (https://www.nature.com/articles/nmeth.3810),
or
umi_tools (https://github.com/CGATOxford/UMI-tools) for single-end eCLIP, as described in this method (https://pubmed.ncbi.nlm.nih.gov/28766298/).
For the barcodecollapsepe.py script, the reads need to be name-sorted.
eCLIP protocols use unique molecular identifier (UMI, also called randomer) barcode tags, and both of these tools rely on those tags to remove PCR duplicates.
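To illustrate what deduplicating on UMI tags means, here is a toy Python sketch. It is not the logic of barcodecollapsepe.py or umi_tools; the `Read` record, the function names, and the example reads are all made up for illustration, and real tools operate on BAM alignments. It only shows that adding the UMI to the grouping key changes which reads count as duplicates compared with grouping on mapping position alone.

```python
from collections import namedtuple

# Hypothetical minimal read record (assumption for illustration only).
Read = namedtuple("Read", ["name", "chrom", "strand", "start", "umi"])


def dedup_by_position(reads):
    """Keep the first read seen at each (chrom, strand, start)."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.strand, r.start)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def dedup_by_position_and_umi(reads):
    """Keep the first read seen at each (chrom, strand, start, UMI)."""
    seen = set()
    kept = []
    for r in reads:
        key = (r.chrom, r.strand, r.start, r.umi)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


# Made-up example reads, not real eCLIP data.
reads = [
    Read("r1", "chr1", "+", 100, "ACGT"),
    Read("r2", "chr1", "+", 100, "ACGT"),  # same position and UMI as r1
    Read("r3", "chr1", "+", 100, "TTGA"),  # same position, different UMI
    Read("r4", "chr1", "+", 250, "ACGT"),
]

print([r.name for r in dedup_by_position(reads)])
print([r.name for r in dedup_by_position_and_umi(reads)])
```

In this toy input, r2 is dropped by both functions, while r3 is dropped only by the position-only version because its UMI differs from r1's.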
Hi,
When processing eCLIP-seq data, we got different results after removing PCR duplicates with different tools. One tool is barcode_collapse_pe.py (https://github.com/YeoLab/gscripts/blob/master/gscripts/clipseq/barcode_collapse_pe.py). It removed ~66% of the reads, so only about 1/3 were kept after removing PCR duplicates. The other tool is picard, which removed ~20% of the reads and so kept far more reads than barcode_collapse_pe.py did. Since barcode_collapse_pe.py applies a stricter cut-off than picard, I wonder which tool is more suitable in this scenario?
Thanks for your time~ Best wishes