Open Sanat-Mishra opened 3 years ago
Hi, yes, you'll need at least 20 peaks post-IDR-merging for the pipeline to run fully. If you're seeing such low overlap, then it's possible that your data isn't that reproducible. You can try using bedtools intersect on your significantly called peaks to get an estimate of how much your reps overlap; if the overlap is high, then we can debug the potential pipeline issue.
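For anyone who wants a quick sanity check without bedtools installed, here is a minimal pure-Python sketch of the same idea: the fraction of peaks in one replicate that overlap at least one peak on the same chromosome and strand in the other. It is only a rough stand-in for what `bedtools intersect -u -s` would report, not part of the pipeline. The function names are made up for illustration, and it assumes tab-separated BED lines with the strand in column 6 when present.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {(chrom, strand): [(start, end), ...]}.

    Hypothetical helper: assumes tab-separated fields, strand in column 6.
    """
    peaks = defaultdict(list)
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else "."
        peaks[(chrom, strand)].append((start, end))
    return peaks


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in A overlapping any same-strand peak in B."""
    total = 0
    hits = 0
    for key, intervals in peaks_a.items():
        others = peaks_b.get(key, [])
        for start, end in intervals:
            total += 1
            # Half-open BED intervals overlap when each starts before the other ends.
            if any(start < o_end and o_start < end for o_start, o_end in others):
                hits += 1
    return hits / total if total else 0.0


# Toy data for illustration only (not real peaks).
rep1 = ["chr1\t100\t200\tp1\t0\t+", "chr1\t300\t400\tp2\t0\t+", "chr2\t50\t80\tp3\t0\t-"]
rep2 = ["chr1\t150\t250\tq1\t0\t+", "chr2\t60\t70\tq2\t0\t+"]
fraction = overlap_fraction(read_bed(rep1), read_bed(rep2))
```

With real data you would pass open file handles instead of the toy lists, e.g. `read_bed(open("rep1.bed"))` (file name is a placeholder). The pairwise comparison is quadratic per chromosome, which is fine for a few thousand peaks but slow for very large files.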
Thanks, Brian. That is indeed very reassuring to know. However, I checked the BED files generated by Clipper for both my replicates against the files uploaded for each replicate on ENCODE, and both of my files have far fewer entries than the uploaded ones. I wonder what might have caused this, since I followed each step in the pipeline (except fastq-sort and removing repetitive regions, since I am trying to find sites within these).
Is the data on ENCODE not reproducible?
If the data is on ENCODE, then the replicates should be reproducible. However, the pipeline used includes the repeat mapping and filtering, so the BED files only include uniquely mapped reads.
To look at repetitive regions, there is another pipeline, which works for GRCh38-aligned data: https://github.com/YeoLab/repetitive-element-mapping
I see. Can I email you some queries about my variant of the pipeline? Maybe they can help me debug this problem.
Thanks, Sanat
Sure, I can try to help. bay001 at health.ucsd.edu
In general, however, I would recommend following the peak calling protocol as described using the example data, and post issues in the proper repo (https://github.com/YeoLab/eclip)
Hi,
I ran into an issue while executing the pipeline on my local system:
ValueError: Peak files must contain at least 20 peaks post-merge
My data had only 7 peaks post-merge. I was wondering if this might cause the downstream pipeline to fail and exit with just one file '01v02.idr.out', as is happening in my situation.
Thanks!