YeoLab / merge_peaks

Pipeline for using IDR to produce a set of peaks given two replicate eCLIP peaks

Minimum number of peaks in output file #11

Open Sanat-Mishra opened 3 years ago

Sanat-Mishra commented 3 years ago

Hi,

I ran into an issue while executing the pipeline on my local system:

ValueError: Peak files must contain at least 20 peaks post-merge

My data had only 7 peaks post-merge. I was wondering if this might cause the downstream pipeline to fail and exit with just one file '01v02.idr.out', as is happening in my situation.

Thanks!

byee4 commented 3 years ago

Hi, yes, you'll need at least 20 peaks post-IDR-merging for the pipeline to run fully. If you're seeing such low overlap, it's possible that your data isn't very reproducible. You can try running bedtools intersect on your significantly called peaks to estimate how much your replicates overlap. If the overlap is high, then we can debug the potential pipeline issue.
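As a rough illustration of the overlap check suggested above, here is a minimal Python sketch that counts how many peaks in one replicate overlap at least one peak in the other on the same chromosome and strand. This is approximately what `bedtools intersect -a rep1.bed -b rep2.bed -u -s` would report (the file names are hypothetical). The function names `read_bed` and `count_overlapping` and the example peaks are made up for illustration and are not part of merge_peaks. It assumes tab-separated BED6 input and does not filter for significance, so that filtering would need to happen beforehand.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {(chrom, strand): [(start, end), ...]}."""
    peaks = defaultdict(list)
    for line in lines:
        # Skip blank lines and common BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Strand is column 6 in BED6; fall back to "." if it is absent.
        strand = fields[5] if len(fields) > 5 else "."
        peaks[(chrom, strand)].append((start, end))
    return peaks


def count_overlapping(a, b):
    """Return (hits, total): peaks in `a` overlapping any peak in `b`.

    Intervals are treated as half-open [start, end), as in BED, and must
    share chromosome and strand to count as overlapping.
    """
    hits = 0
    total = 0
    for key, intervals in a.items():
        others = b.get(key, [])
        for start, end in intervals:
            total += 1
            # Simple quadratic scan; fine for small peak sets.
            if any(s < end and start < e for s, e in others):
                hits += 1
    return hits, total


# Made-up example peaks, not real eCLIP data.
rep1 = [
    "chr1\t100\t200\tp1\t0\t+",
    "chr1\t500\t600\tp2\t0\t+",
    "chr2\t50\t80\tp3\t0\t-",
]
rep2 = [
    "chr1\t150\t250\tq1\t0\t+",
    "chr2\t50\t80\tq2\t0\t+",
]

hits, total = count_overlapping(read_bed(rep1), read_bed(rep2))
print(f"{hits} of {total} rep1 peaks overlap a rep2 peak")
```

With real data, the lists would be replaced by open file handles, e.g. `read_bed(open("rep1.bed"))`. A low fraction here would point toward poor reproducibility rather than a pipeline bug.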

Sanat-Mishra commented 3 years ago

Thanks, Brian. That is indeed very reassuring to know. However, I compared the BED files generated by Clipper for both of my replicates against the files uploaded for each replicate on ENCODE, and both of my files have far fewer entries than the uploaded ones. I wonder what might have caused this, since I followed each step in the pipeline (except fastq-sort and removing repetitive regions, since I am trying to find sites within these).

Is the data on ENCODE not reproducible?

byee4 commented 3 years ago

If the data is on ENCODE, then the replicates should be reproducible. However, the pipeline used includes the repeat mapping and filtering, so the BED files only include uniquely mapped reads.

To look at repetitive regions, there is another pipeline, which works for GRCh38-aligned data: https://github.com/YeoLab/repetitive-element-mapping

Sanat-Mishra commented 3 years ago

I see. Can I email you some queries about my variant of the pipeline? Maybe they can help me debug this problem.

Thanks, Sanat

byee4 commented 3 years ago

Sure, I can try to help. bay001 at health.ucsd.edu

In general, however, I would recommend following the peak calling protocol as described using the example data, and posting issues in the proper repo (https://github.com/YeoLab/eclip).