YeoLab / eCLIP

Other
38 stars 26 forks source link

Error with input normalisation #25

Closed matthew-valentine closed 3 years ago

matthew-valentine commented 3 years ago

I am running the input normalisation using overlap_peakfi_with_bam_PE.pl. It completes with no errors, and I get the output files, but every peak has a count of 1. I was playing around with the script, and it seems that it is unable to execute the for loop starting line 332 in the subfunction read_bamfi. Any print statements I put inside it don't print.

Nothing appears to be written to %peak_read_counts in the loop at line 349, so the for loop on line 94 sets the counts for each peak to 1, and then whether or not a peak is enriched or not is solely dependent on whether the input or eCLIP file has more reads. I'm guessing this is just a problem with my input data files, but I'm not sure what it is. Do you have any pointers/things I should look out for?

Matthew

byee4 commented 3 years ago

Hi,

I don't see an obvious problem, can you let me know what your commandline looks like, and what input data files you're using (where/how they were created or if you used the eCLIP pipeline)?

matthew-valentine commented 3 years ago

Thanks for getting back to me! My command line is just very standard:

overlap_peakfi_with_bam_PE.pl eclip.bam input.bam clipper_peaks.bed \ eclip_mapped_readnum.txt input_mapped_readnum.txt clipper_peaks_norm.bed

I have uploaded some smaller test files here that contain peaks and reads related to a region of interest (made for much quicker testing). The files they originate from were made generally following the standard eCLIP pipeline.

https://riken-share.box.com/s/su0pk4bh3y0yd4bio3zqrr6k6njwvo6h

matthew-valentine commented 3 years ago

I actually realised what the problem was. We have been using a modified eCLIP pipeline, and as a result all of our data is on the opposite strand relative to the standard data processing pipeline. When I was running clipper the peaks outputted with the actual strand of the gene, so all the peaks are "+" strand, while the reads themselves are all "-" strand. That is why it wasn't finding any reads for the peaks and just outputting 1.

byee4 commented 3 years ago

Ah gotcha. So the issue is resolved if the strands are flipped?

matthew-valentine commented 3 years ago

Indeed, if I just switch the strand in the peak file it works perfectly!