YeoLab / gscripts

General Use Scripts and Helper functions
MIT License

Running "Peak_input_normalization_wrapper.pl" takes a long time #95

Open amdreamer opened 3 years ago

amdreamer commented 3 years ago

After calling peaks with clipper, I got ~180,000 peaks in the eCLIP group and ~65,000 peaks in the SMInput group. I then used Peak_input_normalization_wrapper.pl to get normalized peaks. However, the program has been running for over 48 hours without any error or warning. Is this script expected to be time-consuming? Can I speed it up? Thanks for your opinion!

byee4 commented 3 years ago

You'll want to refer to the eCLIP repository for current/future analysis:

https://github.com/YeoLab/eclip

The wrapper basically runs the following two scripts to provide input-normalized peaks. If you're having trouble with the wrapper, I would suggest running these one after the other:

https://github.com/YeoLab/eclip/blob/master/bin/overlap_peakfi_with_bam.pl

Usage:

samtools view -cF 4 CLIP_pcr_deduped.bam > CLIP_mapped_readnum.txt
samtools view -cF 4 SMINPUT_pcr_deduped.bam > SMINPUT_mapped_readnum.txt

perl overlap_peakfi_with_bam.pl \
CLIP_pcr_deduped.bam \
SMINPUT_pcr_deduped.bam \
CLIP_clipper_clusters.bed \
CLIP_mapped_readnum.txt \
SMINPUT_mapped_readnum.txt \
output_normalized_peaks.bed

https://github.com/YeoLab/eclip/blob/master/bin/compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl

perl compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
output_normalized_peaks.bed \
output_normalized_peaks.compressed.bed
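
For convenience, the two steps above could be chained in a small bash helper. This is only a sketch, not part of the eCLIP repository: the function names (`run_step`, `count_mapped`, `normalize_peaks`), the `DRY_RUN` variable, and the output file naming are all hypothetical, and it assumes `samtools` is on your PATH and both Perl scripts are in the current directory. The argument order passed to each script is copied from the usage shown above.

```shell
#!/usr/bin/env bash
# Sketch: run the two input-normalization steps one after the other.
# All names below (functions, DRY_RUN, output file names) are illustrative
# assumptions, not something provided by the eclip repository.
set -euo pipefail

run_step() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

count_mapped() {
    # -c counts reads, -F 4 excludes reads flagged as unmapped.
    local bam=$1 out=$2
    echo "+ samtools view -cF 4 $bam > $out"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        samtools view -cF 4 "$bam" > "$out"
    fi
}

normalize_peaks() {
    local clip_bam=$1 input_bam=$2 peaks_bed=$3 prefix=$4
    local clip_n="${prefix}.CLIP_mapped_readnum.txt"
    local input_n="${prefix}.SMINPUT_mapped_readnum.txt"
    local normalized="${prefix}.normalized_peaks.bed"
    local compressed="${prefix}.normalized_peaks.compressed.bed"

    # Mapped read counts for the CLIP and SMInput libraries.
    count_mapped "$clip_bam" "$clip_n"
    count_mapped "$input_bam" "$input_n"

    # Step 1: overlap peaks with both BAMs to get input-normalized peaks.
    run_step perl overlap_peakfi_with_bam.pl \
        "$clip_bam" "$input_bam" "$peaks_bed" \
        "$clip_n" "$input_n" "$normalized"

    # Step 2: compress the normalized peaks.
    run_step perl compress_l2foldenrpeakfi_for_replicate_overlapping_bedformat.pl \
        "$normalized" "$compressed"
}
```

After sourcing the file, `DRY_RUN=1 normalize_peaks CLIP_pcr_deduped.bam SMINPUT_pcr_deduped.bam CLIP_clipper_clusters.bed output` prints the four commands without running them; dropping `DRY_RUN=1` executes them in order. Because each command is echoed before it runs, it is also easier to see which step the run is currently on.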