ENCODE-DCC / chip-seq-pipeline2

ENCODE ChIP-seq pipeline
MIT License

call-idr-pr failed #116

Closed yifeisun03 closed 4 years ago

yifeisun03 commented 4 years ago

Hello:

The pipeline failed with one line, "Workflow 28f1a662-12b1-496e-b971-a71fe591901c transitioned to state Failed", in the .err file, but when I open the .out file to check the details, something looks weird.

In the .out file, I see "Successfully completed", but when I scroll down I see "chip.idr_pr Failed." at the bottom. This seems to be the only error in the workflow, since "[Caper] run: 1 28f1a662-12b1-496e-b971-a71fe591901c bsub/chip-caper/chip/28f1a662-12b1-496e-b971-a71fe591901c/metadata.json" found only 1 error.

My other jobs finished fine, with qc_report as the last step. But since SPP takes too long and sometimes overruns the wall time (144h), I added "chip.peak_caller" : "macs2" to my input JSON file. This is the only difference from the jobs that finished.

In my output directory, I only see call-idr-pr and call-overlap-pr (the latter finished fine). Compared to the fully finished jobs, the _ppr counterparts of both are missing, as are call-idr and call-overlap. In two independent jobs with the "macs2" caller, I got the same error at the call-idr-pr step, and only in "shard1", while the parallel "shard0" finished fine, so I'm really not sure what the problem is.


In the call-idr_pr stderr, the error messages are as below (basically the same as caper debug):

Traceback (most recent call last):
  File "/hpc/users/suny04/.conda/envs/encode-chip-seq-pipeline/bin/encode_task_idr.py", line 180, in <module>
    main()
  File "/hpc/users/suny04/.conda/envs/encode-chip-seq-pipeline/bin/encode_task_idr.py", line 152, in main
    idr_peak, args.blacklist, args.regex_bfilt_peak_chr_name, args.out_dir)
  File "/hpc/users/suny04/.conda/envs/encode-chip-seq-pipeline/bin/encode_lib_blacklist_filter.py", line 54, in blacklist_filter
    run_shell_cmd(cmd)
  File "/hpc/users/suny04/.conda/envs/encode-chip-seq-pipeline/bin/encode_lib_common.py", line 319, in run_shell_cmd
    raise Exception(err_str)
Exception: PID=369350, PGID=369350, RC=1
STDERR=
STDOUT=

Thanks for your help!

leepc12 commented 4 years ago

If it fails at the IDR step, the IDR threshold may be too stringent, leaving the final peak file empty. Check whether any *.bfilt.narrowPeak.gz is empty (20 bytes in gz format).
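
One way to run that check over a whole output directory is sketched below. It is only an illustration: the helper names `is_empty_gz` and `find_empty_peak_files` are made up for this example and are not part of the pipeline. Rather than relying on the 20-byte size, it decompresses each file and tests whether any data comes out.

```python
import gzip
from pathlib import Path


def is_empty_gz(path):
    """Return True if the gzip file decompresses to zero bytes."""
    with gzip.open(path, "rb") as handle:
        # Reading a single byte is enough to tell empty from non-empty.
        return handle.read(1) == b""


def find_empty_peak_files(root, pattern="*.bfilt.narrowPeak.gz"):
    """Recursively list peak files under `root` that contain no peaks."""
    return sorted(p for p in Path(root).rglob(pattern) if is_empty_gz(p))
```

Calling `find_empty_peak_files(...)` on the workflow's output directory (for example the folder that holds metadata.json) should return the paths of any empty peak files, which shows which replicate or pseudo-replicate lost all of its peaks.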

yifeisun03 commented 4 years ago

Thanks. Indeed, some of the replicates have an empty *.bfilt.narrowPeak.gz.

I'm not sure if this is due to using MACS2 rather than SPP (with SPP, even though I get a low number of peaks and may fail the IDR QC, there is no empty file and the pipeline finishes normally).

I'll adjust the IDR threshold to 0.1 and see if that helps the pipeline run through.
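
For reference, a minimal sketch of that change to the input JSON, written as a small Python helper. The key name `chip.idr_thresh` is an assumption here and should be checked against the pipeline's input documentation. The helper `with_idr_threshold` is hypothetical, and the `inputs` dict holds only the one setting mentioned in this thread, not a full input file.

```python
import json


def with_idr_threshold(inputs, threshold, key="chip.idr_thresh"):
    """Return a copy of the input dict with the IDR threshold overridden.

    The default key name is an assumption; verify it in the pipeline docs.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("IDR threshold must be in (0, 1]")
    updated = dict(inputs)  # shallow copy, so the original dict is untouched
    updated[key] = threshold
    return updated


# Only the setting mentioned above; a real input JSON has many more keys.
inputs = {"chip.peak_caller": "macs2"}
print(json.dumps(with_idr_threshold(inputs, 0.1), indent=4))
```

The helper returns a copy so that the original settings stay available for comparison with the earlier runs.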