databio / peppro

A modular, containerized pipeline for PRO-seq data processing
http://peppro.databio.org/
BSD 2-Clause "Simplified" License
10 stars 2 forks source link

subprocess.CalledProcessError: returned non-zero exit status 2 #28

Closed nsheff closed 5 years ago

nsheff commented 5 years ago
> `samtools view -q 10 -b -@ 8 -U /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/aligned_hg38/GSM3618128_fail_qc.bam /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/aligned_hg38/GSM3618128_temp.bam > /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/aligned_hg38/GSM3618128_sort.bam` (12505)
<pre>
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.019GB.  
  PID: 12505;   Command: samtools;  Return code: 0; Memory used: 0.019GB

> `samtools depth -b /project/shefflab/genomes/hg38/refgene_anno/hg38_pre-mRNA.bed /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/aligned_hg38/GSM3618128_sort.bam | awk '{counter++;sum+=$3}END{print sum/counter}'`
awk: cmd. line:1: fatal: division by zero attempted
Starting cleanup: 4 files; 4 conditional files for cleanup

Cleaning up flagged intermediate files. . .

Conditional flag found: []

These conditional files were left in place:

- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128*.fastq
- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/*.fq
- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/*.fastq
- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/*.log
- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1_processed.fastq.single.fq
- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R2_trimmed.fastq.single.fq
- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/repaired.flag
- /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/aligned_hg38/tmp5NTjAP

### Pipeline failed at:  (08-16 10:56:12) elapsed: 44.0 _TIME_

Total time: 0:00:45
Failure reason: Command 'samtools depth -b /project/shefflab/genomes/hg38/refgene_anno/hg38_pre-mRNA.bed /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/aligned_hg38/GSM3618128_sort.bam | awk '{counter++;sum+=$3}END{print sum/counter}'' returned non-zero exit status 2
Traceback (most recent call last):
  File "/home/nsheff/code/peppro/pipelines/peppro.py", line 2972, in <module>
    sys.exit(main())
  File "/home/nsheff/code/peppro/pipelines/peppro.py", line 1903, in main
    container=pm.container)
  File "/home/nsheff/.local/lib/python2.7/site-packages/pypiper/manager.py", line 796, in run
    call_follow()
  File "/home/nsheff/.local/lib/python2.7/site-packages/pypiper/manager.py", line 673, in call_follow
    follow()
  File "/home/nsheff/code/peppro/pipelines/peppro.py", line 1902, in <lambda>
    mapping_genome_bam),
  File "/home/nsheff/code/peppro/pipelines/peppro.py", line 1884, in check_alignment_genome
    rd = pm.checkprint(cmd)
  File "/home/nsheff/.local/lib/python2.7/site-packages/pypiper/manager.py", line 840, in checkprint
    self._triage_error(e, nofail)
  File "/home/nsheff/.local/lib/python2.7/site-packages/pypiper/manager.py", line 2128, in _triage_error
    self.fail_pipeline(e)
  File "/home/nsheff/.local/lib/python2.7/site-packages/pypiper/manager.py", line 1657, in fail_pipeline
    raise e
subprocess.CalledProcessError: Command 'samtools depth -b /project/shefflab/genomes/hg38/refgene_anno/hg38_pre-mRNA.bed /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/aligned_hg38/GSM3618128_sort.bam | awk '{counter++;sum+=$3}END{print sum/counter}'' returned non-zero exit status 2
nsheff commented 5 years ago

this command:

> `(fastp --overrepresentation_analysis --thread 8 \
--in1 /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1.fastq --adapter_sequence TGGAATTCTCGGGTGCCAAGG \
--length_required 18 --html /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.html \
--json /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.json --report_title 'GSM3618128' \
-o /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1_noadap.fastq )\
 2> /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.txt | seqtk trimfq -b 0 -L 30 - | seqtk seq -r - \
> /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1_processed.fastq` (13181,13183,13190)
<pre>
</pre>
Command completed. Elapsed time: 0:01:48. Running peak memory: 0.019GB.  
  PID: 13181;   Command: fastp; Return code: 0; Memory used: 0.019GB  
  PID: 13190;   Command: seqtk; Return code: 0; Memory used: 0.019GB  
  PID: 13183;   Command: seqtk; Return code: 0; Memory used: 0.019GB

appears to be creating an empty file in GSM3618128_R1_processed.fastq

nsheff commented 5 years ago

This large fastp command is direction stdout into the GSM3618128_R1_processed.fastq file, but I can confirm that the command does not actually have any stdout...

It appears instead to be using the -o flag to direct the "read1 ougtput file" into the R1_noadap.fastq file:

(fastp --overrepresentation_analysis --thread 8 --in1 /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1.fastq --adapter_sequence TGGAATTCTCGGGTGCCAAGG --length_required 18 --html /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.html --json /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.json --report_title 'GSM3618128' -o /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1_noadap.fastq )

Read1 before filtering: total reads: 47446064 total bases: 2080338564 Q20 bases: 2025600908(97.3688%) Q30 bases: 2008391701(96.5416%)

Read1 after filtering: total reads: 21982678 total bases: 930521629 Q20 bases: 907046935(97.4773%) Q30 bases: 898275329(96.5346%)

Filtering result: reads passed filter: 21982678 reads failed due to low quality: 243070 reads failed due to too many N: 178 reads failed due to too short: 25220138 reads with adapter trimmed: 29836086 bases trimmed due to adapters: 945959657

Duplication rate (may be overestimated since this is SE data): 11.8692%

JSON report: /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.json HTML report: /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.html

fastp --overrepresentation_analysis --thread 8 --in1 /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1.fastq --adapter_sequence TGGAATTCTCGGGTGCCAAGG --length_required 18 --html /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.html --json /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastqc/GSM3618128_R1_rmAdapter.json --report_title GSM3618128 -o /ext/yeti/processed/ppqc_test/results_pipeline/GSM3618128/fastq/GSM3618128_R1_noadap.fastq fastp v0.20.0, time used: 95 seconds

nsheff commented 5 years ago

I notice that on some lines, processed_fastq is the target of a redirect:

https://github.com/databio/peppro/blob/644ca073809813e1474793601a9b2df3acd4f74c/pipelines/peppro.py#L425-L437

other times it's a target of -o:

https://github.com/databio/peppro/blob/644ca073809813e1474793601a9b2df3acd4f74c/pipelines/peppro.py#L500-L518

also it only shows up as a variable on single-end; https://github.com/databio/peppro/blob/644ca073809813e1474793601a9b2df3acd4f74c/pipelines/peppro.py#L167-L194

could the error be that it's supposed to be a target of -o, which is actually writing output in these commands?