bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Truncated file; non-zero exit status; bwa-mem alignment failure #1651

Closed: PatrickJReed closed this issue 7 years ago

PatrickJReed commented 7 years ago

I've seen issues similar to this one, but they're closed. I updated to the current development version just before rerunning this and I get the same error. I'll say upfront that it is possible, though I'm not sure, that the file was truncated somewhere: it moved around a lot before analysis (local server -> S3 -> EBS -> bcbio prep merged).
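Since the inputs went through several transfers, one way to rule truncation in or out is to stream each gzipped FASTQ end to end before rerunning. The sketch below uses a hypothetical helper, `check_fastq_gz` (not part of bcbio), and assumes plain four-line FASTQ records. A gzip stream that ends early raises an exception during decompression, and each record's sequence and quality lines are compared in length, the same kind of SEQ/QUAL mismatch samtools complains about in SAM records. If the files are bgzip-compressed, as grabix-indexed inputs are, a file cut exactly at a member boundary can still decompress cleanly, so comparing checksums (for example with `md5sum`) on both ends of each transfer remains the more reliable check.

```python
import gzip
import zlib


def check_fastq_gz(path):
    """Stream a gzipped FASTQ end to end; return (record_count, problems).

    Hypothetical helper, assuming plain four-line FASTQ records.
    """
    problems = []
    records = 0
    try:
        with gzip.open(path, "rt") as handle:
            while True:
                header = handle.readline()
                if not header:
                    break  # clean end of the decompressed stream
                seq = handle.readline().rstrip("\n")
                plus = handle.readline()
                qual = handle.readline().rstrip("\n")
                records += 1
                if not header.startswith("@") or not plus.startswith("+"):
                    # Also catches a record cut off partway through.
                    problems.append("record %d: malformed or incomplete record" % records)
                elif len(seq) != len(qual):
                    problems.append("record %d: SEQ and QUAL lengths differ" % records)
    except (EOFError, OSError, zlib.error) as exc:
        # A gzip stream that ends early typically raises EOFError; corrupt
        # data raises zlib.error or OSError (gzip.BadGzipFile is an OSError).
        problems.append("gzip stream error after %d records: %s" % (records, exc))
    return records, problems
```

For example, `check_fastq_gz("/data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R1.fastq.gz")` returns the number of records read and an empty problem list if nothing looked wrong.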

[2016-11-22T15:00Z] System YAML configuration: /usr/local/share/bcbio/galaxy/bcbio_system.yaml
[2016-11-22T15:00Z] Resource requests: bwa, sambamba, samtools; memory: 3.00, 3.00, 3.00; cores: 16, 16, 16
[2016-11-22T15:00Z] Configuring 1 jobs to run, using 16 cores each with 48.1g of memory reserved for each job
[2016-11-22T15:00Z] Timing: organize samples
[2016-11-22T15:00Z] multiprocessing: organize_samples
[2016-11-22T15:00Z] Using input YAML configuration: /data/SZ_WGS_Meta-merged/config/SZ_WGS_Meta-merged.yaml
[2016-11-22T15:00Z] Checking sample YAML configuration: /data/SZ_WGS_Meta-merged/config/SZ_WGS_Meta-merged.yaml
[2016-11-22T15:00Z] Testing minimum versions of installed programs
[2016-11-22T15:00Z] Timing: alignment preparation
[2016-11-22T15:00Z] multiprocessing: prep_align_inputs
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] multiprocessing: disambiguate_split
[2016-11-22T15:00Z] Timing: alignment
[2016-11-22T15:00Z] multiprocessing: process_alignment
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] bwa mem alignment from fastq: 1_NT_Blood
[2016-11-22T15:00Z] samblaster: Version 0.1.23
[2016-11-22T15:00Z] samblaster: Inputting from stdin
[2016-11-22T15:00Z] samblaster: Outputting to stdout
[2016-11-22T15:00Z] samblaster: Opening /dev/fd/62 for write.
[2016-11-22T15:00Z] samblaster: Opening /dev/fd/63 for write.
[2016-11-22T15:01Z] [M::mem_pestat] analyzing insert size distribution for orientation FF...
[2016-11-22T15:01Z] [M::mem_pestat] (25, 50, 75) percentile: (444, 614, 2476)
[2016-11-22T15:01Z] [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 6540)
[2016-11-22T15:01Z] [M::mem_pestat] mean and std.dev: (1483.12, 1394.07)
[2016-11-22T15:01Z] [M::mem_pestat] low and high boundaries for proper pairs: (1, 8572)
[2016-11-22T15:01Z] [M::mem_pestat] analyzing insert size distribution for orientation FR...
[2016-11-22T15:01Z] [M::mem_pestat] (25, 50, 75) percentile: (510, 594, 690)
[2016-11-22T15:01Z] [M::mem_pestat] low and high boundaries for computing mean and std.dev: (150, 1050)

........

[2016-11-22T15:08Z] [M::mem_pestat] mean and std.dev: (2431.38, 1578.82)
[2016-11-22T15:08Z] [M::mem_pestat] low and high boundaries for proper pairs: (1, 9898)
[2016-11-22T15:08Z] [M::mem_pestat] skip orientation FF
[2016-11-22T15:08Z] [M::mem_pestat] skip orientation RF
[2016-11-22T15:08Z] [M::mem_pestat] skip orientation RR
[2016-11-22T15:08Z] [fputs] Broken pipe
[2016-11-22T15:08Z] [W::sam_read1] parse error at line 97572
[2016-11-22T15:08Z] [bam_sort_core] truncated file. Aborting.
[2016-11-22T15:08Z] [E::sam_parse1] SEQ and QUAL are of different length
[2016-11-22T15:08Z] [W::sam_read1] parse error at line 9942568
[2016-11-22T15:08Z] [main_samview] truncated file.
[2016-11-22T15:09Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /usr/local/share/bcbio/galaxy/../anaconda/bin/bwa mem -c 250 -M -t 16 -R '@RG\tID:1_NT_Blood\tPL:illumina\tPU:1_2016-11-21_SZ_WGS_Meta-merged\tSM:1_NT_Blood' -v 1 /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R1.fastq.gz 180000001 200000000) <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R2.fastq.gz 180000001 200000000) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samblaster --addMateTags -M --splitterFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-spl -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpW3_m6g/1_NT_Blood-sort-180000001_200000000-sr.bam /dev/stdin) --discordantFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-disc -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmp31ole9/1_NT_Blood-sort-180000001_200000000-disc.bam /dev/stdin) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samtools view -b -S -u - | /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba sort -N -t 16 -m 1G --tmpdir /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-full -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000.bam /dev/stdin
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (411, 586, 1713)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4317)
[M::mem_pestat] mean and std.dev: (1136.36, 1187.83)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 5888)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (510, 594, 690)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (150, 1050)
[M::mem_pestat] mean and std.dev: (602.11, 140.91)

......

[M::mem_pestat] low and high boundaries for proper pairs: (1, 9898)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[fputs] Broken pipe
[W::sam_read1] parse error at line 97572
[bam_sort_core] truncated file. Aborting.
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 9942568
[main_samview] truncated file.
' returned non-zero exit status 1
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 230, in <module>
    main(**kwargs)
  File "/usr/local/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
    fc_dir, run_info_yaml)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 127, in variant2pipeline
    samples = run_parallel("process_alignment", samples)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 800, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 566, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 180, in __init__
    self.results = batch()
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 80, in process_alignment
    return sample.process_alignment(*args)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/sample.py", line 113, in process_alignment
    data = align_to_sort_bam(fastq1, fastq2, aligner, data)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 63, in align_to_sort_bam
    names, align_dir, data)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 116, in _align_from_fastq
    out = align_fn(fastq1, fastq2, align_ref, names, align_dir, data)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 146, in align_pipe
    names, rg_info, data)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 156, in _align_mem
    [do.file_nonempty(tx_out_file), do.file_reasonable_size(tx_out_file, fastq_file)])
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /usr/local/share/bcbio/galaxy/../anaconda/bin/bwa mem -c 250 -M -t 16 -R '@RG\tID:1_NT_Blood\tPL:illumina\tPU:1_2016-11-21_SZ_WGS_Meta-merged\tSM:1_NT_Blood' -v 1 /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R1.fastq.gz 180000001 200000000) <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R2.fastq.gz 180000001 200000000) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samblaster --addMateTags -M --splitterFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-spl -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpW3_m6g/1_NT_Blood-sort-180000001_200000000-sr.bam /dev/stdin) --discordantFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-disc -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmp31ole9/1_NT_Blood-sort-180000001_200000000-disc.bam /dev/stdin) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samtools view -b -S -u - | /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba sort -N -t 16 -m 1G --tmpdir /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-full -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000.bam /dev/stdin
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (411, 586, 1713)

.....

[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[fputs] Broken pipe
[W::sam_read1] parse error at line 97572
[bam_sort_core] truncated file. Aborting.
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 9942568
[main_samview] truncated file.
' returned non-zero exit status 1
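The failing command starts with `set -o pipefail`, so a failure in any member of the `bwa mem | samblaster | samtools view | sambamba sort` pipeline makes the whole command exit non-zero, which bcbio then raises as `CalledProcessError`, even when the last stage itself did not fail. The minimal sketch below (assuming `bash` is on the `PATH`; `pipeline_status` is a hypothetical helper) shows the difference with and without the option. One caveat from bash's semantics: the two `samtools sort` commands inside the `>(...)` process substitutions fed by samblaster's `--splitterFile` and `--discordantFile` are not pipeline members, so pipefail does not report their exit status directly; a failure there would typically surface only indirectly, for example as a write error in the process feeding them.

```python
import subprocess


def pipeline_status(pipeline, pipefail):
    """Run a shell pipeline with bash and return its exit status.

    With `set -o pipefail`, the status is that of the rightmost command that
    exited non-zero (or 0 if all succeeded). Without it, bash reports only the
    status of the last command in the pipeline.
    """
    script = ("set -o pipefail; " if pipefail else "") + pipeline
    return subprocess.run(["bash", "-c", script]).returncode


# The first stage fails while the last stage (cat) succeeds.
print(pipeline_status("false | cat", pipefail=False))  # 0: failure is masked
print(pipeline_status("false | cat", pipefail=True))   # 1: failure propagates
```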

PatrickJReed commented 7 years ago

Config file:

details:

PatrickJReed commented 7 years ago

Running on an Amazon m4.4xlarge instance. System config:

cat /usr/local/share/bcbio/galaxy/bcbio_system.yaml
galaxy_config: universe_wsgi.ini
resources:
  default:
    cores: 16
    jvm_opts:

chapmanb commented 7 years ago

Patrick; thanks for the detailed report, and apologies about the issues. That's an intense configuration file, so I'll try to tackle one thing at a time. Regarding your problem:

Some other suggestions unrelated to this.

Hope this helps with your issue and general configuration.

PatrickJReed commented 7 years ago

Thanks for the reply. For context, the analysis is on monozygotic twins who are discordant for a complex disorder, using the tumor/normal phenotype for the affected and unaffected twins, so I'm really trying to get highly accurate germline and somatic variant calls. There is WGS from blood and from fibroblasts for both twins to further tease out germline vs. somatic variants. I'll drop lumpy to see if samblaster is the culprit. Thank you for the config suggestions; I'll drop validate and recal/realign. I'll keep you posted.
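If the sample configuration is edited programmatically, the changes described above (dropping lumpy, validation, and recalibration/realignment) might look like the following sketch. It assumes the usual bcbio sample YAML layout, a top-level `details` list whose entries carry an `algorithm` section with `svcaller`, `recalibrate`, `realign`, and `validate` keys, already loaded into a Python dict (for example with PyYAML's `yaml.safe_load`). `edit_bcbio_config` is a hypothetical helper, and treating "drop" as setting `recalibrate`/`realign` to false and deleting `validate` is this sketch's interpretation; check it against the bcbio documentation before use.

```python
import copy


def edit_bcbio_config(config, drop_svcallers=("lumpy",),
                      disable=("recalibrate", "realign"),
                      remove=("validate",)):
    """Return an edited copy of a bcbio-style sample config dict.

    Assumed layout: {"details": [{"algorithm": {...}}, ...]}.
    """
    edited = copy.deepcopy(config)  # leave the caller's dict untouched
    for sample in edited.get("details", []):
        algorithm = sample.get("algorithm")
        if not isinstance(algorithm, dict):
            continue  # nothing to edit for this sample
        callers = algorithm.get("svcaller")
        if callers is not None:
            if isinstance(callers, str):
                callers = [callers]  # a single caller may be given as a string
            kept = [c for c in callers if c not in drop_svcallers]
            if kept:
                algorithm["svcaller"] = kept
            else:
                del algorithm["svcaller"]  # no structural variant callers left
        for key in disable:
            if key in algorithm:
                algorithm[key] = False  # assumption: false turns the step off
        for key in remove:
            algorithm.pop(key, None)
    return edited


# Hypothetical example values; the real config from this issue is not shown.
example = {"details": [{"description": "1_NT_Blood",
                        "algorithm": {"aligner": "bwa",
                                      "svcaller": ["lumpy", "manta"],
                                      "recalibrate": "gatk",
                                      "realign": "gatk",
                                      "validate": "truth.vcf"}}]}
print(edit_bcbio_config(example)["details"][0]["algorithm"])
# {'aligner': 'bwa', 'svcaller': ['manta'], 'recalibrate': False, 'realign': False}
```

Returning a copy rather than mutating in place keeps the original configuration available for comparison before writing the edited one back out (for example with `yaml.safe_dump`).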

PatrickJReed commented 7 years ago

Removing lumpy from the config fixed the problem.