PatrickJReed closed this issue 7 years ago.
Config file:
Running on an Amazon m4.4xlarge instance. System config:
cat /usr/local/share/bcbio/galaxy/bcbio_system.yaml
galaxy_config: universe_wsgi.ini
resources:
  default:
    cores: 16
    jvm_opts:
Patrick; Thanks for the detailed report, and apologies about the issues. That's an intense configuration file, so I'll try to tackle one thing at a time. Regarding your problem: I'd suggest removing lumpy from your svcaller list. bcbio will then use a different duplicate marking approach, which might work better.
Some other suggestions unrelated to this:
I'm unclear if you're running a tumor/normal somatic analysis or a germline analysis. You have both somatic (mutect2, varscan) and germline (samtools, platypus, gatk-haplotype) callers in your configuration; you'd want to pick one or the other. I'd also recommend using vardict if doing somatic calling and freebayes if doing germline calling. If you're doing germline calling, you don't need phenotype: normal in your metadata.
Unless you're running a large number of samples (250+), you don't need jointcaller and can remove it from the configuration.
You generally don't need base quality recalibration and indel realignment, and they add a lot of processing overhead. I'd suggest removing them.
If you're not running Genome in a Bottle test data (these look like real study samples), the validate targets don't help much, since your samples won't be comparable to them.
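A minimal sketch of the suggestions above, applied to the algorithm section of a sample configuration that has already been loaded into a Python dict (for example with PyYAML). The key names (variantcaller, jointcaller, recalibrate, realign, validate, validate_regions) are bcbio algorithm options; the helper prune_algorithm and the example values are hypothetical and not part of bcbio.

```python
import copy

# bcbio algorithm keys the reply suggests dropping: joint calling only pays off
# for large cohorts, and validation targets only make sense for
# Genome in a Bottle samples.
DROP_KEYS = ("jointcaller", "validate", "validate_regions")


def prune_algorithm(algorithm, somatic):
    """Return a pruned copy of a bcbio `algorithm` dict (hypothetical helper).

    somatic=True keeps a single somatic caller (vardict) for tumor/normal
    batches; somatic=False keeps a single germline caller (freebayes).
    """
    out = copy.deepcopy(algorithm)  # leave the input dict untouched
    for key in DROP_KEYS:
        out.pop(key, None)
    # Base recalibration and indel realignment add overhead; switch them off.
    out["recalibrate"] = False
    out["realign"] = False
    # One caller matched to the analysis type instead of mixing somatic
    # (mutect2, varscan) and germline (samtools, platypus, ...) callers.
    out["variantcaller"] = ["vardict"] if somatic else ["freebayes"]
    return out


# Hypothetical starting point resembling the configuration discussed above.
example = {
    "aligner": "bwa",
    "variantcaller": ["mutect2", "varscan", "samtools", "platypus", "gatk-haplotype"],
    "jointcaller": "gatk-haplotype-joint",
    "recalibrate": "gatk",
    "realign": "gatk",
    "validate": "truth.vcf.gz",  # placeholder path, not a real truth set
}
print(prune_algorithm(example, somatic=True))
```

Writing the pruned dict back to YAML is left out so the sketch stays dependency-free.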
Hope this helps with your issue and general configuration.
Thanks for the reply. For context, the analysis is on monozygotic twins who are discordant for a complex disorder, using the tumor/normal phenotype for the affected and unaffected twins, so I'm really trying to get highly accurate germline and somatic variant calls. There is WGS from blood and from fibroblasts for both twins to further tease out germline vs. somatic variants. I'll drop lumpy to see if samblaster is the culprit. Thank you for the config suggestions; I'll drop validate and recal/realign. I'll keep you posted.
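One way the design described here might be expressed as bcbio sample metadata, sketched under an assumption: each tissue gets its own batch, so the affected twin (phenotype tumor) is only compared against the unaffected twin (phenotype normal) from the same tissue. batch and phenotype are bcbio metadata keys; the helper twin_batches and all sample names are hypothetical, and the files and algorithm entries a real sample YAML needs are omitted.

```python
def twin_batches(pair_id, affected, unaffected, tissues=("blood", "fibroblast")):
    """Build bcbio-style sample entries pairing twins within each tissue.

    `affected` and `unaffected` are hypothetical sample-name prefixes; each
    tissue becomes its own tumor/normal batch.
    """
    samples = []
    for tissue in tissues:
        batch = f"{pair_id}_{tissue}"
        # Affected twin takes the "tumor" role, unaffected twin the "normal".
        for prefix, phenotype in ((affected, "tumor"), (unaffected, "normal")):
            samples.append({
                "description": f"{prefix}_{tissue}",
                "metadata": {"batch": batch, "phenotype": phenotype},
            })
    return samples


for sample in twin_batches("pair1", "twinA", "twinB"):
    print(sample["description"], sample["metadata"])
```

Batching within tissue is only one option; comparing blood against fibroblast for the same twin would need a different batch layout.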
Removing lumpy from the config fixed the problem.
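The fix amounts to editing the svcaller entry in the sample configuration, which, per the reply above, makes bcbio use a different duplicate marking approach instead of the samblaster pipe that failed. A small sketch, assuming the algorithm section is already loaded into a Python dict; drop_sv_caller is a hypothetical helper, not a bcbio function, and it handles both a single string and a list in case the configuration uses either form.

```python
def drop_sv_caller(algorithm, caller="lumpy"):
    """Return a copy of `algorithm` without `caller` in its svcaller entry."""
    out = dict(algorithm)  # shallow copy; the input dict is not modified
    callers = out.get("svcaller")
    if callers is None:
        return out
    if isinstance(callers, str):
        callers = [callers]
    remaining = [c for c in callers if c != caller]
    if remaining:
        out["svcaller"] = remaining
    else:
        # No structural variant callers left: drop the key entirely.
        del out["svcaller"]
    return out


print(drop_sv_caller({"aligner": "bwa", "svcaller": ["lumpy", "manta"]}))
# {'aligner': 'bwa', 'svcaller': ['manta']}
```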
I've seen similar issues to this, but they're closed. I updated to the current development version just before rerunning this, and I get the same error. I'll say upfront that it is possible the file was truncated somehow; there was a lot of movement before analysis: local server -> S3 -> EBS -> bcbio prep (merge).
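Since truncation during transfer is a suspect, one quick check is to stream each gzipped FASTQ end to end before running bcbio. A minimal sketch using only Python's standard library (not a bcbio tool): a gzip stream cut off before its end marker raises EOFError, corrupt data raises OSError or zlib.error, and each record is checked for a well-formed header and matching SEQ and QUAL lengths, the same condition samtools complains about in the log below. The name check_fastq_gz is hypothetical; it accepts a path or a binary file object because gzip.open accepts both, and Python's gzip module also reads bgzip files, which are concatenated gzip members.

```python
import gzip
import zlib


def check_fastq_gz(source):
    """Stream a gzipped FASTQ; return (ok, records_checked, message).

    `source` is a path or a binary file object (gzip.open accepts both).
    """
    n = 0
    try:
        # ASCII with replacement keeps one character per byte, so lengths hold.
        with gzip.open(source, "rt", encoding="ascii", errors="replace") as handle:
            while True:
                header = handle.readline()
                if not header:  # clean end of file
                    return True, n, "ok"
                seq = handle.readline().rstrip("\r\n")
                plus = handle.readline()
                qual = handle.readline().rstrip("\r\n")
                if not header.startswith("@") or not plus.startswith("+"):
                    return False, n, f"record {n + 1}: malformed FASTQ record"
                if len(seq) != len(qual):
                    return False, n, f"record {n + 1}: SEQ and QUAL lengths differ"
                n += 1
    except EOFError as exc:  # gzip stream ended before its end-of-stream marker
        return False, n, f"truncated gzip after {n} records: {exc}"
    except (OSError, zlib.error) as exc:  # bad header, bad CRC, corrupt data
        return False, n, f"corrupt gzip after {n} records: {exc}"
```

Comparing checksums (for example md5sum) computed on the local server and again on EBS is a complementary check that also catches corruption this record-level scan would miss.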
[2016-11-22T15:00Z] System YAML configuration: /usr/local/share/bcbio/galaxy/bcbio_system.yaml
[2016-11-22T15:00Z] Resource requests: bwa, sambamba, samtools; memory: 3.00, 3.00, 3.00; cores: 16, 16, 16
[2016-11-22T15:00Z] Configuring 1 jobs to run, using 16 cores each with 48.1g of memory reserved for each job
[2016-11-22T15:00Z] Timing: organize samples
[2016-11-22T15:00Z] multiprocessing: organize_samples
[2016-11-22T15:00Z] Using input YAML configuration: /data/SZ_WGS_Meta-merged/config/SZ_WGS_Meta-merged.yaml
[2016-11-22T15:00Z] Checking sample YAML configuration: /data/SZ_WGS_Meta-merged/config/SZ_WGS_Meta-merged.yaml
[2016-11-22T15:00Z] Testing minimum versions of installed programs
[2016-11-22T15:00Z] Timing: alignment preparation
[2016-11-22T15:00Z] multiprocessing: prep_align_inputs
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] Resource requests: ; memory: 1.00; cores: 1
[2016-11-22T15:00Z] Configuring 2 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-11-22T15:00Z] multiprocessing: disambiguate_split
[2016-11-22T15:00Z] Timing: alignment
[2016-11-22T15:00Z] multiprocessing: process_alignment
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] Aligning lane 1_2016-11-21_SZ_WGS_Meta-merged with bwa aligner
[2016-11-22T15:00Z] bwa mem alignment from fastq: 1_NT_Blood
[2016-11-22T15:00Z] samblaster: Version 0.1.23
[2016-11-22T15:00Z] samblaster: Inputting from stdin
[2016-11-22T15:00Z] samblaster: Outputting to stdout
[2016-11-22T15:00Z] samblaster: Opening /dev/fd/62 for write.
[2016-11-22T15:00Z] samblaster: Opening /dev/fd/63 for write.
[2016-11-22T15:01Z] [M::mem_pestat] analyzing insert size distribution for orientation FF...
[2016-11-22T15:01Z] [M::mem_pestat] (25, 50, 75) percentile: (444, 614, 2476)
[2016-11-22T15:01Z] [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 6540)
[2016-11-22T15:01Z] [M::mem_pestat] mean and std.dev: (1483.12, 1394.07)
[2016-11-22T15:01Z] [M::mem_pestat] low and high boundaries for proper pairs: (1, 8572)
[2016-11-22T15:01Z] [M::mem_pestat] analyzing insert size distribution for orientation FR...
[2016-11-22T15:01Z] [M::mem_pestat] (25, 50, 75) percentile: (510, 594, 690)
[2016-11-22T15:01Z] [M::mem_pestat] low and high boundaries for computing mean and std.dev: (150, 1050)
........
[2016-11-22T15:08Z] [M::mem_pestat] mean and std.dev: (2431.38, 1578.82)
[2016-11-22T15:08Z] [M::mem_pestat] low and high boundaries for proper pairs: (1, 9898)
[2016-11-22T15:08Z] [M::mem_pestat] skip orientation FF
[2016-11-22T15:08Z] [M::mem_pestat] skip orientation RF
[2016-11-22T15:08Z] [M::mem_pestat] skip orientation RR
[2016-11-22T15:08Z] [fputs] Broken pipe
[2016-11-22T15:08Z] [W::sam_read1] parse error at line 97572
[2016-11-22T15:08Z] [bam_sort_core] truncated file. Aborting.
[2016-11-22T15:08Z] [E::sam_parse1] SEQ and QUAL are of different length
[2016-11-22T15:08Z] [W::sam_read1] parse error at line 9942568
[2016-11-22T15:08Z] [main_samview] truncated file.
[2016-11-22T15:09Z] Uncaught exception occurred
Traceback (most recent call last):
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /usr/local/share/bcbio/galaxy/../anaconda/bin/bwa mem -c 250 -M -t 16 -R '@RG\tID:1_NT_Blood\tPL:illumina\tPU:1_2016-11-21_SZ_WGS_Meta-merged\tSM:1_NT_Blood' -v 1 /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R1.fastq.gz 180000001 200000000) <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R2.fastq.gz 180000001 200000000) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samblaster --addMateTags -M --splitterFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-spl -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpW3_m6g/1_NT_Blood-sort-180000001_200000000-sr.bam /dev/stdin) --discordantFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-disc -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmp31ole9/1_NT_Blood-sort-180000001_200000000-disc.bam /dev/stdin) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samtools view -b -S -u - | /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba sort -N -t 16 -m 1G --tmpdir /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-full -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000.bam /dev/stdin
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (411, 586, 1713)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 4317)
[M::mem_pestat] mean and std.dev: (1136.36, 1187.83)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 5888)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (510, 594, 690)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (150, 1050)
[M::mem_pestat] mean and std.dev: (602.11, 140.91)
......
[M::mem_pestat] low and high boundaries for proper pairs: (1, 9898)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[fputs] Broken pipe
[W::sam_read1] parse error at line 97572
[bam_sort_core] truncated file. Aborting.
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 9942568
[main_samview] truncated file.
' returned non-zero exit status 1
Traceback (most recent call last):
File "/usr/local/bin/bcbio_nextgen.py", line 230, in <module>
main(**kwargs)
File "/usr/local/bin/bcbio_nextgen.py", line 43, in main
run_main(**kwargs)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 43, in run_main
fc_dir, run_info_yaml)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 127, in variant2pipeline
samples = run_parallel("process_alignment", samples)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 800, in __call__
while self.dispatch_one_batch(iterator):
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 658, in dispatch_one_batch
self._dispatch(tasks)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 566, in _dispatch
job = ImmediateComputeBatch(batch)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 180, in __init__
self.results = batch()
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
return [func(*args, **kwargs) for func, args, kwargs in self.items]
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
return apply(f, *args, **kwargs)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 80, in process_alignment
return sample.process_alignment(*args)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/sample.py", line 113, in process_alignment
data = align_to_sort_bam(fastq1, fastq2, aligner, data)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 63, in align_to_sort_bam
names, align_dir, data)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/alignment.py", line 116, in _align_from_fastq
out = align_fn(fastq1, fastq2, align_ref, names, align_dir, data)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 146, in align_pipe
names, rg_info, data)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/bwa.py", line 156, in _align_mem
[do.file_nonempty(tx_out_file), do.file_reasonable_size(tx_out_file, fastq_file)])
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
_do_run(cmd, checks, log_stdout)
File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /usr/local/share/bcbio/galaxy/../anaconda/bin/bwa mem -c 250 -M -t 16 -R '@RG\tID:1_NT_Blood\tPL:illumina\tPU:1_2016-11-21_SZ_WGS_Meta-merged\tSM:1_NT_Blood' -v 1 /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R1.fastq.gz 180000001 200000000) <(grabix grab /data/SZ_WGS_Meta-merged/work/align_prep/1_NT_Blood_R2.fastq.gz 180000001 200000000) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samblaster --addMateTags -M --splitterFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-spl -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpW3_m6g/1_NT_Blood-sort-180000001_200000000-sr.bam /dev/stdin) --discordantFile >(/usr/local/share/bcbio/galaxy/../anaconda/bin/samtools sort -@ 16 -m 1G -T /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-disc -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmp31ole9/1_NT_Blood-sort-180000001_200000000-disc.bam /dev/stdin) | /usr/local/share/bcbio/galaxy/../anaconda/bin/samtools view -b -S -u - | /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba sort -N -t 16 -m 1G --tmpdir /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000-sorttmp-full -o /data/SZ_WGS_Meta-merged/work/align/1_NT_Blood/split/tx/tmpqIJLtg/1_NT_Blood-sort-180000001_200000000.bam /dev/stdin
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (411, 586, 1713)
.....
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[fputs] Broken pipe
[W::sam_read1] parse error at line 97572
[bam_sort_core] truncated file. Aborting.
[E::sam_parse1] SEQ and QUAL are of different length
[W::sam_read1] parse error at line 9942568
[main_samview] truncated file.
' returned non-zero exit status 1