bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Error on somatic calling on calculating regions coverage of variant_regions #1959

Closed bioinfo-dirty-jobs closed 7 years ago

bioinfo-dirty-jobs commented 7 years ago

I am running bcbio on CentOS 7 and get this error:


[2017-05-26T16:09Z] INFO  16:09:01,462 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:09Z] INFO  16:09:01,486 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:09Z] INFO  16:09:15,824 ProgressMeter -  chr13:45517708   3.9106296E7    17.4 m      26.0 s       64.6%    26.9 m       9.5 m
[2017-05-26T16:09Z] INFO  16:09:30,849 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:09Z] INFO  16:09:30,878 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:09Z] INFO  16:09:46,777 ProgressMeter -  chr14:36207782   4.0259022E7    17.9 m      26.0 s       66.5%    26.9 m       9.0 m
[2017-05-26T16:10Z] INFO  16:10:17,864 ProgressMeter -  chr14:95909672   4.1559047E7    18.4 m      26.0 s       68.3%    27.0 m       8.6 m
[2017-05-26T16:10Z] INFO  16:10:24,460 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:10Z] INFO  16:10:24,488 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:10Z] INFO  16:10:48,175 ProgressMeter -  chr15:43029003   4.2535508E7    18.9 m      26.0 s       69.9%    27.0 m       8.1 m
[2017-05-26T16:11Z] INFO  16:11:19,899 ProgressMeter -  chr15:83222213   4.3635526E7    19.5 m      26.0 s       71.9%    27.1 m       7.6 m
[2017-05-26T16:11Z] INFO  16:11:28,295 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:11Z] INFO  16:11:28,327 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:11Z] INFO  16:11:49,921 ProgressMeter -  chr16:16271344   4.4720776E7    20.0 m      26.0 s       73.7%    27.1 m       7.1 m
[2017-05-26T16:12Z] INFO  16:12:21,870 ProgressMeter -  chr16:70817424   4.5820802E7    20.5 m      26.0 s       76.0%    27.0 m       6.5 m
[2017-05-26T16:12Z] INFO  16:12:31,290 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:12Z] INFO  16:12:31,319 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:12Z] INFO  16:12:52,037 ProgressMeter -  chr17:10356471   4.6917355E7    21.0 m      26.0 s       77.9%    26.9 m       5.9 m
[2017-05-26T16:13Z] INFO  16:13:22,095 ProgressMeter -  chr17:41894047   4.8117381E7    21.5 m      26.0 s       80.0%    26.9 m       5.4 m
[2017-05-26T16:13Z] INFO  16:13:52,148 ProgressMeter -  chr17:78349691   4.9317395E7    22.0 m      26.0 s       82.1%    26.8 m       4.8 m
[2017-05-26T16:13Z] INFO  16:13:53,817 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:13Z] INFO  16:13:53,840 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:14Z] INFO  16:14:19,886 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:14Z] INFO  16:14:19,916 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:14Z] INFO  16:14:23,168 ProgressMeter -  chr18:78005267   5.0521268E7    22.5 m      26.0 s       84.0%    26.8 m       4.3 m
[2017-05-26T16:14Z] INFO  16:14:53,861 ProgressMeter -  chr19:21358262   5.1677349E7    23.0 m      26.0 s       86.7%    26.5 m       3.5 m
[2017-05-26T16:15Z] INFO  16:15:24,516 ProgressMeter -  chr19:49120018   5.2777369E7    23.5 m      26.0 s       88.8%    26.5 m       3.0 m
[2017-05-26T16:15Z] INFO  16:15:43,802 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:15Z] INFO  16:15:43,822 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:15Z] INFO  16:15:54,527 ProgressMeter -  chr20:21695546   5.3885704E7    24.0 m      26.0 s       90.9%    26.4 m       2.4 m
[2017-05-26T16:16Z] INFO  16:16:14,828 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:16Z] INFO  16:16:14,875 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:16Z] INFO  16:16:24,544 ProgressMeter -  chr21:37747571   5.5070216E7    24.5 m      26.0 s       93.0%    26.4 m     110.0 s
[2017-05-26T16:16Z] INFO  16:16:29,895 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:16Z] INFO  16:16:29,918 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:16Z] INFO  16:16:54,558 ProgressMeter -  chr22:40697339   5.6158039E7    25.0 m      26.0 s       95.1%    26.3 m      77.0 s
[2017-05-26T16:17Z] INFO  16:17:01,862 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:17Z] INFO  16:17:01,920 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:17Z] INFO  16:17:26,732 ProgressMeter -   chrX:49070661   5.7388936E7    25.6 m      26.0 s       97.0%    26.4 m      48.0 s
[2017-05-26T16:17Z] INFO  16:17:57,123 ProgressMeter -  chrX:129060095   5.8588967E7    26.1 m      26.0 s       98.8%    26.4 m      19.0 s
[2017-05-26T16:18Z] INFO  16:18:05,451 ReadShardBalancer$1 - Loading BAM index data
[2017-05-26T16:18Z] INFO  16:18:05,459 ReadShardBalancer$1 - Done loading BAM index data
[2017-05-26T16:18Z] INFO  16:18:06,868 BaseRecalibrator - Calculating quantized quality scores...
[2017-05-26T16:18Z] INFO  16:18:06,913 BaseRecalibrator - Writing recalibration report...
[2017-05-26T16:18Z] INFO  16:18:07,173 BaseRecalibrator - ...done!
[2017-05-26T16:18Z] INFO  16:18:07,174 BaseRecalibrator - BaseRecalibrator was able to recalibrate 59079413 reads
[2017-05-26T16:18Z] INFO  16:18:07,175 ProgressMeter -            done   5.9079413E7    26.2 m      26.0 s       99.9%    26.3 m       1.0 s
[2017-05-26T16:18Z] INFO  16:18:07,176 ProgressMeter - Total runtime 1574.64 secs, 26.24 min, 0.44 hours
[2017-05-26T16:18Z] INFO  16:18:07,176 MicroScheduler - 16671103 reads were filtered out during the traversal out of approximately 75757948 total reads (22.01%)
[2017-05-26T16:18Z] INFO  16:18:07,176 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter
[2017-05-26T16:18Z] INFO  16:18:07,176 MicroScheduler -   -> 9254973 reads (12.22% of total) failing DuplicateReadFilter
[2017-05-26T16:18Z] INFO  16:18:07,176 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
[2017-05-26T16:18Z] INFO  16:18:07,177 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter
[2017-05-26T16:18Z] INFO  16:18:07,177 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
[2017-05-26T16:18Z] INFO  16:18:07,177 MicroScheduler -   -> 7416130 reads (9.79% of total) failing MappingQualityZeroFilter
[2017-05-26T16:18Z] INFO  16:18:07,177 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
[2017-05-26T16:18Z] INFO  16:18:07,177 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter
[2017-05-26T16:18Z] ------------------------------------------------------------------------------------------
[2017-05-26T16:18Z] Done. There were 1 WARN messages, the first 1 are repeated below.
[2017-05-26T16:18Z] WARN  15:51:52,025 IndexDictionaryUtils - Track knownSites doesn't have a sequence dictionary built in, skipping dictionary validation
[2017-05-26T16:18Z] ------------------------------------------------------------------------------------------
[2017-05-26T16:18Z] calculating regions coverage of variant_regions in /home/centos/Calling/411/work/align/411-tumor/411-tumor-sort.bam
[2017-05-26T16:18Z] /usr/bin/bash: line 1: 32526 Segmentation fault      /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba depth region -t 8 /home/centos/Calling/411/work/align/411-tumor/411-tumor-sort.bam -L /home/centos/Calling/411/work/bedprep/truseq-exome-targeted-regions-manifest-v1-2-merged.bed -F "not unmapped and not mate_is_unmapped and not secondary_alignment and not failed_quality_control and not duplicate" -o /home/centos/Calling/411/work/bcbiotx/tmpcNhVjT/variant_regions_regions_depth.bed
[2017-05-26T16:18Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 102, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba depth region -t 8 /home/centos/Calling/411/work/align/411-tumor/411-tumor-sort.bam  -L /home/centos/Calling/411/work/bedprep/truseq-exome-targeted-regions-manifest-v1-2-merged.bed   -F "not unmapped and not mate_is_unmapped and not secondary_alignment and not failed_quality_control and not duplicate" -o /home/centos/Calling/411/work/bcbiotx/tmpcNhVjT/variant_regions_regions_depth.bed
/usr/bin/bash: line 1: 32526 Segmentation fault      /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba depth region -t 8 /home/centos/Calling/411/work/align/411-tumor/411-tumor-sort.bam -L /home/centos/Calling/411/work/bedprep/truseq-exome-targeted-regions-manifest-v1-2-merged.bed -F "not unmapped and not mate_is_unmapped and not secondary_alignment and not failed_quality_control and not duplicate" -o /home/centos/Calling/411/work/bcbiotx/tmpcNhVjT/variant_regions_regions_depth.bed
' returned non-zero exit status 139
Traceback (most recent call last):
  File "/usr/local/bin/bcbio_nextgen.py", line 234, in <module>
    main(**kwargs)
  File "/usr/local/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 42, in run_main
    fc_dir, run_info_yaml)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 86, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 133, in variant2pipeline
    samples = run_parallel("postprocess_alignment", samples)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1)(joblib.delayed(fn)(x) for x in items):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 804, in __call__
    while self.dispatch_one_batch(iterator):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 662, in dispatch_one_batch
    self._dispatch(tasks)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 570, in _dispatch
    job = ImmediateComputeBatch(batch)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 183, in __init__
    self.results = batch()
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 51, in wrapper
    return apply(f, *args, **kwargs)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 118, in postprocess_alignment
    return sample.postprocess_alignment(*args)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/sample.py", line 225, in postprocess_alignment
    covinfo = callable.sample_callable_bed(bam_file_ready, ref_file, data)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/bam/callable.py", line 36, in sample_callable_bed
    depth_bed, callable_bed, highdepth_bed, variant_regions_avg_cov = coverage.calculate(bam_file, data)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/coverage.py", line 85, in calculate
    variant_regions_avg_cov = get_average_coverage(data, bam_file, variant_regions, "variant_regions")
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/coverage.py", line 173, in get_average_coverage
    avg_cov = _average_bed_coverage(data, bed_file, bam_file, target_name=target_name)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/coverage.py", line 190, in _average_bed_coverage
    sambamba_depth_file = regions_coverage(data, bed_file, bam_file, target_name)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/variation/coverage.py", line 334, in regions_coverage
    do.run(cmdl, message.format(**locals()))
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 102, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba depth region -t 8 /home/centos/Calling/411/work/align/411-tumor/411-tumor-sort.bam  -L /home/centos/Calling/411/work/bedprep/truseq-exome-targeted-regions-manifest-v1-2-merged.bed   -F "not unmapped and not mate_is_unmapped and not secondary_alignment and not failed_quality_control and not duplicate" -o /home/centos/Calling/411/work/bcbiotx/tmpcNhVjT/variant_regions_regions_depth.bed
/usr/bin/bash: line 1: 32526 Segmentation fault      /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba depth region -t 8 /home/centos/Calling/411/work/align/411-tumor/411-tumor-sort.bam -L /home/centos/Calling/411/work/bedprep/truseq-exome-targeted-regions-manifest-v1-2-merged.bed -F "not unmapped and not mate_is_unmapped and not secondary_alignment and not failed_quality_control and not duplicate" -o /home/centos/Calling/411/work/bcbiotx/tmpcNhVjT/variant_regions_regions_depth.bed
' returned non-zero exit status 139
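
For reference, exit status 139 in the traceback above is how bash reports a process killed by a signal: 128 plus the signal number. Signal 11 is SIGSEGV, which matches the "Segmentation fault" line. A minimal Python 3 sketch of that decoding follows; the helper name `describe_exit_status` is hypothetical and not part of bcbio.

```python
import signal


def describe_exit_status(status):
    """Translate a shell-reported exit status into a readable description.

    Hypothetical helper for illustration only; it assumes the bash
    convention that a status above 128 means "killed by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            return "killed by " + signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            return "killed by unknown signal %d" % signum
    return "exited with status %d" % status


print(describe_exit_status(139))
```

A status of 139 therefore points to a crash inside the sambamba binary itself rather than an error it reported about its inputs.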

These are the versions I use:

cat  provenance/programs.txt 
bamtools,2.4.0
bcbio-nextgen,1.0.3
bcbio-variation,0.2.6
bcftools,1.4.1
bedtools,2.26.0
biobambam,2.0.72
bioconductor-bubbletree,2.1.5
bowtie2,2.2.8
bwa,0.7.15
chanjo,
cnvkit,0.8.5
cufflinks,2.2.1
cutadapt,1.13
fastqc,0.11.5
featurecounts,1.4.4
freebayes,1.1.0
gatk,3.7
gatk-framework,3.6.24
gemini,0.20.0
grabix,0.1.8
hisat2,2.0.5
htseq,0.7.2
lumpy-sv,0.2.13
manta,1.1.0
metasv,0.4.0
mirdeep2,2.0.0.7
mutect,1.1.5
novoalign,3.07.00
novosort,V3.00.02
oncofuse,1.1.0
phylowgs,20150714
picard,2.9.2
platypus-variant,0.8.1
qualimap,2.2.2a
rna-star,
rtg-tools,3.7.1
sailfish,0.10.1
salmon,0.8.2
sambamba,0.6.6
samblaster,0.1.24
samtools,1.4.1
scalpel,0.5.3
seqbuster,3.1
snpeff,4.3i
vardict,2017.04.18
vardict-java,1.5.0
variant-effect-predictor,87
varscan,2.4.2
vcflib,1.0.0_rc1
vt,2015.11.10
wham,1.7.0.307

[centos@pol-produ work]$ cat provenance/data_versions.csv 
genome,resource,version
GRCh37,GA4GH_problem_regions,20160916
GRCh37,capture_regions,20161202
GRCh37,MIG,20150730
GRCh37,prioritize,20160215
GRCh37,dbsnp,150-20170403
GRCh37,hapmap,3.3
GRCh37,1000g_omni_snps,2.5
GRCh37,ACMG56_genes,20160810
GRCh37,1000g_snps,2.8
GRCh37,mills_indels,2.8
GRCh37,clinvar,20160502
GRCh37,cosmic,68
GRCh37,ancestral,20141010
GRCh37,qsignature,20140703
GRCh37,genesplicer,2004.04.03
GRCh37,effects_transcript,2017-02-22
GRCh37,vcfanno,20170522
GRCh37,viral,2017.02.04
GRCh37,transcripts,2015-12-01
GRCh37,RADAR,5
GRCh37,srnaseq,20170517
GRCh37,giab-NA12878,v3_3_2
GRCh37,giab-NA24385,v3_3_2-sv_v0.1.8
GRCh37,giab-NA24631,v3_3_2
GRCh37,dream-syn3,2014-08-04
GRCh37,dream-syn4,2016-06-11
hg19,GA4GH_problem_regions,20160916
hg19,capture_regions,20161202
hg19,MIG,20150730
hg19,prioritize,20160215
hg19,dbsnp,150-20170403
hg19,hapmap,3.3
hg19,1000g_omni_snps,2.5
hg19,ACMG56_genes,20160629
hg19,1000g_snps,2.8
hg19,mills_indels,2.8
hg19,clinvar,20160502
hg19,cosmic,68
hg19,ancestral,20141010
hg19,qsignature,20140703
hg19,genesplicer,2004.04.03
hg19,effects_transcript,2017-02-22
hg19,vcfanno,20170522
hg19,viral,2017.02.04
hg19,transcripts,2014-07-17
hg19,RADAR,4
hg19,srnaseq,20170517
hg19,giab-NA12878,v3_3_2
hg19,platinum-genome-NA12878,v8_0_1
hg19,giab-NA24385,v3_3_2-sv_v0.1.8
hg19,giab-NA24631,v3_3_2
412-normal: Assigned coverage as 'regional' with 1.4% genome coverage and 21.2% offtarget coverage
[2017-05-26T15:51Z] Recalibrating 412-normal with GATK
[2017-05-26T16:18Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks, log_stdout, env=env)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 102, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba depth region -t 8 /home/centos/Calling/411/work/align/411-tumor/411-tumor-sort.bam  -L /home/centos/Calling/411/work/bedprep/truseq-exome-targeted-regions-manifest-v1-2-merged.bed   -F "not unmapped and not mate_is_unmapped and not secondary_alignment and not failed_quality_control and not dupl:
chapmanb commented 7 years ago

Thanks for the report, and sorry about the problem. sambamba has been having unpredictable segfaults on some systems. We've been working on debugging the underlying issue and are also transitioning to samtools multithreaded options where they work (#1935). In this case sambamba has a unique depth calculation, so I tried to work around the problem by falling back to a single-threaded run. Hope this fixes things for you.
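
The general shape of such a fallback can be sketched as below. This is a minimal Python 3 illustration under stated assumptions, not the actual bcbio change: the function names and the injectable `runner` argument are hypothetical, and the command uses only the `sambamba depth region` options that appear in the log above (`-t`, `-L`, `-F`, `-o`).

```python
import subprocess

# Read filter copied from the failing command in the log above.
READ_FILTER = ("not unmapped and not mate_is_unmapped and not secondary_alignment "
               "and not failed_quality_control and not duplicate")


def sambamba_depth_cmd(bam_file, bed_file, out_file, threads):
    """Build a `sambamba depth region` command like the one in the log."""
    return ["sambamba", "depth", "region", "-t", str(threads), bam_file,
            "-L", bed_file, "-F", READ_FILTER, "-o", out_file]


def run_with_single_thread_fallback(bam_file, bed_file, out_file, threads,
                                    runner=subprocess.check_call):
    """Try the multithreaded run first; on failure, retry once with one thread.

    `runner` is a hypothetical hook so the retry logic can be exercised
    without sambamba installed; by default it is subprocess.check_call.
    """
    try:
        runner(sambamba_depth_cmd(bam_file, bed_file, out_file, threads))
    except subprocess.CalledProcessError:
        if threads == 1:
            raise  # already single-threaded, nothing left to fall back to
        runner(sambamba_depth_cmd(bam_file, bed_file, out_file, 1))
```

Retrying with `-t 1` trades speed for stability, so the sketch only does so after the multithreaded attempt has failed.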