bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Bus error #1792

Closed by pengxiao78 7 years ago

pengxiao78 commented 7 years ago

Hi Brad,

I hit the following bus error, which stopped the bcbio run:

[2017-01-26T00:45Z] c2819..edu: Timing: hla typing
[2017-01-26T00:47Z] c2819..edu: Timing: alignment post-processing
[2017-01-26T00:47Z] c2819..edu: ipython: piped_bamprep
[2017-01-26T00:48Z] c2819..edu: Timing: variant calling
[2017-01-26T00:49Z] c2819..edu: ipython: variantcall_sample
[2017-01-27T01:32Z] c2825..edu: Uncaught exception occurred
Traceback (most recent call last):
  File "/path/to/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/path/to/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /path/to/bcbio/anaconda/bin/freebayes -f /path/to/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa --genotype-qualities --strict-vcf --ploidy 2 --targets /path/to/work/freebayes/9/13657-9_0_141213431-regions-nolcr.bed --min-repeat-entropy 1 --no-partial-observations --min-alternate-fraction 0.1 --pooled-discrete --pooled-continuous --report-genotype-likelihood-max --allele-balance-priors-off /path/to/work/bamprep/13657T/9/13657T-sort-9_0_141213431-prep.bam /path/to/work/bamprep/13657N/9/13657N-sort-9_0_141213431-prep.bam | bcftools filter -i 'ALT="<>" || QUAL > 5' | /path/to/bcbio/anaconda/bin/py -x 'bcbio.variation.freebayes.call_somatic(x)' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $4) } {print}' | bcftools annotate -x FMT/DPR | bcftools view -a - | /path/to/bcbio/anaconda/bin/py -x 'bcbio.variation.freebayes.remove_missingalt(x)' | vcfallelicprimitives -t DECOMPOSED --keep-geno | vcffixup - | vcfstreamsort | vt normalize -n -r /path/to/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -q - | vcfuniqalleles | bgzip -c > /path/to/work/freebayes/9/tx/tmpn5ZPOD/13657-9_0_141213431.vcf.gz
/bin/bash: line 1: 24258 Bus error /path/to/bcbio/anaconda/bin/freebayes -f /path/to/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa --genotype-qualities --strict-vcf --ploidy 2 --targets /path/to/work/freebayes/9/13657-9_0_141213431-regions-nolcr.bed --min-repeat-entropy 1 --no-partial-observations --min-alternate-fraction 0.1 --pooled-discrete --pooled-continuous --report-genotype-likelihood-max --allele-balance-priors-off /path/to/work/bamprep/13657T/9/13657T-sort-9_0_141213431-prep.bam /path/to/work/bamprep/13657N/9/13657N-sort-9_0_141213431-prep.bam
24259 Done | bcftools filter -i 'ALT="<>" || QUAL > 5'
24260 Done | /path/to/bcbio/anaconda/bin/py -x 'bcbio.variation.freebayes.call_somatic(x)'
24261 Done | awk -F' ' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $4) } {print}'
24262 Done | bcftools annotate -x FMT/DPR
24263 Done | bcftools view -a -
24264 Done | /path/to/bcbio/anaconda/bin/py -x 'bcbio.variation.freebayes.remove_missingalt(x)'
24265 Done | vcfallelicprimitives -t DECOMPOSED --keep-geno
24266 Done | vcffixup -
24267 Done | vcfstreamsort
24268 Done | vt normalize -n -r /path/to/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -q -
24269 Done | vcfuniqalleles
24270 Done | bgzip -c > /path/to/work/freebayes/9/tx/tmpn5ZPOD/13657-9_0_141213431.vcf.gz
' returned non-zero exit status 135

However, when I re-ran the pipeline, the program got stuck at the following stage:

[2017-01-27T16:06Z] c3113..edu: Timing: hla typing
[2017-01-27T16:06Z] c3113..edu: Resource requests: freebayes, gatk, mutect, picard, vardict, varscan; memory: 3.00, 6.00, 3.00, 3.00, 3.00; cores: 16, 1, 1, 16, 16, 16
[2017-01-27T16:06Z] c3113..edu: Configuring 144 jobs to run, using 1 cores each with 6.00g of memory reserved for each job
2017-01-27 10:06:39.904 [IPClusterStart] Using existing profile dir: u'/path/to/work/log/ipython'
2017-01-27 10:06:39.906 [IPClusterStart] Searching path [u'/path/to/work', u'/path/to/work/log/ipython', '/usr/local/etc/ipython', '/etc/ipython'] for config files
2017-01-27 10:06:39.907 [IPClusterStart] Attempting to load config file: ipython_config.py
2017-01-27 10:06:39.907 [IPClusterStart] Looking for ipython_config in /etc/ipython
2017-01-27 10:06:39.907 [IPClusterStart] Looking for ipython_config in /usr/local/etc/ipython
2017-01-27 10:06:39.908 [IPClusterStart] Looking for ipython_config in /path/to/work/log/ipython
2017-01-27 10:06:39.910 [IPClusterStart] Loaded config file: /path/to/work/log/ipython/ipython_config.py
2017-01-27 10:06:39.911 [IPClusterStart] Looking for ipython_config in /path/to/work
2017-01-27 10:06:39.914 [IPClusterStart] Attempting to load config file: ipcluster_3f99cb5c_aede_4912_8664_c2a7039e79ef_config.py
2017-01-27 10:06:39.914 [IPClusterStart] Looking for ipcontroller_config in /etc/ipython
2017-01-27 10:06:39.914 [IPClusterStart] Looking for ipcontroller_config in /usr/local/etc/ipython
2017-01-27 10:06:39.915 [IPClusterStart] Looking for ipcontroller_config in /path/to/work/log/ipython
2017-01-27 10:06:39.916 [IPClusterStart] Loaded config file: /path/to/work/log/ipython/ipcontroller_config.py
2017-01-27 10:06:39.917 [IPClusterStart] Looking for ipcontroller_config in /path/to/work
2017-01-27 10:06:39.919 [IPClusterStart] Attempting to load config file: ipcluster_3f99cb5c_aede_4912_8664_c2a7039e79ef_config.py
2017-01-27 10:06:39.919 [IPClusterStart] Looking for ipengine_config in /etc/ipython
2017-01-27 10:06:39.920 [IPClusterStart] Looking for ipengine_config in /usr/local/etc/ipython
2017-01-27 10:06:39.920 [IPClusterStart] Looking for ipengine_config in /path/to/work/log/ipython
2017-01-27 10:06:39.922 [IPClusterStart] Loaded config file: /path/to/work/log/ipython/ipengine_config.py
2017-01-27 10:06:39.924 [IPClusterStart] Looking for ipengine_config in /path/to/work
2017-01-27 10:06:39.926 [IPClusterStart] Attempting to load config file: ipcluster_3f99cb5c_aede_4912_8664_c2a7039e79ef_config.py
2017-01-27 10:06:39.926 [IPClusterStart] Looking for ipcluster_config in /etc/ipython
2017-01-27 10:06:39.927 [IPClusterStart] Looking for ipcluster_config in /usr/local/etc/ipython
2017-01-27 10:06:39.927 [IPClusterStart] Looking for ipcluster_config in /path/to/work/log/ipython
2017-01-27 10:06:39.929 [IPClusterStart] Loaded config file: /path/to/work/log/ipython/ipcluster_config.py
2017-01-27 10:06:39.930 [IPClusterStart] Looking for ipcluster_config in /path/to/work
[2017-01-27T16:20Z] c3113..edu: Timing: alignment post-processing
[2017-01-27T16:21Z] c3113..edu: ipython: piped_bamprep
[2017-01-27T16:21Z] c3113..edu: Timing: variant calling
[2017-01-27T16:22Z] c3113..edu: ipython: variantcall_sample
[2017-01-27T16:22Z] c2923..edu: Genotyping paired variants with FreeBayes
[2017-01-27T16:23Z] c2923.***.edu: Genotyping paired variants with FreeBayes

Could you help me with the debugging? Thanks!

chapmanb commented 7 years ago

Sorry about the issue. The original Bus error indicates some kind of system/filesystem error, so restarting the pipeline is the right thing to do.
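
For reference, the exit status 135 in the traceback is consistent with a bus error: shells conventionally report 128 plus the signal number when a process is killed by a signal, and SIGBUS is signal 7 on typical x86-64 Linux. A minimal Python sketch of that decoding follows; `decode_exit_status` is a hypothetical helper for illustration, not part of bcbio.

```python
import signal


def decode_exit_status(status):
    """Return the signal name for a shell-style exit status above 128, else None.

    Assumes the usual shell convention of 128 + signal number; signal
    numbering is platform dependent.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        # No signal with that number on this platform.
        return None


print(decode_exit_status(135))
```

On a typical Linux machine this should print SIGBUS, matching the "Bus error" message from bash.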

In the restart, the log looks like bcbio is running two FreeBayes jobs. Is it stuck, or are those jobs still processing if you log in to the c2923 machine that is running them? These might just be the compute- and memory-intensive jobs that caused the initial failure and need to be re-run. Hopefully waiting will allow them to finish.
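
One way to check whether the jobs are still working is to look at their CPU use and elapsed time on the compute node. The following is a rough sketch, assuming it is run on the node itself and that a POSIX-style `ps` is available; `parse_ps` and `running_jobs` are hypothetical helper names.

```python
import subprocess


def parse_ps(output, needle="freebayes"):
    """Return (pid, cpu_percent, mem_percent, elapsed) for matching processes."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 4)
        if len(parts) == 5 and needle in parts[4]:
            pid, cpu, mem, elapsed, _args = parts
            rows.append((int(pid), float(cpu), float(mem), elapsed))
    return rows


def running_jobs(needle="freebayes"):
    """List processes on this machine whose command line mentions needle."""
    out = subprocess.check_output(
        ["ps", "-eo", "pid,pcpu,pmem,etime,args"], text=True
    )
    return parse_ps(out, needle)


if __name__ == "__main__":
    for job in running_jobs():
        print(job)
```

A process that shows steady CPU use and a growing elapsed time is probably still computing rather than hung.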

What version of bcbio are you running? It looks like you might have a version with a bug in calculating parallel regions, since the failing process is working on a very large region. The latest release, 1.0.1, should avoid this problem. Hope this helps.
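
To see how large the failing region is, the coordinates can be read from the file names in the log. The sketch below assumes the names follow a `chrom_start_end` pattern, as `13657-9_0_141213431.vcf.gz` appears to; `region_from_path` is a hypothetical helper, not a bcbio function.

```python
import os
import re

# Assumed naming pattern: "...-<chrom>_<start>_<end>" followed by "-" or ".".
REGION_RE = re.compile(r"-([^-_/]+)_(\d+)_(\d+)[-.]")


def region_from_path(path):
    """Return (chrom, start, end) parsed from a region-style file name, or None."""
    match = REGION_RE.search(os.path.basename(path))
    if match is None:
        return None
    chrom, start, end = match.groups()
    return chrom, int(start), int(end)


chrom, start, end = region_from_path(
    "/path/to/work/freebayes/9/tx/tmpn5ZPOD/13657-9_0_141213431.vcf.gz"
)
print(chrom, end - start)  # prints: 9 141213431
```

A single job covering roughly 141 Mb on chromosome 9 is much larger than a typical parallel block, which fits with the job being slow and memory hungry.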