mjafin closed this issue 10 years ago
I can actually see core dumps following these errors. Maybe it's something with our system?
Miika; Thanks for the report. Are you seeing this with the NA12878 whole genome or exome example? The underlying cause appears to be a bedtools failure. I wasn't able to reproduce it with the whole genome example on my system after a quick test. What version of bedtools do you have? I tested with v2.17.0.
I do see some other errors from VarScan processing due to VarScan producing non-GATC alleles that make GATK unhappy. I'll work on fixing those but this appears unrelated to your problem, so I'd like to narrow that down.
Thanks Brad, running it now in local mode with 2 CPUs only and it's chugging along. Here are the settings I'm using:
upload:
  dir: ../final
details:
  - files: [/ngs/public_data/NA12878/NA12878-NGv3-LAB1360-A_1.fastq.gz, /ngs/public_data/NA12878/NA12878-NGv3-LAB1360-A_2.fastq.gz]
    description: NA12878-1
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      align_split_size: 5000000
      variantcaller: [gatk, freebayes, varscan, gatk-haplotype]
      quality_format: Standard
      coverage_interval: regional
      merge_bamprep: true
      validate: /ngs/public_data/NA12878/NISTIntegratedCalls_13datasets_130719_allcall_UGHapMerge_HetHomVarPASS_VQSRv2.17_all_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs.vcf
      validate_regions: /ngs/public_data/NA12878/union13callableMQonlymerged_addcert_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs_v2.17.bed
      variant_regions: /ngs/public_data/NA12878/NGv3.bed
      ensemble:
        format-filters: [DP < 4]
        classifiers:
          balance: [AD, FS, Entropy]
          calling: [ReadPosEndDist, PL, PLratio, Entropy, NBQ]
        classifier-params:
          type: svm
        trusted-pct: 0.5
  - files: [/ngs/public_data/NA12878/NA12878-NGv3-LAB1360-A_1.fastq.gz, /ngs/public_data/NA12878/NA12878-NGv3-LAB1360-A_2.fastq.gz]
    description: NA12878-2
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      align_split_size: 5000000
      mark_duplicates: samtools
      recalibrate: false
      realign: false
      variantcaller: [gatk, freebayes, varscan, gatk-haplotype]
      quality_format: Standard
      coverage_interval: regional
      merge_bamprep: false
      validate: /ngs/public_data/NA12878/NISTIntegratedCalls_13datasets_130719_allcall_UGHapMerge_HetHomVarPASS_VQSRv2.17_all_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs.vcf
      validate_regions: /ngs/public_data/NA12878/union13callableMQonlymerged_addcert_nouncert_excludesimplerep_excludesegdups_excludedecoy_excludeRepSeqSTRs_noCNVs_v2.17.bed
      variant_regions: /ngs/public_data/NA12878/NGv3.bed
      ensemble:
        format-filters: [DP < 4]
        classifiers:
          balance: [AD, FS, Entropy]
          calling: [ReadPosEndDist, PL, PLratio, Entropy, NBQ]
        classifier-params:
          type: svm
        trusted-pct: 0.5
I was previously running with 8 CPUs and SGE. I suspect it may have something to do with the maximum number of user processes, but I'm not sure: both the file handle and max user process limits were bumped up quite a bit some time ago. If the local run finishes fine I'll restart in the queues to see if it's reproducible.
Bedtools is v2.17.0.
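For anyone checking the same suspicion, here is a minimal sketch for printing the limits a process actually runs with, using Python's standard `resource` module. The helper name `describe_limit` is made up for illustration and is not part of bcbio-nextgen. Running it from inside a queued job as well as from a login shell may be worthwhile, since the two environments do not necessarily share the same limits.

```python
import resource


def describe_limit(name, limit_id):
    """Return the soft and hard values of one resource limit as text."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        # RLIM_INFINITY is the sentinel for "no limit".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    # Equivalent to `ulimit -n` (open file handles per process).
    print(describe_limit("open files", resource.RLIMIT_NOFILE))
    # Equivalent to `ulimit -u` (processes per user).
    print(describe_limit("user processes", resource.RLIMIT_NPROC))
```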
Miika; Ah, gotcha. That makes sense. I just fixed some problems with leaking file handles this morning so it definitely could be that. I found a place where we were not closing a ZeroMQ context which led to extra connections being held. If it works fine without IPython this is probably the issue. I'm hoping to roll the final 0.7.5 release with these fixes tomorrow after closing remaining issues. Sorry to make you chase down the problem.
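As a rough illustration of how an unclosed resource holds on to file handles, the sketch below counts the descriptors open in the current process. It uses plain file handles on `os.devnull` as a stand-in for the ZeroMQ connections mentioned above, and it is not the actual bcbio-nextgen fix. The function names are hypothetical, and the counting relies on `/proc/self/fd`, so it assumes Linux.

```python
import os


def open_fd_count():
    """Count file descriptors open in this process (Linux only, via /proc)."""
    return len(os.listdir("/proc/self/fd"))


def open_without_closing(n):
    """Open n handles and return them unclosed; each one keeps a descriptor."""
    return [open(os.devnull) for _ in range(n)]


def open_and_close(n):
    """Open n handles one at a time, closing each via the context manager."""
    for _ in range(n):
        with open(os.devnull):
            pass


if __name__ == "__main__":
    baseline = open_fd_count()
    held = open_without_closing(5)
    print("extra descriptors while held:", open_fd_count() - baseline)
    for handle in held:
        handle.close()
    open_and_close(5)
    print("extra descriptors after closing:", open_fd_count() - baseline)
```

Once the per-process limit on open files is reached, any later attempt to open a file, pipe, or socket fails, which is consistent with unrelated tools reporting errors.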
Thanks Brad! I'll ask Jakub to upgrade to 0.7.5 final when it's out and will continue stress testing our setup. I'll let you know if I encounter something like this again.
Happy Thanksgiving!
One more (still pre-0.7.5):
[2013-11-30 00:13] Genotyping with varscan: ('14', 66734393, 68290593) NA12878-2-sort-14_66734393_68290593-prep.bam
[2013-11-30 00:13] mpileup for Varscan
[2013-11-30 00:13] [mpileup] 1 samples in 1 input files
[2013-11-30 00:13] <mpileup> Set max per-file depth to 8000
[2013-11-30 00:13] Varscan
[2013-11-30 00:13] cat: write error: Broken pipe
[2013-11-30 00:13] Uncaught exception occurred
Traceback (most recent call last):
  File "/apps/bcbio-nextgen/0.7.5a/rhel6-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 22, in run
    _do_run(cmd, checks)
  File "/apps/bcbio-nextgen/0.7.5a/rhel6-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 114, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; cat /ngs/RDI/Analysis/NA12878_validation/work2/varscan/14/tx/tmpUXn00t/NA12878-2-sort-14_66734393_68290593-prep-variants-raw.mpileup | java -Xms750m -Xmx8g -Duser.language=en -Duser.country=US -jar /apps/bcbio-nextgen/0.7.5a/rhel6-x64/share/java/varscan/VarScan.v2.3.6.jar mpileup2cns --min-coverage 5 --p-value 0.98 --vcf-sample-list /ngs/RDI/Analysis/NA12878_validation/work2/varscan/14/tx/tmpUXn00t/NA12878-2-sort-14_66734393_68290593-prep-variants-raw-sample_list.txt --output-vcf --variants > /ngs/RDI/Analysis/NA12878_validation/work2/varscan/14/tx/tmpUXn00t/NA12878-2-sort-14_66734393_68290593-prep-variants-raw.vcf
cat: write error: Broken pipe
Miika; I hope that the file handle fixes in 0.7.5 will remove this problem as well. A lot of these error types were due to being unable to get an open file handle, so they manifest in confusing ways. If you still have problems with the latest versions please do re-open and we can look into it more.
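To make the `cat: write error: Broken pipe` message less mysterious, here is a small sketch of how `set -o pipefail` turns a writer's broken pipe into a failed command. It assumes `bash`, `yes`, and `head` are available. `yes` stands in for `cat` and `head -n 1` for a reader that stops consuming input early, and `run_pipeline` is a hypothetical helper. It only reproduces the symptom and says nothing about why the reading process in the log above stopped.

```python
import subprocess

# Writer that never stops on its own, piped into a reader that exits early.
DEMO_PIPELINE = "yes | head -n 1 > /dev/null"


def run_pipeline(command, pipefail):
    """Run a shell pipeline under bash and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    result = subprocess.run(
        ["bash", "-c", prefix + command],
        stderr=subprocess.DEVNULL,  # hide any "Broken pipe" message
    )
    return result.returncode


if __name__ == "__main__":
    # Without pipefail the status is that of the last command (head).
    print("without pipefail:", run_pipeline(DEMO_PIPELINE, pipefail=False))
    # With pipefail the writer's failure becomes the pipeline's status.
    print("with pipefail:", run_pipeline(DEMO_PIPELINE, pipefail=True))
```

In other words, the broken pipe reported by the writer is a consequence of the reader going away, so the reader's side is the place to look for the root cause.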
I noticed a few problems when including VarScan in calling variants in the NA12878 example. The errors look like this:
In the debug log here's what's around the error: