mjafin closed this issue 10 years ago.
Miika;
Apologies, we'd torn out max_errors, but it looks like there were still some references in the docs and code that could cause confusion.
In your case, my guess is that you have an older bcbio_system.yaml file with an algorithm section. We also removed that, because it caused more confusion than it helped. But if you installed from an older version, it might still be there and have a max_errors parameter in it.
So the short answer is: go to your bcbio_system.yaml file and delete the algorithm section, which should fix it.
Hi Brad,
That's odd, we don't actually have an algorithm section in bcbio_system.yaml, and I'm only seeing this error with SE reads, not with PE data. But I'll look into patching our version with your fix.
Cheers, Miika
Miika;
Is max_errors in your input sample YAML file? The error message just tells you that you've got this parameter somewhere in your input configuration and that it's no longer supported. The fix doesn't address this directly; it mainly cleans up our half-finished removal of this parameter.
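If you want to rule out a stray max_errors hiding somewhere in a nested config, a small recursive search helps. This is a minimal sketch, not part of bcbio: find_key is a hypothetical helper that walks an already-loaded config (for example the dict returned by PyYAML's yaml.safe_load) and reports the path to every matching key. The sample_config below is a made-up example of a sample YAML, not Miika's actual file.

```python
from typing import Any, Iterator, Tuple


def find_key(config: Any, key: str, path: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield a dotted path for every occurrence of `key` in nested dicts/lists."""
    if isinstance(config, dict):
        for k, v in config.items():
            here = path + (str(k),)
            if k == key:
                yield ".".join(here)
            yield from find_key(v, key, here)  # keep searching below this key
    elif isinstance(config, list):
        for i, item in enumerate(config):
            yield from find_key(item, key, path + (str(i),))  # list index as path step


# Hypothetical sample config, shaped like a loaded sample YAML.
sample_config = {
    "details": [
        {"description": "OvationFFPE",
         "algorithm": {"aligner": "bwa", "max_errors": 2}},
    ]
}
print(list(find_key(sample_config, "max_errors")))  # ['details.0.algorithm.max_errors']
```

Running the same search over both the system and sample configs makes it quick to confirm whether the parameter is really absent.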
As far as I can tell, max_errors is in neither the global YAML nor the sample YAML. I'll look deeper into this.
The fixes helped me get past the alignment, but then I get the following error:
[2014-03-11 10:06] [bwa_aln_core] print alignments... 0.15 sec
[2014-03-11 10:06] [bwa_aln_core] 1636447 sequences have been processed.
[2014-03-11 10:06] [main] Version: 0.7.7-r441
[2014-03-11 10:06] Convert SAM to BAM (8 cores): /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sam to /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.bam
[2014-03-11 10:06] [samopen] SAM header is present: 93 sequences.
[2014-03-11 10:06] Sort BAM file (multi core, coordinate): 1_2014-03-11_OvationSureSelect_test.bam to 1_2014-03-11_OvationSureSelect_test.sorted.bam
[2014-03-11 10:06] Index BAM file: 1_2014-03-11_OvationSureSelect_test.sorted.bam
[2014-03-11 10:06] Aligning lane 2_2014-03-11_OvationSureSelect_test with bwa aligner
[2014-03-11 10:06] bwa mem alignment from fastq: OvationFresh
[2014-03-11 10:06] [bam_header_read] EOF marker is absent. The input is probably truncated.
[2014-03-11 10:07] [samopen] SAM header is present: 93 sequences.
[2014-03-11 10:08] [main] Version: 0.7.7-r441
[2014-03-11 10:08] [main] CMD: /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa mem -M -t 8 -R @RG\tID:2\tPL:illumina\tPU:2_2014-03-11_OvationSureSelect_test\tSM:OvationFresh -v 1 /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFresh.fq
[2014-03-11 10:08] [main] Real time: 71.286 sec; CPU: 224.862 sec
[2014-03-11 10:08] Index BAM file: 2_2014-03-11_OvationSureSelect_test-sort.bam
[2014-03-11 10:08] Timing: callable regions
[2014-03-11 10:08] multiprocessing: postprocess_alignment
[2014-03-11 10:08] Prepare cleaned BED file : OvationFFPE
[2014-03-11 10:08] bgzip cancerPanel_target.bed
[2014-03-11 10:08] tabix index cancerPanel_target.bed.gz
[2014-03-11 10:08] Resource requests: ; memory: 1.0; cores: 1
[2014-03-11 10:08] Configuring 1 jobs to run, using 1 cores each with 1.2g of memory reserved for each job
[2014-03-11 10:08] multiprocessing: calc_callable_loci
[2014-03-11 10:08] multiprocessing: combine_bed
[2014-03-11 10:08] Resource requests: ; memory: 1.0; cores: 1
[2014-03-11 10:08] Configuring 1 jobs to run, using 1 cores each with 1.2g of memory reserved for each job
[2014-03-11 10:08] multiprocessing: calc_callable_loci
[2014-03-11 10:09] multiprocessing: combine_bed
[2014-03-11 10:09] multiprocessing: combine_sample_regions
[2014-03-11 10:09] Identified 338 parallel analysis blocks
Block sizes:
min: 4262
5%: 39907.55
25%: 2027585.25
median: 5025856.5
75%: 10965578.0
95%: 33460817.55
99%: 55035159.86
max: 67327613
Between block sizes:
min: 46
5%: 173.4
25%: 1078.0
median: 2785.0
75%: 11361.0
95%: 114915.6
99%: 11190177.96
max: 29031551
[2014-03-11 10:09] Timing: coverage
[2014-03-11 10:09] Resource requests: freebayes, gatk, gatk-haplotype, picard; memory: 16.0, 16.0, 16.0, 16.0; cores: 8, 1, 1, 1
[2014-03-11 10:09] Configuring 2 jobs to run, using 1 cores each with 16.2g of memory reserved for each job
[2014-03-11 10:09] Timing: alignment post-processing
[2014-03-11 10:09] multiprocessing: piped_bamprep
[2014-03-11 10:09] Piped post-alignment bamprep ('chrM', 0, 16571) : OvationFFPE : ('chrM', 0, 16571)
[2014-03-11 10:09] Piped post-alignment bamprep ('chr1', 0, 1982140) : OvationFFPE : ('chr1', 0, 1982140)
[2014-03-11 10:09] ##### ERROR ------------------------------------------------------------------------------------------
[2014-03-11 10:09] ##### ERROR A USER ERROR has occurred (version 3.0-7-gac6f69f):
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
[2014-03-11 10:09] ##### ERROR The error message below tells you what is the problem.
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR If the problem is an invalid argument, please check the online documentation guide
[2014-03-11 10:09] ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR Visit our website and forum for extensive documentation and answers to
[2014-03-11 10:09] ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR MESSAGE: SAM/BAM file /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam is malformed: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups
[2014-03-11 10:09] ##### ERROR ------------------------------------------------------------------------------------------
[2014-03-11 10:09] [samopen] no @SQ lines in the header.
[2014-03-11 10:09] [sam_read1] missing header? Abort!
[2014-03-11 10:09] [samopen] no @SQ lines in the header.
[2014-03-11 10:09] [sam_read1] missing header? Abort!
[2014-03-11 10:09] Uncaught exception occurred
Traceback (most recent call last):
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run
_do_run(cmd, checks, log_stdout)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 118, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr1:1-1982140 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR | /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/samtools view -S -b - > /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/bamprep/OvationFFPE/chr1/tx/tmpY5brtW/1_2014-03-11_OvationSureSelect_test.sorted-chr1_0_1982140-prep.bam
Some of these are quite short reads, so I'm guessing the bwa aln/samse steps haven't added the read groups?
Miika; Apologies, this is a bug in our bwa aln usage. I pushed a fix that correctly adds the read groups and should let you finish cleanly. After updating, you'll have to remove the old alignment files before re-running:
rm -f align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.*
Let us know if you run into anything else at all. Thanks again for the report.
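For reference, with bwa aln the read group reaches the header through bwa samse's -r option, as the samse command line recorded later in this thread shows. Below is a minimal sketch of building those argument lists, assuming bwa is on the PATH. read_group_line and bwa_aln_samse_cmds are hypothetical names, the paths are shortened placeholders, and this is not bcbio's actual implementation; it only builds the commands and does not run them.

```python
from typing import List


def read_group_line(rg_id: str, sample: str, platform_unit: str,
                    platform: str = "illumina") -> str:
    """Build an @RG string in the form bwa accepts on the command line.

    Fields are joined with a literal backslash-t; bwa converts these into real
    tabs when it writes the @RG line into the SAM header.
    """
    return "@RG\\tID:{}\\tPL:{}\\tPU:{}\\tSM:{}".format(rg_id, platform, platform_unit, sample)


def bwa_aln_samse_cmds(ref: str, fastq: str, sai: str, rg: str) -> List[List[str]]:
    """Return argument lists for single-end bwa aln + samse with a read group."""
    aln = ["bwa", "aln", ref, fastq]  # writes the .sai data to stdout; save it as `sai`
    samse = ["bwa", "samse", "-r", rg, ref, sai, fastq]  # -r adds the @RG header line
    return [aln, samse]


rg = read_group_line("1", "OvationFFPE", "1_2014-03-11_OvationSureSelect_test")
cmds = bwa_aln_samse_cmds("hg19.fa", "OvationFFPE.fq", "OvationFFPE_1.sai", rg)
for cmd in cmds:
    print(" ".join(cmd))
```

Passing these lists to subprocess avoids shell quoting of the backslash sequences; in a single shell string the @RG argument would need quoting.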
Thanks! I'll report back if I hit any other problems downstream.
OK, next round of errors :)
[2014-03-12 04:55] Timing: alignment post-processing
[2014-03-12 04:55] multiprocessing: piped_bamprep
[2014-03-12 04:55] Piped post-alignment bamprep ('chr8_gl000196_random', 0, 38914) : OvationFFPE : ('chr8_gl000196_random', 0, 38914)
[2014-03-12 04:55] [samopen] SAM header is present: 93 sequences.
[2014-03-12 04:55] [sam_read1] reference 'ID:bwa PN:bwa VN:0.7.7-r441 CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
[2014-03-12 04:55] SN:chr13 LN:115169878
[2014-03-12 04:55] @SQ SN:chr14 LN:1073495401' is recognized as '*'.
[2014-03-12 04:55] [main_samview] truncated file.
[2014-03-12 04:55] Uncaught exception occurred
Traceback (most recent call last):
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run
_do_run(cmd, checks, log_stdout)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 118, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR | /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/samtools view -S -b - > /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/bamprep/OvationFFPE/chr8_gl000196_random/tx/tmp6z9ES0/1_2014-03-11_OvationSureSelect_test.sorted-chr8_gl000196_random_0_38914-prep.bam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference 'ID:bwa PN:bwa VN:0.7.7-r441 CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
SN:chr13 LN:115169878
@SQ SN:chr14 LN:1073495401' is recognized as '*'.
[main_samview] truncated file.
' returned non-zero exit status 1
Traceback (most recent call last):
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bcbio_nextgen.py", line 59, in <module>
main(**kwargs)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bcbio_nextgen.py", line 39, in main
run_main(**kwargs)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 40, in run_main
fc_dir, run_info_yaml)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 320, in run
samples = region.parallel_prep_region(samples, regions, run_parallel)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/region.py", line 69, in parallel_prep_region
"piped_bamprep", None, file_key, ["config"])
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 76, in parallel_split_combine
split_output = parallel_fn(parallel_name, split_args)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
return run_multicore(fn, items, config, parallel=parallel)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 82, in run_multicore
for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 644, in __call__
self.dispatch(function, args, kwargs)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 391, in dispatch
job = ImmediateApply(func, args, kwargs)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 129, in __init__
self.results = func(*args, **kwargs)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 47, in wrapper
return apply(f, *args, **kwargs)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 41, in piped_bamprep
return bamprep.piped_bamprep(*args)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/variation/bamprep.py", line 230, in piped_bamprep
_piped_bamprep_region(data, region, out_file, tmp_dir)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/variation/bamprep.py", line 212, in _piped_bamprep_region
_piped_bamprep_region_fullpipe(data, region, prep_params, out_file, tmp_dir)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/variation/bamprep.py", line 187, in _piped_bamprep_region_fullpipe
region=region)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run
_do_run(cmd, checks, log_stdout)
File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 118, in _do_run
raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR | /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/samtools view -S -b - > /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/bamprep/OvationFFPE/chr8_gl000196_random/tx/tmp6z9ES0/1_2014-03-11_OvationSureSelect_test.sorted-chr8_gl000196_random_0_38914-prep.bam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference 'ID:bwa PN:bwa VN:0.7.7-r441 CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
SN:chr13 LN:115169878
@SQ SN:chr14 LN:1073495401' is recognized as '*'.
[main_samview] truncated file.
' returned non-zero exit status 1
Apologies and thanks!
Miika; It appears to be complaining about the header in your input file. Could you take a look at it with:
samtools view -H /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam
Does the header look okay, or is there something funny around the @RG line? Sorry about all these problems; we don't see much short-read data these days.
If this is holding you up, one practical option is to align the shorter reads with novoalign instead of bwa; it should handle them cleanly and uses the same pipeline for short and long reads.
Thanks Brad, here's the header:
@HD VN:1.3 SO:coordinate
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chr1_gl000191_random LN:106433
@SQ SN:chr1_gl000192_random LN:547496
@SQ SN:chr4_ctg9_hap1 LN:590426
@SQ SN:chr4_gl000193_random LN:189789
@SQ SN:chr4_gl000194_random LN:191469
@SQ SN:chr6_apd_hap1 LN:4622290
@SQ SN:chr6_cox_hap2 LN:4795371
@SQ SN:chr6_dbb_hap3 LN:4610396
@SQ SN:chr6_mann_hap4 LN:4683263
@SQ SN:chr6_mcf_hap5 LN:4833398
@SQ SN:chr6_qbl_hap6 LN:4611984
@SQ SN:chr6_ssto_hap7 LN:4928567
@SQ SN:chr7_gl000195_random LN:182896
@SQ SN:chr8_gl000196_random LN:38914
@SQ SN:chr8_gl000197_random LN:37175
@SQ SN:chr9_gl000198_random LN:90085
@SQ SN:chr9_gl000199_random LN:169874
@SQ SN:chr9_gl000200_random LN:187035
@SQ SN:chr9_gl000201_random LN:36148
@SQ SN:chr11_gl000202_random LN:40103
@SQ SN:chr17_ctg5_hap1 LN:1680828
@SQ SN:chr17_gl000203_random LN:37498
@SQ SN:chr17_gl000204_random LN:81310
@SQ SN:chr17_gl000205_random LN:174588
@SQ SN:chr17_gl000206_random LN:41001
@SQ SN:chr18_gl000207_random LN:4262
@SQ SN:chr19_gl000208_random LN:92689
@SQ SN:chr19_gl000209_random LN:159169
@SQ SN:chr21_gl000210_random LN:27682
@SQ SN:chrUn_gl000211 LN:166566
@SQ SN:chrUn_gl000212 LN:186858
@SQ SN:chrUn_gl000213 LN:164239
@SQ SN:chrUn_gl000214 LN:137718
@SQ SN:chrUn_gl000215 LN:172545
@SQ SN:chrUn_gl000216 LN:172294
@SQ SN:chrUn_gl000217 LN:172149
@SQ SN:chrUn_gl000218 LN:161147
@SQ SN:chrUn_gl000219 LN:179198
@SQ SN:chrUn_gl000220 LN:161802
@SQ SN:chrUn_gl000221 LN:155397
@SQ SN:chrUn_gl000222 LN:186861
@SQ SN:chrUn_gl000223 LN:180455
@SQ SN:chrUn_gl000224 LN:179693
@SQ SN:chrUn_gl000225 LN:211173
@SQ SN:chrUn_gl000226 LN:15008
@SQ SN:chrUn_gl000227 LN:128374
@SQ SN:chrUn_gl000228 LN:129120
@SQ SN:chrUn_gl000229 LN:19913
@SQ SN:chrUn_gl000230 LN:43691
@SQ SN:chrUn_gl000231 LN:27386
@SQ SN:chrUn_gl000232 LN:40652
@SQ SN:chrUn_gl000233 LN:45941
@SQ SN:chrUn_gl000234 LN:40531
@SQ SN:chrUn_gl000235 LN:34474
@SQ SN:chrUn_gl000236 LN:41934
@SQ SN:chrUn_gl000237 LN:45867
@SQ SN:chrUn_gl000238 LN:39939
@SQ SN:chrUn_gl000239 LN:33824
@SQ SN:chrUn_gl000240 LN:41933
@SQ SN:chrUn_gl000241 LN:42152
@SQ SN:chrUn_gl000242 LN:43523
@SQ SN:chrUn_gl000243 LN:43341
@SQ SN:chrUn_gl000244 LN:39929
@SQ SN:chrUn_gl000245 LN:36651
@SQ SN:chrUn_gl000246 LN:38154
@SQ SN:chrUn_gl000247 LN:36422
@SQ SN:chrUn_gl000248 LN:39786
@SQ SN:chrUn_gl000249 LN:38502
@RG ID:1 PL:illumina PU:1_2014-03-11_OvationSureSelect_test SM:OvationFFPE
@PG ID:bwa PN:bwa VN:0.7.7-r441 CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
I don't know whether we have a licence for novoalign (if it needs one), but I can always check. How about forcing bwa mem for this data?
Miika; Thanks much, that all looks fine, so now I'm really confused about the error. Is it reproducible on a re-run? Another debugging idea is to run this command:
/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR > test.sam
and look at the test.sam output to be sure the header looks as expected.
As for bwa: even though I love bwa mem, I'd prefer to follow Heng Li's recommendations rather than hack around a pipeline error. Hopefully we can sort out the underlying cause of this one.
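For context, the bwa manual describes BWA-backtrack (bwa aln) as designed for Illumina reads up to 100bp, while BWA-MEM targets sequences from 70bp upward. A minimal sketch of a length-based choice follows, assuming a 70bp cutoff and unwrapped 4-line FASTQ records, and using the median length of the first reads. The helper names and the median rule are hypothetical illustrations, not how bcbio actually decides.

```python
from statistics import median
from typing import Iterable, List

# Assumption: 70 bp is taken from the bwa manual's lower bound for BWA-MEM.
MEM_MIN_READ_LENGTH = 70


def fastq_read_lengths(lines: Iterable[str], max_reads: int = 10000) -> List[int]:
    """Return sequence lengths for up to max_reads unwrapped 4-line FASTQ records."""
    lengths: List[int] = []
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            lengths.append(len(line.rstrip("\r\n")))
            if len(lengths) >= max_reads:
                break
    return lengths


def choose_bwa_algorithm(lengths: List[int]) -> str:
    """Pick 'mem' when the median read length reaches the cutoff, else 'aln'."""
    if not lengths:
        raise ValueError("no reads found")
    return "mem" if median(lengths) >= MEM_MIN_READ_LENGTH else "aln"


# Tiny made-up FASTQ with three 10 bp reads.
fq = ["@r1\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"] * 3
print(choose_bwa_algorithm(fastq_read_lengths(fq)))  # aln
```

With a real file you would pass an open file handle instead of the list, e.g. open("OvationFFPE.fq").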
Let me start a rerun. In the meantime, here is the header from your command (there are no reads):
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chr1_gl000191_random LN:106433
@SQ SN:chr1_gl000192_random LN:547496
@SQ SN:chr4_ctg9_hap1 LN:590426
@SQ SN:chr4_gl000193_random LN:189789
@SQ SN:chr4_gl000194_random LN:191469
@SQ SN:chr6_apd_hap1 LN:4622290
@SQ SN:chr6_cox_hap2 LN:4795371
@SQ SN:chr6_dbb_hap3 LN:4610396
@SQ SN:chr6_mann_hap4 LN:4683263
@SQ SN:chr6_mcf_hap5 LN:4833398
@SQ SN:chr6_qbl_hap6 LN:4611984
@SQ SN:chr6_ssto_hap7 LN:4928567
@SQ SN:chr7_gl000195_random LN:182896
@SQ SN:chr8_gl000196_random LN:38914
@SQ SN:chr8_gl000197_random LN:37175
@SQ SN:chr9_gl000198_random LN:90085
@SQ SN:chr9_gl000199_random LN:169874
@SQ SN:chr9_gl000200_random LN:187035
@SQ SN:chr9_gl000201_random LN:36148
@SQ SN:chr11_gl000202_random LN:40103
@SQ SN:chr17_ctg5_hap1 LN:1680828
@SQ SN:chr17_gl000203_random LN:37498
@SQ SN:chr17_gl000204_random LN:81310
@SQ SN:chr17_gl000205_random LN:174588
@SQ SN:chr17_gl000206_random LN:41001
@SQ SN:chr18_gl000207_random LN:4262
@SQ SN:chr19_gl000208_random LN:92689
@SQ SN:chr19_gl000209_random LN:159169
@SQ SN:chr21_gl000210_random LN:27682
@SQ SN:chrUn_gl000211 LN:166566
@SQ SN:chrUn_gl000212 LN:186858
@SQ SN:chrUn_gl000213 LN:164239
@SQ SN:chrUn_gl000214 LN:137718
@SQ SN:chrUn_gl000215 LN:172545
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chr1_gl000191_random LN:106433
@SQ SN:chr1_gl000192_random LN:547496
@SQ SN:chr4_ctg9_hap1 LN:590426
@SQ SN:chr4_gl000193_random LN:189789
@SQ SN:chr4_gl000194_random LN:191469
@SQ SN:chr6_apd_hap1 LN:4622290
@SQ SN:chr6_cox_hap2 LN:4795371
@SQ SN:chr6_dbb_hap3 LN:4610396
@SQ SN:chr6_mann_hap4 LN:4683263
@SQ SN:chr6_mcf_hap5 LN:4833398
@SQ SN:chr6_qbl_hap6 LN:4611984
@SQ SN:chr6_ssto_hap7 LN:4928567
@SQ SN:chr7_gl000195_random LN:182896
@SQ SN:chr8_gl000196_random LN:38914
[klrl262@rask:/gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles]$ /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR > test.sam
Miika;
That header doesn't look right at all. It's got two @HD tags and is missing the @RG tag. That's super strange, since the original BAM file header looks fine. Is it possible the /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam.bai index file is messed up? That might make PrintReads choke when trying to grab specific regions, while the streaming commands work okay. If you slice with samtools, does it look bad as well?
samtools view /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam chr8_gl000196_random:1-38914
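The diagnosis above boils down to two checks on the header: at most one @HD line and at least one @RG line. A minimal sketch that runs those checks on header text (for example the output of samtools view -H, or the header lines of test.sam), plus a basic check that each @SQ line has a numeric LN. header_problems is a hypothetical helper; it splits fields on any whitespace so it also works on headers pasted with spaces instead of tabs.

```python
from typing import Dict, List


def header_problems(header_text: str) -> List[str]:
    """Return problems found in SAM header text such as `samtools view -H` output."""
    counts: Dict[str, int] = {}
    problems: List[str] = []
    for line in header_text.splitlines():
        if not line.startswith("@"):
            continue  # skip alignment records and blank lines
        fields = line.split()  # tabs in real output; spaces if pasted from a web page
        tag = fields[0]
        counts[tag] = counts.get(tag, 0) + 1
        if tag == "@SQ":
            attrs = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
            if not attrs.get("LN", "").isdigit():
                problems.append("@SQ line without a numeric LN: " + line)
    if counts.get("@HD", 0) > 1:
        problems.append("{} @HD lines, expected at most one".format(counts["@HD"]))
    if counts.get("@RG", 0) == 0:
        problems.append("no @RG lines; GATK refuses files without read groups")
    return problems


good = "@HD\tVN:1.3\tSO:coordinate\n@SQ\tSN:chrM\tLN:16571\n@RG\tID:1\tSM:OvationFFPE\n"
bad = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chrM\tLN:16571\n@HD\tVN:1.4\tSO:coordinate\n"
print(header_problems(good))  # []
print(header_problems(bad))   # ['2 @HD lines, expected at most one', 'no @RG lines; GATK refuses files without read groups']
```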
Sorry again that we can't reproduce this on our side. Hope this helps some.
Hi Brad,
There are no reads in chr8_gl000196_random, but in chr7_gl000195_random the slicing works fine. Would you like me to email you the fq file for testing purposes?
Miika;
Sending the file would be great, thank you. Did the headers look okay with the samtools slicing commands (sorry, I should have started with samtools view -h)? I'm confused as to why it's spitting out strange headers even when there are no reads present. Thanks for all the patience debugging.
Here's the header from the bam file (looks OK'ish):
@HD VN:1.3 SO:coordinate
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@SQ SN:chr1_gl000191_random LN:106433
@SQ SN:chr1_gl000192_random LN:547496
@SQ SN:chr4_ctg9_hap1 LN:590426
@SQ SN:chr4_gl000193_random LN:189789
@SQ SN:chr4_gl000194_random LN:191469
@SQ SN:chr6_apd_hap1 LN:4622290
@SQ SN:chr6_cox_hap2 LN:4795371
@SQ SN:chr6_dbb_hap3 LN:4610396
@SQ SN:chr6_mann_hap4 LN:4683263
@SQ SN:chr6_mcf_hap5 LN:4833398
@SQ SN:chr6_qbl_hap6 LN:4611984
@SQ SN:chr6_ssto_hap7 LN:4928567
@SQ SN:chr7_gl000195_random LN:182896
@SQ SN:chr8_gl000196_random LN:38914
@SQ SN:chr8_gl000197_random LN:37175
@SQ SN:chr9_gl000198_random LN:90085
@SQ SN:chr9_gl000199_random LN:169874
@SQ SN:chr9_gl000200_random LN:187035
@SQ SN:chr9_gl000201_random LN:36148
@SQ SN:chr11_gl000202_random LN:40103
@SQ SN:chr17_ctg5_hap1 LN:1680828
@SQ SN:chr17_gl000203_random LN:37498
@SQ SN:chr17_gl000204_random LN:81310
@SQ SN:chr17_gl000205_random LN:174588
@SQ SN:chr17_gl000206_random LN:41001
@SQ SN:chr18_gl000207_random LN:4262
@SQ SN:chr19_gl000208_random LN:92689
@SQ SN:chr19_gl000209_random LN:159169
@SQ SN:chr21_gl000210_random LN:27682
@SQ SN:chrUn_gl000211 LN:166566
@SQ SN:chrUn_gl000212 LN:186858
@SQ SN:chrUn_gl000213 LN:164239
@SQ SN:chrUn_gl000214 LN:137718
@SQ SN:chrUn_gl000215 LN:172545
@SQ SN:chrUn_gl000216 LN:172294
@SQ SN:chrUn_gl000217 LN:172149
@SQ SN:chrUn_gl000218 LN:161147
@SQ SN:chrUn_gl000219 LN:179198
@SQ SN:chrUn_gl000220 LN:161802
@SQ SN:chrUn_gl000221 LN:155397
@SQ SN:chrUn_gl000222 LN:186861
@SQ SN:chrUn_gl000223 LN:180455
@SQ SN:chrUn_gl000224 LN:179693
@SQ SN:chrUn_gl000225 LN:211173
@SQ SN:chrUn_gl000226 LN:15008
@SQ SN:chrUn_gl000227 LN:128374
@SQ SN:chrUn_gl000228 LN:129120
@SQ SN:chrUn_gl000229 LN:19913
@SQ SN:chrUn_gl000230 LN:43691
@SQ SN:chrUn_gl000231 LN:27386
@SQ SN:chrUn_gl000232 LN:40652
@SQ SN:chrUn_gl000233 LN:45941
@SQ SN:chrUn_gl000234 LN:40531
@SQ SN:chrUn_gl000235 LN:34474
@SQ SN:chrUn_gl000236 LN:41934
@SQ SN:chrUn_gl000237 LN:45867
@SQ SN:chrUn_gl000238 LN:39939
@SQ SN:chrUn_gl000239 LN:33824
@SQ SN:chrUn_gl000240 LN:41933
@SQ SN:chrUn_gl000241 LN:42152
@SQ SN:chrUn_gl000242 LN:43523
@SQ SN:chrUn_gl000243 LN:43341
@SQ SN:chrUn_gl000244 LN:39929
@SQ SN:chrUn_gl000245 LN:36651
@SQ SN:chrUn_gl000246 LN:38154
@SQ SN:chrUn_gl000247 LN:36422
@SQ SN:chrUn_gl000248 LN:39786
@SQ SN:chrUn_gl000249 LN:38502
@RG ID:1 PL:illumina PU:1_2014-03-11_OvationSureSelect_test SM:OvationFFPE
@PG ID:bwa PN:bwa VN:0.7.7-r441 CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
Miika;
Thanks for sending the test files, that was a big help. It looks like the single-end/short-read angle was a red herring. The actual issue appears to be a bug in samtools view when converting SAM to BAM if the input SAM has only a header and no reads.
I avoided the issue by using sambamba view, which handles this case correctly. Thanks again for the help debugging it and getting this sorted out.
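The fix that went in was switching to sambamba view. As a separate illustration of the edge case, a pipeline could also detect a header-only SAM stream (what PrintReads produces for a region with no reads, like chr8_gl000196_random here) before handing it to a converter. A minimal sketch under that assumption: peek_header_only is a hypothetical helper that only inspects SAM text, does not call samtools or sambamba, and buffers the lines it reads so the full stream can still be passed on.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def peek_header_only(sam_lines: Iterable[str]) -> Tuple[bool, Iterator[str]]:
    """Check whether a SAM stream has any alignment records without losing lines.

    Reads until the first alignment record (or end of input) and returns
    (header_only, iterator_over_all_lines) so the caller can still forward
    the complete stream.
    """
    it = iter(sam_lines)
    seen: List[str] = []
    for line in it:
        seen.append(line)
        if line.strip() and not line.startswith("@"):
            return False, itertools.chain(seen, it)  # found a read; replay buffered lines
    return True, iter(seen)  # only header lines (or nothing at all)


header = ["@HD\tVN:1.4\tSO:coordinate\n", "@SQ\tSN:chr8_gl000196_random\tLN:38914\n"]
record = ["r1\t0\tchr8_gl000196_random\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n"]

empty, lines = peek_header_only(header)
print(empty, len(list(lines)))  # True 2
empty, lines = peek_header_only(header + record)
print(empty, len(list(lines)))  # False 3
```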
Ah... Correlation != causation.
Thanks for the fix!
I'm getting this error trying to call variants on a single fq file:
Am I doing something wrong? ;-)