bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

0.7.7 error with singleton fq file #346

Closed mjafin closed 10 years ago

mjafin commented 10 years ago

I'm getting this error trying to call variants in a single fq file:

[2014-03-11 07:34] rask: Using input YAML configuration: /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/config/OvationSureSelect_test_singles.yaml
Traceback (most recent call last):
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/tooldir/bin/bcbio_nextgen.py", line 59, in <module>
    main(**kwargs)
[2014-03-11 07:34] rask: Checking sample YAML configuration: /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/config/OvationSureSelect_test_singles.yaml
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/tooldir/bin/bcbio_nextgen.py", line 39, in main
    run_main(**kwargs)
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/installdir/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 46, in run_main
    fc_dir, run_info_yaml)
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/installdir/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 80, in _run_toplevel
    samples = run_info.organize(dirs, config, run_info_yaml)
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/installdir/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 27, in organize
    run_details = _run_info_from_yaml(dirs["flowcell"], run_info_yaml, config)
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/installdir/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 280, in _run_info_from_yaml
    _check_sample_config(run_details, run_info_yaml)
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/installdir/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 190, in _check_sample_config
    [_check_algorithm_keys(x) for x in items]
  File "/opt/az/local/bcbio-nextgen/stable/0.7.7/installdir/anaconda/lib/python2.7/site-packages/bcbio/pipeline/run_info.py", line 158, in _check_algorithm_keys
    % (problem_keys, url))
ValueError: Unexpected configuration keyword in 'algorithm' section: ['max_errors']

Is it me doing something wrong?-)

chapmanb commented 10 years ago

Miika; Apologies, we'd torn out max_errors but it looks like there were still some references in the docs and code that could cause confusion.

In your case, my guess is that you have an older bcbio_system.yaml file with an algorithm section. We also removed that section, because it caused more confusion than it helped. But if you upgraded from an older version, it might still be there and have a max_errors parameter in it.

So the short answer is: go to your bcbio_system.yaml file and delete the algorithm section, which should fix it.

mjafin commented 10 years ago

Hi Brad, That's odd, we don't actually have algorithm in bcbio_system.yaml and I'm only seeing this error on SE reads, not with PE data. But I'll look into patching our version with your fix..

Cheers, Miika

chapmanb commented 10 years ago

Miika; Is max_errors in your input sample YAML file? The error message just tells you that you've got this parameter somewhere in your input configuration and it's no longer supported. The fix doesn't address this directly; it mainly finishes the half-done removal of this parameter.
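One way to track down where a stray setting comes from is to search every configuration file the run reads for the keyword. The following is a minimal sketch, not part of bcbio: `find_keyword` is a hypothetical helper, and it does a plain-text search rather than parsing YAML, so it also reports commented-out mentions.

```python
import re
from pathlib import Path


def find_keyword(paths, keyword="max_errors"):
    """Return (path, line_number, stripped_line) for each line mentioning keyword.

    Hypothetical helper for illustration only. It matches the keyword as a
    whole word in the raw text and does not interpret the YAML structure.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b")
    hits = []
    for path in paths:
        text = Path(path).read_text()
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

It could be called with the system and sample files, for example `find_keyword(["bcbio_system.yaml", "OvationSureSelect_test_singles.yaml"])`, where both paths are placeholders for wherever those files live on your system. An empty list means the keyword does not appear in any of the files checked.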

mjafin commented 10 years ago

As far as I can tell, max_errors is in neither the global yaml nor the sample yaml.. I'll look deeper into this

mjafin commented 10 years ago

The fixes helped me get past the alignment, but then I get the following error:

[2014-03-11 10:06] [bwa_aln_core] print alignments... 0.15 sec
[2014-03-11 10:06] [bwa_aln_core] 1636447 sequences have been processed.
[2014-03-11 10:06] [main] Version: 0.7.7-r441
[2014-03-11 10:06] Convert SAM to BAM (8 cores): /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sam to /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.bam
[2014-03-11 10:06] [samopen] SAM header is present: 93 sequences.
[2014-03-11 10:06] Sort BAM file (multi core, coordinate): 1_2014-03-11_OvationSureSelect_test.bam to 1_2014-03-11_OvationSureSelect_test.sorted.bam
[2014-03-11 10:06] Index BAM file: 1_2014-03-11_OvationSureSelect_test.sorted.bam
[2014-03-11 10:06] Aligning lane 2_2014-03-11_OvationSureSelect_test with bwa aligner
[2014-03-11 10:06] bwa mem alignment from fastq: OvationFresh
[2014-03-11 10:06] [bam_header_read] EOF marker is absent. The input is probably truncated.
[2014-03-11 10:07] [samopen] SAM header is present: 93 sequences.
[2014-03-11 10:08] [main] Version: 0.7.7-r441
[2014-03-11 10:08] [main] CMD: /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa mem -M -t 8 -R @RG\tID:2\tPL:illumina\tPU:2_2014-03-11_OvationSureSelect_test\tSM:OvationFresh -v 1 /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFresh.fq
[2014-03-11 10:08] [main] Real time: 71.286 sec; CPU: 224.862 sec
[2014-03-11 10:08] Index BAM file: 2_2014-03-11_OvationSureSelect_test-sort.bam
[2014-03-11 10:08] Timing: callable regions
[2014-03-11 10:08] multiprocessing: postprocess_alignment
[2014-03-11 10:08] Prepare cleaned BED file : OvationFFPE
[2014-03-11 10:08] bgzip cancerPanel_target.bed
[2014-03-11 10:08] tabix index cancerPanel_target.bed.gz
[2014-03-11 10:08] Resource requests: ; memory: 1.0; cores: 1
[2014-03-11 10:08] Configuring 1 jobs to run, using 1 cores each with 1.2g of memory reserved for each job
[2014-03-11 10:08] multiprocessing: calc_callable_loci
[2014-03-11 10:08] multiprocessing: combine_bed
[2014-03-11 10:08] Resource requests: ; memory: 1.0; cores: 1
[2014-03-11 10:08] Configuring 1 jobs to run, using 1 cores each with 1.2g of memory reserved for each job
[2014-03-11 10:08] multiprocessing: calc_callable_loci
[2014-03-11 10:09] multiprocessing: combine_bed
[2014-03-11 10:09] multiprocessing: combine_sample_regions
[2014-03-11 10:09] Identified 338 parallel analysis blocks
Block sizes:
  min: 4262
  5%: 39907.55
  25%: 2027585.25
  median: 5025856.5
  75%: 10965578.0
  95%: 33460817.55
  99%: 55035159.86
  max: 67327613
Between block sizes:
  min: 46
  5%: 173.4
  25%: 1078.0
  median: 2785.0
  75%: 11361.0
  95%: 114915.6
  99%: 11190177.96
  max: 29031551
[2014-03-11 10:09] Timing: coverage
[2014-03-11 10:09] Resource requests: freebayes, gatk, gatk-haplotype, picard; memory: 16.0, 16.0, 16.0, 16.0; cores: 8, 1, 1, 1
[2014-03-11 10:09] Configuring 2 jobs to run, using 1 cores each with 16.2g of memory reserved for each job
[2014-03-11 10:09] Timing: alignment post-processing
[2014-03-11 10:09] multiprocessing: piped_bamprep
[2014-03-11 10:09] Piped post-alignment bamprep ('chrM', 0, 16571) : OvationFFPE : ('chrM', 0, 16571)
[2014-03-11 10:09] Piped post-alignment bamprep ('chr1', 0, 1982140) : OvationFFPE : ('chr1', 0, 1982140)
[2014-03-11 10:09] ##### ERROR ------------------------------------------------------------------------------------------
[2014-03-11 10:09] ##### ERROR A USER ERROR has occurred (version 3.0-7-gac6f69f):
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
[2014-03-11 10:09] ##### ERROR The error message below tells you what is the problem.
[2014-03-11 10:09] ##### ERROR ------------------------------------------------------------------------------------------
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR If the problem is an invalid argument, please check the online documentation guide
[2014-03-11 10:09] ##### ERROR A USER ERROR has occurred (version 3.0-7-gac6f69f):
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
[2014-03-11 10:09] ##### ERROR This means that one or more arguments or inputs in your command are incorrect.
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR Visit our website and forum for extensive documentation and answers to
[2014-03-11 10:09] ##### ERROR The error message below tells you what is the problem.
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
[2014-03-11 10:09] ##### ERROR If the problem is an invalid argument, please check the online documentation guide
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
[2014-03-11 10:09] ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR Visit our website and forum for extensive documentation and answers to
[2014-03-11 10:09] ##### ERROR MESSAGE: SAM/BAM file /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam is malformed: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups
[2014-03-11 10:09] ##### ERROR ------------------------------------------------------------------------------------------
[2014-03-11 10:09] ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
[2014-03-11 10:09] ##### ERROR
[2014-03-11 10:09] ##### ERROR MESSAGE: SAM/BAM file /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam is malformed: SAM file doesn't have any read groups defined in the header.  The GATK no longer supports SAM files without read groups
[2014-03-11 10:09] ##### ERROR ------------------------------------------------------------------------------------------
[2014-03-11 10:09] [samopen] no @SQ lines in the header.
[2014-03-11 10:09] [sam_read1] missing header? Abort!
[2014-03-11 10:09] [samopen] no @SQ lines in the header.
[2014-03-11 10:09] [sam_read1] missing header? Abort!
[2014-03-11 10:09] Uncaught exception occurred
Traceback (most recent call last):
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run
    _do_run(cmd, checks, log_stdout)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 118, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr1:1-1982140 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR | /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/samtools view -S -b -    > /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/bamprep/OvationFFPE/chr1/tx/tmpY5brtW/1_2014-03-11_OvationSureSelect_test.sorted-chr1_0_1982140-prep.bam

Some of these are quite short reads so I'm guessing the bwa aln / samse steps haven't added the read groups in?

chapmanb commented 10 years ago

Miika; Apologies, this is a bug in our bwa aln usage. I pushed a fix which correctly adds in the read groups and should let you finish cleanly. You'll have to remove the old alignment files after updating before the re-run:

rm -f align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.*

Let us know if you run into anything else at all. Thanks again for the report.
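For reference, the read group string passed to bwa in the logs in this thread has the form `@RG\tID:...\tPL:...\tPU:...\tSM:...`, with literal backslash-t separators. A minimal sketch of assembling such a string follows. `format_read_group` is a hypothetical helper and is not how bcbio builds it; the field order simply mirrors the logged commands.

```python
def format_read_group(rg_id, sample, platform="illumina", unit=None):
    """Build an @RG string with literal backslash-t separators.

    Hypothetical helper for illustration. The separators are the two
    characters backslash and t, matching the command lines in the logs,
    not real tab characters.
    """
    fields = [
        ("ID", rg_id),
        ("PL", platform),
        ("PU", rg_id if unit is None else unit),
        ("SM", sample),
    ]
    for tag, value in fields:
        value = str(value)
        if not value or "\t" in value:
            raise ValueError("invalid value for %s: %r" % (tag, value))
    return "@RG" + "".join(r"\t{}:{}".format(tag, value) for tag, value in fields)
```

Called as `format_read_group("1", "OvationFFPE", unit="1_2014-03-11_OvationSureSelect_test")`, this produces the same string that appears after `-r` in the `bwa samse` command logged later in the thread.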

mjafin commented 10 years ago

Thanks! I'll report back if I get any other problems downstream..

mjafin commented 10 years ago

OK, next round of errors :)

[2014-03-12 04:55] Timing: alignment post-processing
[2014-03-12 04:55] multiprocessing: piped_bamprep
[2014-03-12 04:55] Piped post-alignment bamprep ('chr8_gl000196_random', 0, 38914) : OvationFFPE : ('chr8_gl000196_random', 0, 38914)
[2014-03-12 04:55] [samopen] SAM header is present: 93 sequences.
[2014-03-12 04:55] [sam_read1] reference 'ID:bwa        PN:bwa  VN:0.7.7-r441   CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
[2014-03-12 04:55] SN:chr13     LN:115169878
[2014-03-12 04:55] @SQ  SN:chr14        LN:1073495401' is recognized as '*'.
[2014-03-12 04:55] [main_samview] truncated file.
[2014-03-12 04:55] Uncaught exception occurred
Traceback (most recent call last):
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run
    _do_run(cmd, checks, log_stdout)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 118, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR | /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/samtools view -S -b -    > /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/bamprep/OvationFFPE/chr8_gl000196_random/tx/tmp6z9ES0/1_2014-03-11_OvationSureSelect_test.sorted-chr8_gl000196_random_0_38914-prep.bam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference 'ID:bwa   PN:bwa  VN:0.7.7-r441   CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
SN:chr13        LN:115169878
@SQ     SN:chr14        LN:1073495401' is recognized as '*'.
[main_samview] truncated file.
' returned non-zero exit status 1
Traceback (most recent call last):
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bcbio_nextgen.py", line 59, in <module>
    main(**kwargs)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bcbio_nextgen.py", line 39, in main
    run_main(**kwargs)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 40, in run_main
    fc_dir, run_info_yaml)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 87, in _run_toplevel
    for xs in pipeline.run(config, config_file, parallel, dirs, pipeline_items):
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 320, in run
    samples = region.parallel_prep_region(samples, regions, run_parallel)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/pipeline/region.py", line 69, in parallel_prep_region
    "piped_bamprep", None, file_key, ["config"])
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/split.py", line 76, in parallel_split_combine
    split_output = parallel_fn(parallel_name, split_args)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/multi.py", line 82, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"])(joblib.delayed(fn)(x) for x in items):
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 644, in __call__
    self.dispatch(function, args, kwargs)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 391, in dispatch
    job = ImmediateApply(func, args, kwargs)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/joblib/parallel.py", line 129, in __init__
    self.results = func(*args, **kwargs)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/utils.py", line 47, in wrapper
    return apply(f, *args, **kwargs)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/distributed/multitasks.py", line 41, in piped_bamprep
    return bamprep.piped_bamprep(*args)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/variation/bamprep.py", line 230, in piped_bamprep
    _piped_bamprep_region(data, region, out_file, tmp_dir)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/variation/bamprep.py", line 212, in _piped_bamprep_region
    _piped_bamprep_region_fullpipe(data, region, prep_params, out_file, tmp_dir)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/variation/bamprep.py", line 187, in _piped_bamprep_region_fullpipe
    region=region)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 23, in run
    _do_run(cmd, checks, log_stdout)
  File "/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 118, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
subprocess.CalledProcessError: Command 'set -o pipefail; /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR | /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/samtools view -S -b -    > /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/bamprep/OvationFFPE/chr8_gl000196_random/tx/tmp6z9ES0/1_2014-03-11_OvationSureSelect_test.sorted-chr8_gl000196_random_0_38914-prep.bam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference 'ID:bwa   PN:bwa  VN:0.7.7-r441   CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq
SN:chr13        LN:115169878
@SQ     SN:chr14        LN:1073495401' is recognized as '*'.
[main_samview] truncated file.
' returned non-zero exit status 1

Apologies and thanks!

chapmanb commented 10 years ago

Miika; It appears to be complaining about the header in your input file. If you look at:

samtools view -H /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam

Does the header look okay, or is there something funny around the @RG line? Sorry about all these problems, we don't get much small read data these days.

One practical suggestion if this is holding you up is to align the smaller reads with novoalign instead of bwa, which should handle them cleanly and uses the same pipeline for short and long reads.
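For the header check suggested above, it can be quicker to summarise the `samtools view -H` output than to read it by eye. This is a minimal sketch under a few assumptions: `summarize_header` is a hypothetical helper, the header text has already been captured as a string, and fields are tab-separated as in real SAM output (headers pasted into a web page often show spaces instead).

```python
from collections import Counter


def summarize_header(header_text):
    """Count SAM header record types and collect the fields of each @RG line.

    Hypothetical helper for illustration. Expects tab-separated header text,
    such as the output of `samtools view -H`.
    """
    counts = Counter()
    read_groups = []
    for line in header_text.splitlines():
        if not line.startswith("@"):
            continue  # skip alignment records or stray tool output
        fields = line.split("\t")
        counts[fields[0]] += 1
        if fields[0] == "@RG":
            read_groups.append(
                dict(field.split(":", 1) for field in fields[1:] if ":" in field)
            )
    return counts, read_groups
```

An empty `read_groups` list corresponds to the "doesn't have any read groups defined in the header" complaint from GATK earlier in the thread.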

mjafin commented 10 years ago

Thanks Brad, Here's the header:

@HD     VN:1.3  SO:coordinate
@SQ     SN:chrM LN:16571
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566
@SQ     SN:chr1_gl000191_random LN:106433
@SQ     SN:chr1_gl000192_random LN:547496
@SQ     SN:chr4_ctg9_hap1       LN:590426
@SQ     SN:chr4_gl000193_random LN:189789
@SQ     SN:chr4_gl000194_random LN:191469
@SQ     SN:chr6_apd_hap1        LN:4622290
@SQ     SN:chr6_cox_hap2        LN:4795371
@SQ     SN:chr6_dbb_hap3        LN:4610396
@SQ     SN:chr6_mann_hap4       LN:4683263
@SQ     SN:chr6_mcf_hap5        LN:4833398
@SQ     SN:chr6_qbl_hap6        LN:4611984
@SQ     SN:chr6_ssto_hap7       LN:4928567
@SQ     SN:chr7_gl000195_random LN:182896
@SQ     SN:chr8_gl000196_random LN:38914
@SQ     SN:chr8_gl000197_random LN:37175
@SQ     SN:chr9_gl000198_random LN:90085
@SQ     SN:chr9_gl000199_random LN:169874
@SQ     SN:chr9_gl000200_random LN:187035
@SQ     SN:chr9_gl000201_random LN:36148
@SQ     SN:chr11_gl000202_random        LN:40103
@SQ     SN:chr17_ctg5_hap1      LN:1680828
@SQ     SN:chr17_gl000203_random        LN:37498
@SQ     SN:chr17_gl000204_random        LN:81310
@SQ     SN:chr17_gl000205_random        LN:174588
@SQ     SN:chr17_gl000206_random        LN:41001
@SQ     SN:chr18_gl000207_random        LN:4262
@SQ     SN:chr19_gl000208_random        LN:92689
@SQ     SN:chr19_gl000209_random        LN:159169
@SQ     SN:chr21_gl000210_random        LN:27682
@SQ     SN:chrUn_gl000211       LN:166566
@SQ     SN:chrUn_gl000212       LN:186858
@SQ     SN:chrUn_gl000213       LN:164239
@SQ     SN:chrUn_gl000214       LN:137718
@SQ     SN:chrUn_gl000215       LN:172545
@SQ     SN:chrUn_gl000216       LN:172294
@SQ     SN:chrUn_gl000217       LN:172149
@SQ     SN:chrUn_gl000218       LN:161147
@SQ     SN:chrUn_gl000219       LN:179198
@SQ     SN:chrUn_gl000220       LN:161802
@SQ     SN:chrUn_gl000221       LN:155397
@SQ     SN:chrUn_gl000222       LN:186861
@SQ     SN:chrUn_gl000223       LN:180455
@SQ     SN:chrUn_gl000224       LN:179693
@SQ     SN:chrUn_gl000225       LN:211173
@SQ     SN:chrUn_gl000226       LN:15008
@SQ     SN:chrUn_gl000227       LN:128374
@SQ     SN:chrUn_gl000228       LN:129120
@SQ     SN:chrUn_gl000229       LN:19913
@SQ     SN:chrUn_gl000230       LN:43691
@SQ     SN:chrUn_gl000231       LN:27386
@SQ     SN:chrUn_gl000232       LN:40652
@SQ     SN:chrUn_gl000233       LN:45941
@SQ     SN:chrUn_gl000234       LN:40531
@SQ     SN:chrUn_gl000235       LN:34474
@SQ     SN:chrUn_gl000236       LN:41934
@SQ     SN:chrUn_gl000237       LN:45867
@SQ     SN:chrUn_gl000238       LN:39939
@SQ     SN:chrUn_gl000239       LN:33824
@SQ     SN:chrUn_gl000240       LN:41933
@SQ     SN:chrUn_gl000241       LN:42152
@SQ     SN:chrUn_gl000242       LN:43523
@SQ     SN:chrUn_gl000243       LN:43341
@SQ     SN:chrUn_gl000244       LN:39929
@SQ     SN:chrUn_gl000245       LN:36651
@SQ     SN:chrUn_gl000246       LN:38154
@SQ     SN:chrUn_gl000247       LN:36422
@SQ     SN:chrUn_gl000248       LN:39786
@SQ     SN:chrUn_gl000249       LN:38502
@RG     ID:1    PL:illumina     PU:1_2014-03-11_OvationSureSelect_test  SM:OvationFFPE
@PG     ID:bwa  PN:bwa  VN:0.7.7-r441   CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq

I don't know if we have a licence to use novoalign (if it needs one) but I could always check. How about forcing bwa mem for this data?

chapmanb commented 10 years ago

Miika; Thanks much, that looks all fine so now I'm really confused about the error. Is it reproducible on a re-run? Another debugging idea is to run this command:

/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR > test.sam

and look at the test.sam output to be sure the header looks as expected.

For bwa, even though I love bwa mem I'd prefer to follow Heng Li's recommendations rather than hack to avoid a pipeline error. Hopefully we can sort out the underlying cause of this one.

mjafin commented 10 years ago

Let me start a rerun.. In the meantime, the header for your command is here (there are no reads):

@HD     VN:1.4  GO:none SO:coordinate
@SQ     SN:chrM LN:16571
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566
@SQ     SN:chr1_gl000191_random LN:106433
@SQ     SN:chr1_gl000192_random LN:547496
@SQ     SN:chr4_ctg9_hap1       LN:590426
@SQ     SN:chr4_gl000193_random LN:189789
@SQ     SN:chr4_gl000194_random LN:191469
@SQ     SN:chr6_apd_hap1        LN:4622290
@SQ     SN:chr6_cox_hap2        LN:4795371
@SQ     SN:chr6_dbb_hap3        LN:4610396
@SQ     SN:chr6_mann_hap4       LN:4683263
@SQ     SN:chr6_mcf_hap5        LN:4833398
@SQ     SN:chr6_qbl_hap6        LN:4611984
@SQ     SN:chr6_ssto_hap7       LN:4928567
@SQ     SN:chr7_gl000195_random LN:182896
@SQ     SN:chr8_gl000196_random LN:38914
@SQ     SN:chr8_gl000197_random LN:37175
@SQ     SN:chr9_gl000198_random LN:90085
@SQ     SN:chr9_gl000199_random LN:169874
@SQ     SN:chr9_gl000200_random LN:187035
@SQ     SN:chr9_gl000201_random LN:36148
@SQ     SN:chr11_gl000202_random        LN:40103
@SQ     SN:chr17_ctg5_hap1      LN:1680828
@SQ     SN:chr17_gl000203_random        LN:37498
@SQ     SN:chr17_gl000204_random        LN:81310
@SQ     SN:chr17_gl000205_random        LN:174588
@SQ     SN:chr17_gl000206_random        LN:41001
@SQ     SN:chr18_gl000207_random        LN:4262
@SQ     SN:chr19_gl000208_random        LN:92689
@SQ     SN:chr19_gl000209_random        LN:159169
@SQ     SN:chr21_gl000210_random        LN:27682
@SQ     SN:chrUn_gl000211       LN:166566
@SQ     SN:chrUn_gl000212       LN:186858
@SQ     SN:chrUn_gl000213       LN:164239
@SQ     SN:chrUn_gl000214       LN:137718
@SQ     SN:chrUn_gl000215       LN:172545
@HD     VN:1.4  GO:none SO:coordinate
@SQ     SN:chrM LN:16571
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566
@SQ     SN:chr1_gl000191_random LN:106433
@SQ     SN:chr1_gl000192_random LN:547496
@SQ     SN:chr4_ctg9_hap1       LN:590426
@SQ     SN:chr4_gl000193_random LN:189789
@SQ     SN:chr4_gl000194_random LN:191469
@SQ     SN:chr6_apd_hap1        LN:4622290
@SQ     SN:chr6_cox_hap2        LN:4795371
@SQ     SN:chr6_dbb_hap3        LN:4610396
@SQ     SN:chr6_mann_hap4       LN:4683263
@SQ     SN:chr6_mcf_hap5        LN:4833398
@SQ     SN:chr6_qbl_hap6        LN:4611984
@SQ     SN:chr6_ssto_hap7       LN:4928567
@SQ     SN:chr7_gl000195_random LN:182896
@SQ     SN:chr8_gl000196_random LN:38914
[klrl262@rask:/gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles]$ /group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/gatk-framework -Xms250m -Xmx5333m -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L chr8_gl000196_random:1-38914 -R /ngs/reference_data/genomes/Hsapiens/hg19/seq/hg19.fa -I /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam --downsample_to_coverage 10000 --logging_level ERROR > test.sam

chapmanb commented 10 years ago

Miika; That header doesn't look right at all. It's got two @HD tags and is missing the @RG tag. That's super strange since the original BAM file header looks fine. Is it possible the /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam.bai index file is messed up? That might make PrintReads choke when trying to grab specific regions, while the streaming commands work okay. If you slice with samtools, does it look bad as well?

samtools view /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test.sorted.bam chr8_gl000196_random:1-38914

Sorry again to have a non-reproducer on our side. Hope this helps some.
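For anyone comparing headers by eye, the two symptoms above (more than one @HD line, no @RG line) can also be checked mechanically. This is only a minimal sketch, not part of bcbio: `summarize_sam_header` and `header_problems` are hypothetical helper names, and the input is assumed to be header text such as the output of `samtools view -H`.

```python
from collections import Counter


def summarize_sam_header(header_text):
    """Count SAM header record types (@HD, @SQ, @RG, ...) in header text."""
    counts = Counter()
    for line in header_text.splitlines():
        if line.startswith("@"):
            # The record type is the first whitespace-delimited field.
            counts[line.split(None, 1)[0]] += 1
    return counts


def header_problems(header_text):
    """Return a list of the two problems discussed in this thread, if present."""
    counts = summarize_sam_header(header_text)
    problems = []
    if counts["@HD"] > 1:
        problems.append("multiple @HD lines (%d)" % counts["@HD"])
    if counts["@RG"] == 0:
        problems.append("no @RG line")
    return problems


if __name__ == "__main__":
    # Made-up header for illustration only.
    example = "@HD\tVN:1.3\n@HD\tVN:1.3\n@SQ\tSN:chr1\tLN:10\n"
    for problem in header_problems(example):
        print(problem)
```

Running the same check on the sliced output and on the original BAM header would show whether the damage is introduced by the slicing step.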

mjafin commented 10 years ago

Hi Brad, No reads in chr8_gl000196_random but in chr7_gl000195_random the slicing works fine. Would you like me to email you the fq file for testing purposes?

chapmanb commented 10 years ago

Miika; Sending the file would be great, thank you. Did the headers look okay with the samtools slicing commands (sorry, should have started with samtools view -h)? I'm confused as to why it's spitting out strange headers even if there are no reads present. Thanks for all the patience debugging.

mjafin commented 10 years ago

Here's the header from the bam file (looks OK'ish):

@HD     VN:1.3  SO:coordinate
@SQ     SN:chrM LN:16571
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
@SQ     SN:chr4 LN:191154276
@SQ     SN:chr5 LN:180915260
@SQ     SN:chr6 LN:171115067
@SQ     SN:chr7 LN:159138663
@SQ     SN:chr8 LN:146364022
@SQ     SN:chr9 LN:141213431
@SQ     SN:chr10        LN:135534747
@SQ     SN:chr11        LN:135006516
@SQ     SN:chr12        LN:133851895
@SQ     SN:chr13        LN:115169878
@SQ     SN:chr14        LN:107349540
@SQ     SN:chr15        LN:102531392
@SQ     SN:chr16        LN:90354753
@SQ     SN:chr17        LN:81195210
@SQ     SN:chr18        LN:78077248
@SQ     SN:chr19        LN:59128983
@SQ     SN:chr20        LN:63025520
@SQ     SN:chr21        LN:48129895
@SQ     SN:chr22        LN:51304566
@SQ     SN:chrX LN:155270560
@SQ     SN:chrY LN:59373566
@SQ     SN:chr1_gl000191_random LN:106433
@SQ     SN:chr1_gl000192_random LN:547496
@SQ     SN:chr4_ctg9_hap1       LN:590426
@SQ     SN:chr4_gl000193_random LN:189789
@SQ     SN:chr4_gl000194_random LN:191469
@SQ     SN:chr6_apd_hap1        LN:4622290
@SQ     SN:chr6_cox_hap2        LN:4795371
@SQ     SN:chr6_dbb_hap3        LN:4610396
@SQ     SN:chr6_mann_hap4       LN:4683263
@SQ     SN:chr6_mcf_hap5        LN:4833398
@SQ     SN:chr6_qbl_hap6        LN:4611984
@SQ     SN:chr6_ssto_hap7       LN:4928567
@SQ     SN:chr7_gl000195_random LN:182896
@SQ     SN:chr8_gl000196_random LN:38914
@SQ     SN:chr8_gl000197_random LN:37175
@SQ     SN:chr9_gl000198_random LN:90085
@SQ     SN:chr9_gl000199_random LN:169874
@SQ     SN:chr9_gl000200_random LN:187035
@SQ     SN:chr9_gl000201_random LN:36148
@SQ     SN:chr11_gl000202_random        LN:40103
@SQ     SN:chr17_ctg5_hap1      LN:1680828
@SQ     SN:chr17_gl000203_random        LN:37498
@SQ     SN:chr17_gl000204_random        LN:81310
@SQ     SN:chr17_gl000205_random        LN:174588
@SQ     SN:chr17_gl000206_random        LN:41001
@SQ     SN:chr18_gl000207_random        LN:4262
@SQ     SN:chr19_gl000208_random        LN:92689
@SQ     SN:chr19_gl000209_random        LN:159169
@SQ     SN:chr21_gl000210_random        LN:27682
@SQ     SN:chrUn_gl000211       LN:166566
@SQ     SN:chrUn_gl000212       LN:186858
@SQ     SN:chrUn_gl000213       LN:164239
@SQ     SN:chrUn_gl000214       LN:137718
@SQ     SN:chrUn_gl000215       LN:172545
@SQ     SN:chrUn_gl000216       LN:172294
@SQ     SN:chrUn_gl000217       LN:172149
@SQ     SN:chrUn_gl000218       LN:161147
@SQ     SN:chrUn_gl000219       LN:179198
@SQ     SN:chrUn_gl000220       LN:161802
@SQ     SN:chrUn_gl000221       LN:155397
@SQ     SN:chrUn_gl000222       LN:186861
@SQ     SN:chrUn_gl000223       LN:180455
@SQ     SN:chrUn_gl000224       LN:179693
@SQ     SN:chrUn_gl000225       LN:211173
@SQ     SN:chrUn_gl000226       LN:15008
@SQ     SN:chrUn_gl000227       LN:128374
@SQ     SN:chrUn_gl000228       LN:129120
@SQ     SN:chrUn_gl000229       LN:19913
@SQ     SN:chrUn_gl000230       LN:43691
@SQ     SN:chrUn_gl000231       LN:27386
@SQ     SN:chrUn_gl000232       LN:40652
@SQ     SN:chrUn_gl000233       LN:45941
@SQ     SN:chrUn_gl000234       LN:40531
@SQ     SN:chrUn_gl000235       LN:34474
@SQ     SN:chrUn_gl000236       LN:41934
@SQ     SN:chrUn_gl000237       LN:45867
@SQ     SN:chrUn_gl000238       LN:39939
@SQ     SN:chrUn_gl000239       LN:33824
@SQ     SN:chrUn_gl000240       LN:41933
@SQ     SN:chrUn_gl000241       LN:42152
@SQ     SN:chrUn_gl000242       LN:43523
@SQ     SN:chrUn_gl000243       LN:43341
@SQ     SN:chrUn_gl000244       LN:39929
@SQ     SN:chrUn_gl000245       LN:36651
@SQ     SN:chrUn_gl000246       LN:38154
@SQ     SN:chrUn_gl000247       LN:36422
@SQ     SN:chrUn_gl000248       LN:39786
@SQ     SN:chrUn_gl000249       LN:38502
@RG     ID:1    PL:illumina     PU:1_2014-03-11_OvationSureSelect_test  SM:OvationFFPE
@PG     ID:bwa  PN:bwa  VN:0.7.7-r441   CL:/group/ngs/src/bcbio-nextgen/0.7.8a/rhel5-x64/bin/bwa samse -r @RG\tID:1\tPL:illumina\tPU:1_2014-03-11_OvationSureSelect_test\tSM:OvationFFPE /ngs/reference_data/genomes/Hsapiens/hg19/bwa/hg19.fa /gpfs/ngs/oncology/Analysis/external/EXT_008_OvationSureSelectCompare/OvationSureSelect_test/work_singles/align/OvationFFPE/1_2014-03-11_OvationSureSelect_test_1.sai /ngs/oncology/datasets/external/EXT_008_OvationSureSelectCompare/OvationFFPE.fq

chapmanb commented 10 years ago

Miika; Thanks for sending the test files, that was a big help. It looks like the single/short read was a red herring. The actual issue is that there appears to be a bug in samtools when calling samtools view to convert from SAM to BAM if the input SAM has only a header and no reads.

I avoided the issue by using sambamba view, which handles this case correctly. Thanks again for the help debugging it and getting this sorted out.
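The triggering condition described above, a SAM stream with a header but no reads, is easy to detect before conversion. The following is a rough sketch for illustration, not the change made in bcbio: `has_alignment_records` is a hypothetical helper, and it assumes the SAM text is available as a file-like object.

```python
import io


def has_alignment_records(sam_handle):
    """Return True if the SAM text contains at least one non-header line."""
    for line in sam_handle:
        # Header lines start with '@'; any other non-blank line is a read.
        if line.strip() and not line.startswith("@"):
            return True
    return False


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    header_only = io.StringIO("@HD\tVN:1.3\tSO:coordinate\n@SQ\tSN:chr1\tLN:10\n")
    with_read = io.StringIO(
        "@HD\tVN:1.3\n@SQ\tSN:chr1\tLN:10\n"
        "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    )
    print(has_alignment_records(header_only))
    print(has_alignment_records(with_read))
```

A check like this could flag regions such as chr8_gl000196_random, which had no reads in this dataset, as the header-only case.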

mjafin commented 10 years ago

Ah.. Correlation != causality

Thanks for the fix!