bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

sambamba-index: Error reading BGZF block starting from offset 22120497656: wrong BGZF magic #1392

Closed mortunco closed 8 years ago

mortunco commented 8 years ago

Hi Brad,

My aim is to call variants on my two BAM files (I think you have heard this a lot), but I am facing the following error. From the previous issues about this error, I understand it was attributed to RAM usage and alignment. But as you can see, I don't have any alignment step in my configuration; I want to call mutations directly from the BAM files.

My configuration is shared below.

[tmorova15@yunus config]$ cat deneme10.yaml.txt 
#Cancer tumor/normal calling evaluation using synthetic dataset 3
# from the ICGC-TCGA DREAM challenge:
# https://www.synapse.org/#!Synapse:syn312572/wiki/62018

---
details:
- algorithm:
    aligner: false
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    mark_duplicates: false
    quality_format: illumina
    variantcaller: [mutect, vardict, varscan, freebayes]
    indelcaller: scalpel
    ensemble: 
      numpass: 2    
  analysis: variant2
  description: 140e5014-bdd6-4663-9404-234c7f9e927d 

  files:
    - ../input/normal.bam   
  genome_build: GRCh37
  metadata:
    batch: ICGC
    phenotype: normal
- algorithm:
    aligner: false
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    mark_duplicates: false
    quality_format: illumina
    variantcaller: [mutect, vardict, varscan, freebayes] 
    indelcaller: scalpel
    ensemble:
      numpass: 2
  analysis: variant2
  description: a9ec7d9e-b179-4782-a589-43c7d1642be9 

  files:
    - ../input/tumor.bam
  genome_build: GRCh37
  metadata:
    batch: ICGC
    phenotype: tumor
fc_date: '2015-04-25'
fc_name: ICGC-trials
#resources:
#  gatk:
#    jar: /home/ec2-user/GATK/GenomeAnalysisTK.jar 
#  mutect: 
#    jar: /home/ec2-user/GATK/mutect-1.1.7.jar
upload:
  dir: ../final/

The Error

[tmorova15@yunus log]$ cat bcbio-nextgen-debug.log 
[2016-05-14T20:32Z] System YAML configuration: /mnt/kufs/scratch/tmorova15/bcbio/galaxy/bcbio_system.yaml
[2016-05-14T20:32Z] Resource requests: sambamba, samtools; memory: 2.00, 2.00; cores: 16, 16
[2016-05-14T20:32Z] Configuring 1 jobs to run, using 8 cores each with 16.1g of memory reserved for each job
[2016-05-14T20:32Z] Timing: organize samples
[2016-05-14T20:32Z] multiprocessing: organize_samples
[2016-05-14T20:32Z] Using input YAML configuration: /mnt/kufs/scratch/tmorova15/ku_deneme/config/deneme10.yaml.txt
[2016-05-14T20:32Z] Checking sample YAML configuration: /mnt/kufs/scratch/tmorova15/ku_deneme/config/deneme10.yaml.txt
[2016-05-14T20:32Z] Testing minimum versions of installed programs
[2016-05-14T20:32Z] Timing: alignment preparation
[2016-05-14T20:32Z] multiprocessing: prep_align_inputs
[2016-05-14T20:32Z] multiprocessing: disambiguate_split
[2016-05-14T20:32Z] Timing: alignment
[2016-05-14T20:32Z] multiprocessing: process_alignment
[2016-05-14T20:32Z] Index BAM file: normal.bam
[2016-05-14T20:34Z] sambamba-index: Error reading BGZF block starting from offset 22120497656: wrong BGZF magic
[2016-05-14T20:34Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/sambamba index -t 8 /mnt/kufs/scratch/tmorova15/ku_deneme/work/prealign/140e5014-bdd6-4663-9404-234c7f9e927d/tx/tmplOBNv0/normal.bam
sambamba-index: Error reading BGZF block starting from offset 22120497656: wrong BGZF magic
' returned non-zero exit status 1
mortunco commented 8 years ago

I read the issue that was posted yesterday. Like that user, I also downloaded the data with wget. If the problem is related to the download, should I use another dataset?

mortunco commented 8 years ago

stdout.txt

Dear Bread;

I thought I had got past that error by running the process on a new dataset. It seemed OK for a while, but now I get a different error. This system worked without an error for the chr6 data, so I don't know what is wrong with it now. I am sharing my stdout as a text file because GitHub did not let me post it here due to the character limit. The stdout is more detailed.

I am ready to try anything you may suggest.

Best regards,

Tunc.

This is the error that comes from the debug log

[tmorova15@yunus log]$ cat bcbio-nextgen-debug.log 
[2016-05-15T15:17Z] System YAML configuration: /mnt/kufs/scratch/tmorova15/bcbio/galaxy/bcbio_system.yaml
[2016-05-15T15:17Z] Resource requests: sambamba, samtools; memory: 2.00, 2.00; cores: 16, 16
[2016-05-15T15:17Z] Configuring 1 jobs to run, using 8 cores each with 16.1g of memory reserved for each job
[2016-05-15T15:17Z] Timing: organize samples
[2016-05-15T15:17Z] multiprocessing: organize_samples
[2016-05-15T15:17Z] Using input YAML configuration: /mnt/kufs/scratch/tmorova15/ku_deneme_2/config/deneme10.yaml.txt
[2016-05-15T15:17Z] Checking sample YAML configuration: /mnt/kufs/scratch/tmorova15/ku_deneme_2/config/deneme10.yaml.txt
[2016-05-15T15:17Z] Testing minimum versions of installed programs
[2016-05-15T15:17Z] Timing: alignment preparation
[2016-05-15T15:17Z] multiprocessing: prep_align_inputs
[2016-05-15T15:17Z] multiprocessing: disambiguate_split
[2016-05-15T15:17Z] Timing: alignment
[2016-05-15T15:17Z] multiprocessing: process_alignment
[2016-05-15T15:17Z] Index BAM file: PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] Timing: callable regions
[2016-05-15T15:24Z] multiprocessing: prep_samples
[2016-05-15T15:24Z] multiprocessing: postprocess_alignment
[2016-05-15T15:24Z] Resource requests: ; memory: 1.00; cores: 1
[2016-05-15T15:24Z] Configuring 8 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-05-15T15:24Z] multiprocessing: calc_callable_loci
[2016-05-15T15:24Z] bedtools genomecov: 6 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] bedtools genomecov: 7 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] bedtools genomecov: 3 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] bedtools genomecov: 4 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] bedtools genomecov: 5 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] bedtools genomecov: 8 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] bedtools genomecov: 1 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T15:24Z] bedtools genomecov: 2 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:02Z] bedtools groupby coverage: 8 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:04Z] bedtools groupby coverage: 7 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:06Z] bedtools genomecov: 9 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:07Z] bedtools groupby coverage: 6 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:08Z] bedtools groupby coverage: 5 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:09Z] bedtools genomecov: 10 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:10Z] bedtools groupby coverage: 4 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:11Z] bedtools groupby coverage: 3 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:12Z] bedtools genomecov: 11 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:13Z] bedtools genomecov: 12 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:15Z] bedtools groupby coverage: 1 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:16Z] bedtools genomecov: 13 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:17Z] bedtools groupby coverage: 2 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:17Z] bedtools genomecov: 14 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:22Z] bedtools genomecov: 15 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:24Z] bedtools genomecov: 16 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:24Z] bedtools groupby coverage: 9 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:28Z] bedtools genomecov: 17 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:29Z] bedtools groupby coverage: 10 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:31Z] bedtools groupby coverage: 13 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:32Z] bedtools groupby coverage: 14 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:33Z] bedtools genomecov: 18 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:34Z] bedtools groupby coverage: 11 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:34Z] bedtools genomecov: 19 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:35Z] bedtools genomecov: 20 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:35Z] bedtools groupby coverage: 12 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:35Z] bedtools groupby coverage: 15 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:37Z] bedtools groupby coverage: 16 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:37Z] bedtools genomecov: 21 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:38Z] bedtools genomecov: 22 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:39Z] bedtools genomecov: X : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:39Z] bedtools genomecov: Y : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:40Z] bedtools groupby coverage: 17 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:41Z] bedtools groupby coverage: Y : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:41Z] bedtools genomecov: NC_007605 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:41Z] bedtools groupby coverage: NC_007605 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:41Z] bedtools genomecov: hs37d5 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:41Z] bedtools groupby coverage: hs37d5 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:43Z] bedtools groupby coverage: 19 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:43Z] bedtools groupby coverage: 22 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:43Z] bedtools groupby coverage: 21 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:44Z] bedtools groupby coverage: 20 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:45Z] bedtools groupby coverage: 18 : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:51Z] bedtools groupby coverage: X : PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam
[2016-05-15T16:54Z] multiprocessing: combine_bed
[2016-05-15T16:54Z] Resource requests: ; memory: 1.00; cores: 1
[2016-05-15T16:54Z] Configuring 8 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-05-15T16:54Z] 112415c3-8cea-4608-9f55-c5714ac77390: Assigned coverage as 'genome' with 91.1% genome coverage and 0.0% offtarget coverage
[2016-05-15T17:00Z] Identify high coverage regions
[2016-05-15T17:00Z] Processing reference #1 (1)
[2016-05-15T17:05Z] Processing reference #2 (2)
[2016-05-15T17:11Z] Processing reference #3 (3)
[2016-05-15T17:15Z] Processing reference #4 (4)
[2016-05-15T17:19Z] Processing reference #5 (5)
[2016-05-15T17:23Z] Processing reference #6 (6)
[2016-05-15T17:27Z] Processing reference #7 (7)
[2016-05-15T17:30Z] Processing reference #8 (8)
[2016-05-15T17:34Z] Processing reference #9 (9)
[2016-05-15T17:36Z] Processing reference #10 (10)
[2016-05-15T17:39Z] Processing reference #11 (11)
[2016-05-15T17:42Z] Processing reference #12 (12)
[2016-05-15T17:45Z] Processing reference #13 (13)
[2016-05-15T17:47Z] Processing reference #14 (14)
[2016-05-15T17:49Z] Processing reference #15 (15)
[2016-05-15T17:51Z] Processing reference #16 (16)
[2016-05-15T17:53Z] Processing reference #17 (17)
[2016-05-15T17:55Z] Processing reference #18 (18)
[2016-05-15T17:57Z] Processing reference #19 (19)
[2016-05-15T17:58Z] Processing reference #20 (20)
[2016-05-15T17:59Z] Processing reference #21 (21)
[2016-05-15T18:00Z] Processing reference #22 (22)
[2016-05-15T18:01Z] Processing reference #23 (X)
[2016-05-15T18:03Z] Processing reference #24 (Y)
[2016-05-15T18:03Z] Processing reference #25 (MT)
[2016-05-15T18:03Z] Processing reference #26 (GL000207.1)
[2016-05-15T18:03Z] Processing reference #27 (GL000226.1)
[2016-05-15T18:03Z] Processing reference #28 (GL000229.1)
[2016-05-15T18:03Z] Processing reference #29 (GL000231.1)
[2016-05-15T18:03Z] Processing reference #30 (GL000210.1)
[2016-05-15T18:03Z] Processing reference #31 (GL000239.1)
[2016-05-15T18:03Z] Processing reference #32 (GL000235.1)
[2016-05-15T18:03Z] Processing reference #33 (GL000201.1)
[2016-05-15T18:03Z] Processing reference #34 (GL000247.1)
[2016-05-15T18:03Z] Processing reference #35 (GL000245.1)
[2016-05-15T18:03Z] Processing reference #36 (GL000197.1)
[2016-05-15T18:03Z] Processing reference #37 (GL000203.1)
[2016-05-15T18:03Z] Processing reference #38 (GL000246.1)
[2016-05-15T18:03Z] Processing reference #39 (GL000249.1)
[2016-05-15T18:03Z] Processing reference #40 (GL000196.1)
[2016-05-15T18:03Z] Processing reference #41 (GL000248.1)
[2016-05-15T18:03Z] Processing reference #42 (GL000244.1)
[2016-05-15T18:03Z] Processing reference #43 (GL000238.1)
[2016-05-15T18:03Z] Processing reference #44 (GL000202.1)
[2016-05-15T18:03Z] Processing reference #45 (GL000234.1)
[2016-05-15T18:03Z] Processing reference #46 (GL000232.1)
[2016-05-15T18:03Z] Processing reference #47 (GL000206.1)
[2016-05-15T18:03Z] Processing reference #48 (GL000240.1)
[2016-05-15T18:03Z] Processing reference #49 (GL000236.1)
[2016-05-15T18:03Z] Processing reference #50 (GL000241.1)
[2016-05-15T18:03Z] Processing reference #51 (GL000243.1)
[2016-05-15T18:03Z] Processing reference #52 (GL000242.1)
[2016-05-15T18:03Z] Processing reference #53 (GL000230.1)
[2016-05-15T18:03Z] Processing reference #54 (GL000237.1)
[2016-05-15T18:03Z] Processing reference #55 (GL000233.1)
[2016-05-15T18:03Z] Processing reference #56 (GL000204.1)
[2016-05-15T18:03Z] Processing reference #57 (GL000198.1)
[2016-05-15T18:03Z] Processing reference #58 (GL000208.1)
[2016-05-15T18:03Z] Processing reference #59 (GL000191.1)
[2016-05-15T18:03Z] Processing reference #60 (GL000227.1)
[2016-05-15T18:03Z] Processing reference #61 (GL000228.1)
[2016-05-15T18:03Z] Processing reference #62 (GL000214.1)
[2016-05-15T18:03Z] Processing reference #63 (GL000221.1)
[2016-05-15T18:03Z] Processing reference #64 (GL000209.1)
[2016-05-15T18:03Z] Processing reference #65 (GL000218.1)
[2016-05-15T18:03Z] Processing reference #66 (GL000220.1)
[2016-05-15T18:03Z] Processing reference #67 (GL000213.1)
[2016-05-15T18:03Z] Processing reference #68 (GL000211.1)
[2016-05-15T18:03Z] Processing reference #69 (GL000199.1)
[2016-05-15T18:03Z] Processing reference #70 (GL000217.1)
[2016-05-15T18:03Z] Processing reference #71 (GL000216.1)
[2016-05-15T18:03Z] Processing reference #72 (GL000215.1)
[2016-05-15T18:03Z] Processing reference #73 (GL000205.1)
[2016-05-15T18:03Z] Processing reference #74 (GL000219.1)
[2016-05-15T18:03Z] Processing reference #75 (GL000224.1)
[2016-05-15T18:03Z] Processing reference #76 (GL000223.1)
[2016-05-15T18:03Z] Processing reference #77 (GL000195.1)
[2016-05-15T18:03Z] Processing reference #78 (GL000212.1)
[2016-05-15T18:03Z] Processing reference #79 (GL000222.1)
[2016-05-15T18:03Z] Processing reference #80 (GL000200.1)
[2016-05-15T18:03Z] Processing reference #81 (GL000193.1)
[2016-05-15T18:03Z] Processing reference #82 (GL000194.1)
[2016-05-15T18:03Z] Processing reference #83 (GL000225.1)
[2016-05-15T18:03Z] Processing reference #84 (GL000192.1)
[2016-05-15T18:03Z] Processing reference #85 (NC_007605)
[2016-05-15T18:03Z] Processing reference #86 (hs37d5)
[2016-05-15T18:05Z] Clean up raw coverage file
[2016-05-15T18:05Z] Prepare cleaned BED file : 112415c3-8cea-4608-9f55-c5714ac77390
[2016-05-15T18:06Z] bgzip PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e-callable-callableblocks.bed
[2016-05-15T18:06Z] tabix index PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e-callable-callableblocks.bed.gz
[2016-05-15T18:06Z] Prepare merged BED file : 112415c3-8cea-4608-9f55-c5714ac77390
[2016-05-15T18:06Z] bgzip PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e-callable-callableblocks-merged.bed
[2016-05-15T18:06Z] tabix index PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e-callable-callableblocks-merged.bed.gz
[2016-05-15T18:06Z] Resource requests: ; memory: 1.00; cores: 1
[2016-05-15T18:06Z] Configuring 8 jobs to run, using 1 cores each with 1.00g of memory reserved for each job
[2016-05-15T18:06Z] multiprocessing: calc_callable_loci
[2016-05-15T18:06Z] bedtools genomecov: 3 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:06Z] bedtools genomecov: 1 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:06Z] bedtools genomecov: 4 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:06Z] bedtools genomecov: 2 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:06Z] bedtools genomecov: 7 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:06Z] bedtools genomecov: 5 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:06Z] bedtools genomecov: 6 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:06Z] bedtools genomecov: 8 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:55Z] bedtools groupby coverage: 8 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T18:59Z] bedtools groupby coverage: 7 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:00Z] bedtools genomecov: 9 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:01Z] bedtools groupby coverage: 6 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:03Z] bedtools groupby coverage: 5 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:04Z] bedtools genomecov: 10 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:05Z] bedtools groupby coverage: 4 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:06Z] bedtools groupby coverage: 3 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:07Z] bedtools genomecov: 11 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:10Z] bedtools genomecov: 12 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:12Z] bedtools genomecov: 13 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:13Z] bedtools groupby coverage: 1 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:14Z] bedtools genomecov: 14 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:15Z] bedtools groupby coverage: 2 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:21Z] bedtools genomecov: 15 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:23Z] bedtools groupby coverage: 9 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:23Z] bedtools genomecov: 16 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:27Z] bedtools genomecov: 17 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:31Z] bedtools groupby coverage: 10 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:32Z] bedtools groupby coverage: 13 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:33Z] bedtools groupby coverage: 14 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:35Z] bedtools groupby coverage: 11 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:35Z] bedtools genomecov: 18 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:36Z] bedtools genomecov: 19 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:36Z] bedtools genomecov: 20 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:37Z] bedtools groupby coverage: 12 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:38Z] bedtools groupby coverage: 15 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:39Z] bedtools groupby coverage: 16 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:40Z] bedtools genomecov: 21 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:41Z] bedtools genomecov: 22 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:42Z] bedtools genomecov: X : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:42Z] bedtools genomecov: Y : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:43Z] bedtools groupby coverage: 17 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:43Z] sambamba-view: Error reading BGZF block starting from offset 145856923686: wrong BGZF magic
[2016-05-15T19:45Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/sambamba view -F 'mapping_quality > 0' -L /mnt/kufs/scratch/tmorova15/ku_deneme_2/work/align/0c9a5dbc-f7f3-43e7-83d1-77b9fb4b8b54/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-callable-split/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-Y-callable-coverageregions.bed -f bam -l 1 /mnt/kufs/scratch/tmorova15/ku_deneme_2/work/align/0c9a5dbc-f7f3-43e7-83d1-77b9fb4b8b54/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam | /mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/bedtools genomecov -split -ibam stdin -bga -g /mnt/kufs/scratch/tmorova15/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa.fai > /mnt/kufs/scratch/tmorova15/ku_deneme_2/work/align/0c9a5dbc-f7f3-43e7-83d1-77b9fb4b8b54/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-callable-split/tx/tmpY5uHKo/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-Y-callable-genomecov.bed
sambamba-view: Error reading BGZF block starting from offset 145856923686: wrong BGZF magic
' returned non-zero exit status 1
[2016-05-15T19:45Z] bedtools genomecov: NC_007605 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:45Z] bedtools groupby coverage: NC_007605 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:45Z] bedtools genomecov: hs37d5 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:45Z] bedtools groupby coverage: hs37d5 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:47Z] bedtools groupby coverage: 21 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:47Z] bedtools groupby coverage: 19 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:48Z] bedtools groupby coverage: 22 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:48Z] bedtools groupby coverage: 20 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:50Z] bedtools groupby coverage: 18 : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
[2016-05-15T19:58Z] bedtools groupby coverage: X : PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
chapmanb commented 8 years ago

Tunc; Sorry about the issues. It looks like you're having consistent problems with reading and writing files on this filesystem, similar to our discussion about intermittent failures in #1383. The key error you see in both failures is:

Error reading BGZF block starting from offset NNN: wrong BGZF magic

which indicates sambamba is having issues reading the BAM file. This normally indicates truncated files, but given that these are working intermittently from the same inputs, the likely error is that the tools are having trouble cleanly reading the input BAM due to something problematic about the filesystem interactions.
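To make the error message concrete: every BGZF block in a BAM file starts with the four bytes `1f 8b 08 04`, and a complete file ends with a fixed 28-byte empty block. The sketch below is only an illustration of that check, not what sambamba does internally; the function names `has_bgzf_magic` and `has_bgzf_eof` are hypothetical.

```python
import os

# First four bytes of every BGZF block: gzip magic, deflate, FEXTRA flag set.
BGZF_MAGIC = b"\x1f\x8b\x08\x04"

# The 28-byte empty block that terminates a complete BGZF/BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804 00000000 00ff 0600 42430200 1b00 0300 00000000 00000000"
)


def has_bgzf_magic(path, offset=0):
    """Return True if the four bytes at `offset` are the BGZF block magic."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(4) == BGZF_MAGIC


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file block."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

For the first failure reported above, one could compare `os.path.getsize("normal.bam")` with the reported offset 22120497656 and call `has_bgzf_magic("normal.bam", 22120497656)`. An offset at or beyond the file size, or a missing end-of-file block, would point towards a truncated download rather than the filesystem.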

I wish I had better suggestions to give you but my thoughts would either be to debug the problematic filesystem issues or retry and hopefully make it through failure issues when the filesystem is more responsive. Hope this helps some.

mortunco commented 8 years ago

Brad;

Thank you for your fast response. I would like to ask a simple, perhaps ignorant, question. This system worked for the chr6 data, which uses the same filesystem and the same bcbio installation. Shouldn't we have seen the same problem there as well?

Is this a problem with the Sun Grid Engine queuing system? Could you specify at which point our filesystem causes the problem? For example, is bcbio not macox compatible? Could using bcbio-vm solve the problem? Our storage is a Lustre parallel filesystem on 6 OSS.

I talked to the IT department; they said they have not updated Lustre for 2 years (but I am the only one seeing filesystem-related errors, so it otherwise works very well). Is there a software version that we should provide in order to get bcbio working again? (But the problem is that bcbio actually works for the chr6 data.)

In addition, IT told me that, in order to deal with big files, the filesystem splits everything across 6 discs. They told me that they will try 3 things:

  1. Update Lustre.
  2. Allocate my process to a single disc.

I apologise for repeating myself, but I am very confused by the inconsistency of the system. I thought this time I really had it. Please understand my frustration. Also, I apologise for misspelling your name in my last post.

Thank you all for your help,

Best regards,

Tunc.

chapmanb commented 8 years ago

Tunc; Sorry about the frustration. To be clear, I'm only guessing here as I'm not sure what is going on. It's clear that sambamba thinks your input files are corrupted and I can think of two reasons for this:

  1. There is really something wrong with the input file. It's not entirely clear if you're using pre-aligned files or letting bcbio align them, but there could be something wrong with the inputs if pre-aligned.
  2. The filesystem is intermittently having issues under load and returning files in a way that makes sambamba think there is an issue.

I'm suggesting the second because you seem to have a similar issue across many different datasets, but only intermittently on some samples. It's a bit hard for me to debug because you keep swapping between different samples and projects so I'm not sure what is reproducible and what is not. To speak to your frustration, this has nothing to do with bcbio installation or setup, and has something to do with the specific inputs of a run.

My suggestion to debug would be to pick one run that fails every time at the same point if you re-run bcbio. Then isolate the failing command line and see if you can reproduce the failure outside of bcbio. If you can't reproduce and keep failing in different places, this suggests a problem with how Lustre is working with these files. If you can reproduce, then this suggests something is wrong with the input file.
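As a rough sketch of that procedure, the isolated command can be re-run a few times outside of bcbio and the exit codes compared. The helper names `rerun` and `classify` are hypothetical, and the labels are only a shorthand for the two cases described above.

```python
import subprocess


def rerun(cmd, attempts=3):
    """Run `cmd` (a list of arguments) several times and collect exit codes."""
    codes = []
    for _ in range(attempts):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        codes.append(result.returncode)
    return codes


def classify(exit_codes):
    """Summarise repeated runs of the same command on the same input."""
    failed = [code != 0 for code in exit_codes]
    if all(failed):
        return "reproducible"  # fails every time: suspect the input file
    if any(failed):
        return "intermittent"  # fails only sometimes: suspect the filesystem
    return "passing"
```

For example, `classify(rerun(["sambamba", "index", "-t", "8", "normal.bam"]))` would repeat the indexing step from the first traceback. A piped command such as the `sambamba view ... | bedtools genomecov ...` line would need to be wrapped in a shell, for example as `["bash", "-c", "set -o pipefail; <command>"]`.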

Hope this helps.

mortunco commented 8 years ago

Dear Brad; I tried 4 different runs:

  1. Try to reproduce the error with the command that causes the problem.
  2. A different filesystem (the error that I mentioned above), so I concluded that this error is not related to the filesystem.
  3. Reproduce the cancer tumor-normal paired tutorial data (running without any problem).
  4. I added the alignment option to rule out the alignment issue (I will share the error with you in the next post).

Based on what I tried:

  1. I got the same error for the 4th run on the different filesystem. I think this problem is related to neither the system nor the filesystem.

  2. I ran the command that causes the problem:

/mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/sambamba view -F 'mapping_quality > 0' -L /mnt/kufs/scratch/tmorova15/ku_deneme_2/work/align/0c9a5dbc-f7f3-43e7-83d1-77b9fb4b8b54/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-callable-split/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-Y-callable-coverageregions.bed -f bam -l 1 /mnt/kufs/scratch/tmorova15/ku_deneme_2/work/align/0c9a5dbc-f7f3-43e7-83d1-77b9fb4b8b54/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam | /mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/bedtools genomecov -split -ibam stdin -bga -g /mnt/kufs/scratch/tmorova15/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa.fai

without writing to an output with > /mnt/kufs/scratch/tmorova15/ku_deneme_2/work/align/0c9a5dbc-f7f3-43e7-83d1-77b9fb4b8b54/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-callable-split/tx/tmpxq2RSl/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-Y-callable-genomecov.bed

It did not give any error. But when I try to interact with the output, for example by writing it to a file or grepping it, I get the error:

sambamba-view: Error reading BGZF block starting from offset 145856923686: wrong BGZF magic

Could this be a clue to anything?

  3. Running without any error (knock on wood).

Also, I checked the file types of the input files. Do you think this error is related to gzip compression? If bcbio recognizes files by their extension, it might miss the gzip compression and cause a problem.

[tmorova15@yunus input]$ file *
24dbbcb3-d97c-4ffe-bc55-8bd7a78eb5b8.gto:           BitTorrent file
PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam:     gzip compressed data, extra field
PCAWG.2ef785e0-283d-430e-99d7-536a8a67c39e.bam.bai: data
PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam:     gzip compressed data, extra field
PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam.bai: data
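
For context, BAM files are stored as BGZF, a gzip-compatible format whose blocks carry an extra header field, so `gzip compressed data, extra field` is what `file` reports for a normal BAM. A more direct check is to walk the blocks and find the first one whose header is not valid. The sketch below is an illustration only: the name `first_bad_block` is hypothetical, and it assumes the `BC` subfield is the first extra subfield in each header, which is the usual layout but not the only one the format allows.

```python
import os
import struct

HEADER_SIZE = 18  # fixed gzip header (12 bytes) plus the 6-byte BC subfield


def first_bad_block(path):
    """Return the offset of the first invalid BGZF block, or None if all parse."""
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as handle:
        while offset < size:
            handle.seek(offset)
            header = handle.read(HEADER_SIZE)
            if len(header) < HEADER_SIZE:
                return offset  # file ends in the middle of a header
            magic, _mtime, _xfl, _os, _xlen, subfield, slen, bsize = struct.unpack(
                "<4sIBBH2sHH", header
            )
            if magic != b"\x1f\x8b\x08\x04" or subfield != b"BC" or slen != 2:
                return offset  # bytes here are not a BGZF block header
            block_size = bsize + 1  # BSIZE stores the total block size minus one
            if offset + block_size > size:
                return offset  # block runs past the end of the file
            offset += block_size
    return None
```

If this returns the same offset on every run, and that offset matches the one in the error message, the file itself is probably damaged. A result that changes between runs would fit the filesystem explanation better. On files of this size the scan reads only the headers, but it can still take a while.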
mortunco commented 8 years ago

This is the bcbio-nextgen.log output from the run with alignment turned on.

[tmorova15@yunus log]$ cat bcbio-nextgen.log 
[2016-05-16T11:56Z] System YAML configuration: /mnt/kufs/scratch/tmorova15/bcbio/galaxy/bcbio_system.yaml
[2016-05-16T11:56Z] Timing: organize samples
[2016-05-16T11:56Z] multiprocessing: organize_samples
[2016-05-16T11:56Z] Using input YAML configuration: /mnt/kufs/scratch/tmorova15/ku_deneme/config/deneme11.yaml
[2016-05-16T11:56Z] Checking sample YAML configuration: /mnt/kufs/scratch/tmorova15/ku_deneme/config/deneme11.yaml
[2016-05-16T11:56Z] Testing minimum versions of installed programs
[2016-05-16T11:56Z] Timing: alignment preparation
[2016-05-16T11:56Z] multiprocessing: prep_align_inputs
[2016-05-16T15:21Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/ngsalign/alignprep.py", line 429, in _bgzip_from_bam
    log_error=False)
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/mnt/kufs/scratch/tmorova15/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/bamtofastq filename=/mnt/kufs/scratch/tmorova15/ku_deneme/input/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam T=/mnt/kufs/scratch/tmorova15/ku_deneme/work/align_prep/tx/tmpEb7Pmb/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-1.fq-sort F=>(/mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/pbgzip -n 8  -c /dev/stdin > /mnt/kufs/scratch/tmorova15/ku_deneme/work/align_prep/tx/tmpEb7Pmb/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-1.fq.gz) F2=>(/mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/pbgzip -n 8  -c /dev/stdin > /mnt/kufs/scratch/tmorova15/ku_deneme/work/align_prep/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc-2.fq.gz) S=/dev/null O=/dev/null O2=/dev/null collate=1 colsbs=16777216
[V] 1601    69.5284MB/s 299430
[V] 1602    69.5259MB/s 299419
[V] 1603    69.5251MB/s 299415
[V] 1604    69.5229MB/s 299406
[V] 1605    69.5228MB/s 299405
[V] 1606    69.5253MB/s 299416
[V] 1607    69.5241MB/s 299411
[V] 1608    69.5228MB/s 299405
[V] 1609    69.525MB/s  299415
[V] 1610    69.5251MB/s 299415
[V] 1611    69.5239MB/s 299410
[V] 1612    69.523MB/s  299406
[V] 1613    69.5219MB/s 299401
[V] 1614    69.5219MB/s 299402
[V] 1615    69.5229MB/s 299406
[V] 1616    69.5217MB/s 299401
[V] 1617    69.5222MB/s 299403
[V] 1618    69.52MB/s   299393
[V] 1619    69.5195MB/s 299391
[V] 1620    69.5143MB/s 299369
[V] 1621    69.5177MB/s 299383
[V] 1622    69.5185MB/s 299387
[V] 1623    69.5207MB/s 299396
[V] 1624    69.5192MB/s 299390
[V] 1625    69.517MB/s  299380
[V] 1626    69.5172MB/s 299381
[V] 1627    69.5191MB/s 299390
[V] 1628    69.5174MB/s 299382
[V] 1629    69.5158MB/s 299375
[V] 1630    69.5105MB/s 299352
[V] 1631    69.5097MB/s 299349
[V] 1632    69.5083MB/s 299343
[V] 1633    69.4989MB/s 299303
[V] 1634    69.4975MB/s 299296
[V] 1635    69.4961MB/s 299291
[V] 1636    69.4953MB/s 299287
[V] 1637    69.4919MB/s 299272
[V] 1638    69.4848MB/s 299242
[V] 1639    69.4849MB/s 299242
[V] 1640    69.4807MB/s 299224
[V] 1641    69.4779MB/s 299212
[V] 1642    69.4767MB/s 299207
[V] 1643    69.4691MB/s 299174
[V] 1644    69.4608MB/s 299138
[V] 1645    69.4604MB/s 299137
[V] 1646    69.4483MB/s 299085
[V] 1647    69.4469MB/s 299079
[V] 1648    69.4401MB/s 299049
[V] 1649    69.4419MB/s 299057
[V] 1650    69.4417MB/s 299056
[V] 1651    69.4343MB/s 299024
[V] 1652    69.4324MB/s 299016
[V] 1653    69.4301MB/s 299006
[V] 1654    69.4271MB/s 298993
[V] 1655    69.42MB/s   298963
[V] 1656    69.4166MB/s 298948
[V] 1657    69.4195MB/s 298961
[V] 1658    69.4178MB/s 298953
[V] 1659    69.4167MB/s 298949
[V] 1660    69.4168MB/s 298949
[V] 1661    69.4169MB/s 298950
[V] 1662    69.4148MB/s 298940
[V] 1663    69.4137MB/s 298935
[V] 1664    69.413MB/s  298933
[V] 1665    69.4123MB/s 298929
[V] 1666    69.4119MB/s 298928
[V] 1667    69.4067MB/s 298905
[V] 1668    69.4108MB/s 298923
[V] 1669    69.4102MB/s 298921
[V] 1670    69.4089MB/s 298915
[V] 1671    69.4023MB/s 298887
[V] 1672    69.3986MB/s 298871
[V] 1673    69.396MB/s  298859
[V] 1674    69.3997MB/s 298875
[V] 1675    69.3887MB/s 298828
[V] 1676    69.387MB/s  298821
[V] 1677    69.3857MB/s 298815
[V] 1678    69.3847MB/s 298811
[V] 1679    69.379MB/s  298786
[V] 1680    69.3634MB/s 298719
[V] 1681    69.3536MB/s 298677
[V] 1682    69.3453MB/s 298641
[V] 1683    69.3342MB/s 298593
BgzfInflate::decompressBlock(): inflate failed
/mnt/kufs/scratch/tmorova15/bcbio/anaconda/share/biobambam-2.0.42-0/bin/../lib/libmaus2.so.2(libmaus2::util::StackTrace::StackTrace()+0x4c)[0x2ab10d1c5f2c]
/mnt/kufs/scratch/tmorova15/bcbio/galaxy/../anaconda/bin/bamtofastq(libmaus2::exception::LibMausException::LibMausException()+0x20)[0x419bc0]
()[0x41cd83]
()[0x43a920]
/mnt/kufs/scratch/tmorova15/bcbio/anaconda/share/biobambam-2.0.42-0/bin/../lib/libstdc++.so.6(std::basic_streambuf<char, std::char_traits<char> >::xsgetn(char*, long)+0xbd)[0x2ab10e0bc26d]
/mnt/kufs/scratch/tmorova15/bcbio/anaconda/share/biobambam-2.0.42-0/bin/../lib/libstdc++.so.6(std::istream::read(char*, long)+0x5b)[0x2ab10e09787b]
()[0x483681]
()[0x4838f4]
()[0x45eb5b]
()[0x4630ad]
()[0x41671d]
()[0x417312]
()[0x41229d]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3350e1d994]
()[0x413412]

' returned non-zero exit status 1
chapmanb commented 8 years ago

Tunc; Thanks for the detailed debugging on this. That's a big help in isolating the problem. Given that multiple tools are complaining about this file on multiple filesystems, my guess is that this BAM file is truncated:

PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam

The file could either be broken at the original source or have been damaged during retrieval. As a first pass, you can try re-downloading it from the original source and checking that the file sizes match, or you can use a tool like Picard's ValidateSamFile to check whether the BAM file is intact:

http://gatkforums.broadinstitute.org/gatk/discussion/7571/errors-in-sam-bam-files-can-be-diagnosed-with-validatesamfile
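
As a quicker first check than a full validation run, a BGZF-compressed BAM that was written completely normally ends with a fixed 28-byte empty block defined in the SAM/BAM specification. Below is a minimal Python sketch that tests for that marker; the function name `has_bgzf_eof` is hypothetical and not part of bcbio or Picard. A missing marker strongly suggests truncation, but a present marker does not prove the file is intact, since damage can also sit in the middle of the file.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines
# as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker block.

    Hypothetical helper for illustration; it only inspects the last
    28 bytes and says nothing about the rest of the file.
    """
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(size - len(BGZF_EOF))
        return handle.read() == BGZF_EOF
```

Usage would look like `has_bgzf_eof("tumor.bam")`, where the path is whatever BAM file you want to check.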

Hope this helps explain it.

mortunco commented 8 years ago

Dear Brad;

This is the ValidateSamFile output for the tumor sample. The normal sample gave the output "no errors were found". Is this the output we expect, or did something go wrong in the process?

$picard ValidateSamFile I=PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam MODE=SUMMARY
...
...
...
INFO    2016-05-17 17:11:02 SamFileValidator    Validated Read 1,810,000,000 records.  Elapsed time: 03:32:32s.  Time for last 10,000,000:   64s.  Last read position: 22:36,700,412
INFO    2016-05-17 17:12:05 SamFileValidator    Validated Read 1,820,000,000 records.  Elapsed time: 03:33:35s.  Time for last 10,000,000:   63s.  Last read position: 22:51,066,205
INFO    2016-05-17 17:14:54 SamFileValidator    Validated Read 1,830,000,000 records.  Elapsed time: 03:36:24s.  Time for last 10,000,000:  168s.  Last read position: X:26,289,283
INFO    2016-05-17 17:15:57 SamFileValidator    Validated Read 1,840,000,000 records.  Elapsed time: 03:37:27s.  Time for last 10,000,000:   63s.  Last read position: X:54,955,179
INFO    2016-05-17 17:17:01 SamFileValidator    Validated Read 1,850,000,000 records.  Elapsed time: 03:38:31s.  Time for last 10,000,000:   64s.  Last read position: X:82,497,174
INFO    2016-05-17 17:18:04 SamFileValidator    Validated Read 1,860,000,000 records.  Elapsed time: 03:39:33s.  Time for last 10,000,000:   62s.  Last read position: X:111,080,386
INFO    2016-05-17 17:19:06 SamFileValidator    Validated Read 1,870,000,000 records.  Elapsed time: 03:40:35s.  Time for last 10,000,000:   61s.  Last read position: X:139,697,080
INFO    2016-05-17 17:20:27 SamFileValidator    Validated Read 1,880,000,000 records.  Elapsed time: 03:41:57s.  Time for last 10,000,000:   81s.  Last read position: Y:16,476,581
[Tue May 17 17:21:28 EEST 2016] picard.sam.ValidateSamFile done. Elapsed time: 223.00 minutes.
Runtime.totalMemory()=946339840
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /mnt/kufs/scratch/tmorova15/ku_deneme_2/input/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:406)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:366)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:199)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:661)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:635)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:629)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:599)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:544)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:518)
    at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:263)
    at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:199)
    at htsjdk.samtools.SamFileValidator.validateSamFileSummary(SamFileValidator.java:127)
    at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:163)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.io.IOException: Unexpected compressed block length: 1
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:377)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:127)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:252)
    at java.io.DataInputStream.read(Unknown Source)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
    ... 16 more

EDIT: This is the output of a second ValidateSamFile run. The error is the same, so it is consistent across runs.

INFO    2016-05-17 20:27:40 SamFileValidator    Validated Read 1,850,000,000 records.  Elapsed time: 02:51:46s.  Time for last 10,000,000:   53s.  Last read position: X:82,497,174
INFO    2016-05-17 20:28:32 SamFileValidator    Validated Read 1,860,000,000 records.  Elapsed time: 02:52:39s.  Time for last 10,000,000:   52s.  Last read position: X:111,080,386
INFO    2016-05-17 20:29:25 SamFileValidator    Validated Read 1,870,000,000 records.  Elapsed time: 02:53:32s.  Time for last 10,000,000:   52s.  Last read position: X:139,697,080
INFO    2016-05-17 20:30:26 SamFileValidator    Validated Read 1,880,000,000 records.  Elapsed time: 02:54:32s.  Time for last 10,000,000:   60s.  Last read position: Y:16,476,581
[Tue May 17 20:31:16 EEST 2016] picard.sam.ValidateSamFile done. Elapsed time: 175.41 minutes.
Runtime.totalMemory()=878706688
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /mnt/kufs/scratch/tmorova15/ku_deneme_2/input/PCAWG.cb950391-e450-477c-9bb9-7c6699f764cc.bam
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:406)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:380)
    at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:366)
    at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:199)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:661)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:635)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:629)
    at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:599)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:544)
    at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:518)
    at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:263)
    at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:199)
    at htsjdk.samtools.SamFileValidator.validateSamFileSummary(SamFileValidator.java:127)
    at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:163)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:209)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:95)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:105)
Caused by: java.io.IOException: Unexpected compressed block length: 1
    at htsjdk.samtools.util.BlockCompressedInputStream.readBlock(BlockCompressedInputStream.java:377)
    at htsjdk.samtools.util.BlockCompressedInputStream.available(BlockCompressedInputStream.java:127)
    at htsjdk.samtools.util.BlockCompressedInputStream.read(BlockCompressedInputStream.java:252)
    at java.io.DataInputStream.read(Unknown Source)
    at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:404)
    ... 16 more
chapmanb commented 8 years ago

Tunc; Thanks for checking the BAM file. The error message at the end of the Picard ValidateSamFile run says that there is a problem with the BAM file, and is similar to what you were seeing from sambamba. It appears that your BAM file is truncated, so you'll need to re-download it or check the initial source to identify the problem. Hopefully once you have a non-truncated file, things will run cleanly through bcbio.
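
To see where in the file the damage starts, the BGZF blocks can be walked one at a time. The following is a rough Python sketch under stated assumptions, not how sambamba, biobambam or htsjdk actually implement their checks: it reads each block header, checks the gzip magic bytes and the `BC` extra subfield that stores the block size, then inflates the payload and compares the CRC32 and length stored in the block trailer. The names `first_bad_bgzf_block` and `_bsize_from_extra` are hypothetical.

```python
import os
import struct
import zlib


def _bsize_from_extra(extra):
    """Return the total block size from the 'BC' extra subfield, or None."""
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if (si1, si2, slen) == (66, 67, 2) and pos + 6 <= len(extra):
            # The subfield stores (total block size - 1).
            return struct.unpack_from("<H", extra, pos + 4)[0] + 1
        pos += 4 + slen
    return None


def first_bad_bgzf_block(path):
    """Return the offset of the first malformed BGZF block, or None."""
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as handle:
        while offset < size:
            handle.seek(offset)
            header = handle.read(12)
            if len(header) < 12:
                return offset  # file ends inside a block header
            id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack(
                "<BBBBIBBH", header)
            if (id1, id2, cm) != (0x1F, 0x8B, 8) or not flg & 4:
                return offset  # wrong BGZF magic
            block_size = _bsize_from_extra(handle.read(xlen))
            if (block_size is None or block_size < 12 + xlen + 8
                    or offset + block_size > size):
                return offset  # bad size field, or block runs past EOF
            body = handle.read(block_size - 12 - xlen)
            crc, isize = struct.unpack("<II", body[-8:])
            try:
                data = zlib.decompress(body[:-8], -15)  # raw deflate stream
            except zlib.error:
                return offset  # inflate failed
            if zlib.crc32(data) & 0xFFFFFFFF != crc or len(data) != isize:
                return offset
            offset += block_size
    return None
```

The returned offset is in bytes from the start of the file, so it can be compared against the offset reported in the sambamba error. Scanning a whole-genome BAM this way in pure Python will be slow; it is meant to illustrate the block layout rather than replace ValidateSamFile.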