brentp / hts-nim

nim wrapper for htslib for parsing genomics data files
https://brentp.github.io/hts-nim/
MIT License

Error: unhandled exception: invalid bgzf file [ValueError] #73

Open Stikus opened 3 years ago

Stikus commented 3 years ago

Hello. When I ran mosdepth on our BAM file, I got an error that originates in hts-nim, which is why I am opening the issue here. Here is the error:

root@b79bb970b062:/outputs# mosdepth -t 20 . /inputs/330003740807_S10.bam
bam.nim(390)             open
Error: unhandled exception: invalid bgzf file [ValueError]

And here is the output of samtools view:

root@b79bb970b062:/outputs# samtools view -H /inputs/330003740807_S10.bam | head
[W::bam_hdr_read] EOF marker is absent. The input is probably truncated
@HD     VN:1.4  SO:coordinate
@SQ     SN:chr1 LN:248956422
@SQ     SN:chr2 LN:242193529
@SQ     SN:chr3 LN:198295559
@SQ     SN:chr4 LN:190214555
@SQ     SN:chr5 LN:181538259
@SQ     SN:chr6 LN:170805979
@SQ     SN:chr7 LN:159345973
@SQ     SN:chr8 LN:145138636
@SQ     SN:chr9 LN:138394717

And Picard's ValidateSamFile:

root@0ea62a6ac57e:/outputs# java -jar $PICARD ValidateSamFile -I /inputs/330003740807_S10.bam -R /ref/GRCh38.d1.vd1/GRCh38.d1.vd1.fa -QUIET true 
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/tmp
10:36:39.816 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/soft/picard-2.25.0.jar!/com/intel/gkl/native/libgkl_compression.so
ERROR::TRUNCATED_FILE:Read name /inputs/330003740807_S10.bam, BAM file has defective last gzip block
...

I'll try to recreate this BAM, but our bioinformatics pipeline (various variant callers and HLA typers) worked fine with it, and samtools emits a warning, not an error. Maybe hts-nim should emit a warning too? As you can see, the file is readable.

DarioS commented 3 days ago

I am getting the same error on an old HPC cluster.

$ module avail mosdepth
mosdepth/0.2.9(default)
$ lsb_release -a
Distributor ID: CentOS
Description:    CentOS release 6.10 (Final)
Release:        6.10
Codename:       Final
$ module load mosdepth
$ mosdepth -m testRun /rds/PRJ-HeadNeck/WholeGenome/alignments/OSCC_58-N.final.bam
bam.nim(354)             open
Error: unhandled exception: invalid bgzf file [ValueError]

Even with version 0.3.9 it is basically the same error.

$ ./mosdepth -m testRun /rds/PRJ-HeadNeck/WholeGenome/alignments/OSCC_58-N.final.bam
bam.nim(390)             open
Error: unhandled exception: invalid bgzf file [ValueError]

brentp commented 3 days ago

This likely means your BAM file is truncated.

DarioS commented 3 days ago

Yikes. samtools quickcheck -v shows that it is truncated.
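
For anyone else who lands here: the "EOF marker is absent" warning from samtools above refers to the 28-byte empty BGZF block that the SAM/BAM specification places at the end of every complete BAM file. Below is a minimal Python sketch of that one check, useful when samtools is not at hand. The function name `has_bgzf_eof` is hypothetical (it is not part of hts-nim, mosdepth, or samtools), and the sketch looks only at the last 28 bytes, so it cannot detect damage elsewhere in the file.

```python
import os

# The 28-byte empty BGZF block defined by the SAM/BAM specification
# as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker.

    Hypothetical helper for illustration; it inspects only the final
    28 bytes and does not validate the header or any other block.
    """
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Seek to 28 bytes before the end and compare with the marker.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read(len(BGZF_EOF)) == BGZF_EOF
```

Calling `has_bgzf_eof("sample.bam")` (placeholder path) and getting `False` suggests the file was cut off before the writer finished, which matches what samtools and Picard report above. In that case the fix is to regenerate or re-copy the BAM.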