BioJulia / Libz.jl

Fast, flexible zlib bindings.
Other
27 stars 17 forks source link

Inexact error when inflating #52

Open pwl opened 7 years ago

pwl commented 7 years ago

I'm getting the following inexact error on master when reading a big archive (20GB compressed size with ~4 compression ratio). The archive seems fine when I extract it with gunzip.

 [11] — fillbuffer!(::BufferedStreams.BufferedInputStream{BufferedStreams.BufferedInputStream{Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}}}) at bufferedinputstream.jl:70

 [12] — readbytes!(::BufferedStreams.BufferedInputStream{Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}}, ::Array{UInt8,1}, ::Int64, ::Int64) at bufferedinputstream.jl:233

 [13] — eof at bufferedinputstream.jl:106 [inlined]

 [14] — fillbuffer!(::BufferedStreams.BufferedInputStream{Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}}) at bufferedinputstream.jl:70

 [15] — readbytes!(::Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}, ::Array{UInt8,1}, ::Int64, ::Int64) at source.jl:149

 [16] — process(::Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}, ::Int32) at source.jl:165

InexactError()

This looks like an overflow in https://github.com/BioJulia/Libz.jl/blob/15f0644a91d20ef4469b00f529e6816fb5a6e7bd/src/lowlevel.jl#L64, would it make sense to change it to Culong? I did it locally and the error went away, the tests pass and all seems fine but I feel like there was a reason for it being Cuint and that I broke something by doing that:-).

bicycle1885 commented 7 years ago

Thank you for reporting.

Changing the avail_in field type would be dangerous. The type of avail_in is uInt, which is an alias of unsigned int in zlib (https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zconf.h#L393), and the sizes of uInt and uLong are often different (at least on my machine).

I guess the problem happened because (1) there is a bug somewhere and BufferedStreams.available_bytes returned a wrong value that didn't fit in Cuint, or (2) you (or some package that used Libz.jl) allocated too much buffer for it. Anyway, I need to reproduce the problem before debugging. So, can you give me more details (versioninfo(), package versions, etc.) so that I can reproduce the same problem?

pwl commented 7 years ago
julia> map(Pkg.installed,["Libz","BufferedStreams"])
2-element Array{VersionNumber,1}:
 v"0.2.0+"
 v"0.3.2" 

julia> map(x->(x,Pkg.installed(x)),["Libz","BufferedStreams"])
2-element Array{Tuple{String,VersionNumber},1}:
 ("Libz",v"0.2.0+")          
 ("BufferedStreams",v"0.3.2")

julia> Libz.version()
"1.2.11"

julia> versioninfo()
Julia Version 0.5.1
Commit 6445c82* (2017-03-05 13:25 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (NO_AFFINITY HASWELL)
  LAPACK: liblapack
  LIBM: libm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

Let me know if you need additional version info.

I have just used the ZlibInflateInputStream directly on the gzipped file without any tinkering with buffer sizes: ZlibInflateInputStream(open(f,"r")), where f is the big archive.

timbitz commented 7 years ago

I have become aware of a similar error that a colleague received when reading a gzipped FASTQ file from Bio.jl:

ERROR: LoadError: InexactError()
 in process(::Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}, ::Int32) at ~/.julia/v0.5/Libz/src/source.jl:158
 in readbytes!(::Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}, ::Array{UInt8,1}, ::Int64, ::Int64) at ~/.julia/v0.5/Libz/src/source.jl:142
 in fillbuffer!(::BufferedStreams.BufferedInputStream{Libz.Source{:inflate,BufferedStreams.BufferedInputStream{IOStream}}}) at ~/.julia/v0.5/BufferedStreams/src/bufferedinputstream.jl:70
 in read!(::Bio.Seq.FASTQReader{Bio.Seq.BioSequence{Bio.Seq.DNAAlphabet{2}}}, ::Bio.Seq.SeqRecord{Bio.Seq.BioSequence{Bio.Seq.DNAAlphabet{2}},Bio.Seq.FASTQMetadata}) at ~/.julia/v0.5/Bio/src/Ragel.jl:188

This hasn't happened on my machine so I will try to get more specifics about his environment.

bicycle1885 commented 7 years ago

The issue is reproducible on my machine, so I'll fix it as soon as possible. JFYI, I'm now working on new packages (see https://github.com/bicycle1885/TranscodingStreams.jl) that will replace BufferedStreams.jl and Libz.jl, and the issue does not happen on them. However, these are not yet ready to use.

timbitz commented 7 years ago

Awesome, a bugfix for this would be great @bicycle1885! Actually I did some asking around and a different colleague showed me the same error, so I'm not sure why I haven't reproduced it yet. Anyways, your TranscodingStreams.jl package looks like a really good replacement once its ready!

bicycle1885 commented 7 years ago

Hi, I've fixed an issue found in BufferedStreams.jl. Please update the package and check it on your machine once https://github.com/JuliaLang/METADATA.jl/pull/9654 is merged.

timbitz commented 7 years ago

It looks like its fixed! Thanks @bicycle1885! I will double check with those colleagues again tomorrow just to be sure. Cheers!