Closed bhalonen closed 5 years ago
A Stack Overflow reader identified the issue as https://github.com/fhs/NPZ.jl/blob/master/src/NPZ.jl#L219. You are not touching that line. Was that not the problem after all, or was it a second problem? (The line has been rewritten since.)
@mschauer That line -is- the slow line, but it is slow (very slow) because it was accessing the disk. I was trying to optimize that line, but I got a 25x speedup just by reading the zip into a buffer first.
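To make the "read the zip into a buffer first" idea concrete, here is a minimal sketch. It is written in Python with only the standard library and is an illustration of the general technique, not the actual NPZ.jl change; the function name `open_zip_in_memory` is hypothetical.

```python
import io
import zipfile


def open_zip_in_memory(path):
    """Read the whole archive into RAM, then parse it from the in-memory copy.

    Hypothetical helper for illustration only; it assumes the archive
    fits comfortably in memory.
    """
    with open(path, "rb") as f:
        data = f.read()  # one large sequential read from disk
    # Every later member read is served from the bytes held in memory.
    return zipfile.ZipFile(io.BytesIO(data))
```

The trade-off is that memory use grows with the size of the archive, so this only suits files known to be small relative to available RAM.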
I'm wondering if it would be better to make the read-into-buffer change in the ZipFile package instead.
That is, change this line: https://github.com/fhs/ZipFile.jl/blob/998d334256e863c0e1600704c3d654016c178ef1/src/ZipFile.jl#L118
@fhs what do you think?
@bhalonen It's not correct to read the whole zip file into memory. The library should support zip files that may be bigger than available memory.
I thought I/O was already buffered by the standard library, but I'm guessing it's not? Maybe we need to use a third-party library such as this one: https://github.com/BioJulia/BufferedStreams.jl ?
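For comparison, a bounded-memory alternative can be sketched as follows. This is again Python standard library only, not BufferedStreams.jl or ZipFile.jl code, and `open_zip_buffered` and its `buffer_size` default are hypothetical. The file is wrapped in a fixed-size read buffer, so small reads are batched without loading the whole archive.

```python
import contextlib
import zipfile


@contextlib.contextmanager
def open_zip_buffered(path, buffer_size=1 << 20):
    """Open a zip archive through a fixed-size read buffer.

    Hypothetical helper for illustration: the read buffer is capped at
    roughly `buffer_size` bytes no matter how large the archive is.
    """
    # In binary mode, buffering > 1 yields a BufferedReader of that size.
    with open(path, "rb", buffering=buffer_size) as f:
        with zipfile.ZipFile(f) as zf:
            yield zf
```

Used as `with open_zip_buffered("data.npz") as zf: ...`, this keeps the read buffer at a fixed size, although each member that is read is still decompressed into memory.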
@fhs I'll look into it.
Fixes #19 and #27
The fix for #19 is self-explanatory, and a test is included. To compare against current master NPZ, use my new version of NPZ to generate a test file.
Old version:
To my change: