fhs / NPZ.jl

A Julia package that provides support for reading and writing Numpy .npy and .npz files

Large speed increase and fixing writing dicts #28

Closed bhalonen closed 5 years ago

bhalonen commented 5 years ago

Fixes #19 and #27

Compare against the current NPZ master. The fix for #19 is self-explanatory; a test is included. Use my new version of NPZ to generate a test file:

using NPZ
my_dict = Dict("a" => rand(Float32, 512, 512, 16), "b" => rand(Float32, 512, 512, 1))
npzwrite("temp.npz", my_dict)

Old version:

julia> @time npzread("temp.npz")
109.773750 seconds (143.91 M allocations: 5.182 GiB, 0.32% gc time)

With my change:

julia> @time npzread("temp.npz")
  5.346293 seconds (161.75 M allocations: 5.464 GiB, 5.76% gc time)
mschauer commented 5 years ago

A Stack Overflow reader identified the issue as https://github.com/fhs/NPZ.jl/blob/master/src/NPZ.jl#L219. You are not touching that line. Was this not the problem then, or was it a second problem? (The line has been rewritten since.)

bhalonen commented 5 years ago

@mschauer That line -is- the slow line, but it is slow only because it was accessing the disk. I was trying to optimize that line itself, but I got a 25x speedup just by reading the zip into a buffer first.
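A minimal sketch of that buffering idea, assuming ZipFile.jl accepts an in-memory `IO` via `ZipFile.Reader(io)` and reusing the `temp.npz` file generated above:

```julia
using ZipFile  # assumes the ZipFile.jl package is installed

# Slurp the whole archive into memory with one bulk read, then let
# ZipFile parse the in-memory bytes: its many small reads now hit an
# IOBuffer instead of the disk.
bytes = read("temp.npz")
r = ZipFile.Reader(IOBuffer(bytes))
for f in r.files
    println(f.name, ": ", f.uncompressedsize, " bytes")
end
close(r)
```

This trades memory for speed: the whole archive must fit in RAM, which is exactly the concern raised below.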

bhalonen commented 5 years ago

I'm wondering if it would be better to make the read into buffer change in the ZipFile package.

bhalonen commented 5 years ago

Change this line instead. https://github.com/fhs/ZipFile.jl/blob/998d334256e863c0e1600704c3d654016c178ef1/src/ZipFile.jl#L118

@fhs what do you think?

fhs commented 5 years ago

@bhalonen It's not correct to read the whole zip file into memory. The library should support zip files that may be bigger than available memory.

I thought I/O was already buffered by the standard library, but I'm guessing it's not? Maybe we need to use a 3rd-party library like this one: https://github.com/BioJulia/BufferedStreams.jl?
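If buffering is done in a wrapper rather than by slurping the file, a rough sketch with BufferedStreams.jl could look like this (whether ZipFile's seeking to the central directory works through such a wrapper would still need checking):

```julia
using BufferedStreams  # assumes the BufferedStreams.jl package is installed

# Wrap the raw file handle so small byte-wise reads are served from an
# in-memory buffer instead of triggering a system call each time.
stream = BufferedInputStream(open("temp.npz"))
header = read(stream, 4)   # a zip local file header starts with "PK\x03\x04"
close(stream)
```

This keeps memory bounded by the buffer size, so files larger than RAM remain readable.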

bhalonen commented 5 years ago

@fhs I'll look into it.