JuliaIO / JLD.jl

Saving and loading julia variables while preserving native types
MIT License

Support in-memory saving: `Stream(format"JLD",IOBuffer())` #89

Open oxinabox opened 8 years ago

oxinabox commented 8 years ago

It would be really nice if JLD could support saving into an IOBuffer. FileIO suggests that the code below is how it would be done. However, that code just hangs:

using FileIO
using JLD
ss = Stream(format"JLD",IOBuffer())
JLD.save(ss, Dict("ii"=> 55555))

My particular use case is that I would like to save my data directly into OpenStack Swift object storage without writing it to disk. Right now, the workaround I am looking at is:

1. In-memory data
   1. Write JLD to disk
   2. Read JLD file into `IOBuffer()`/`Vector{UInt8}`
   3. Upload to object storage

whereas it could be:

1. In-memory data
   1. Write as JLD format into `IOBuffer()`
   2. Upload to object storage

This cuts out the reading from and writing to disk, which is a big speed-up for multi-gigabyte files.
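For reference, the disk-based workaround from the first list could be sketched roughly as follows. This is only a sketch under assumptions: `jld_bytes` and the file name `tmp.jld` are made-up names for illustration, it assumes JLD's `save` accepts a path plus a `Dict` with string keys, and that `read(path)` returns a `Vector{UInt8}` (as it does on recent Julia versions).

```julia
using JLD  # assumes the JLD package is installed

# Hypothetical helper: write `data` as a JLD file in a temporary directory,
# read the file back as raw bytes, then delete the temporary directory.
function jld_bytes(data::Dict)
    dir = mktempdir()
    path = joinpath(dir, "tmp.jld")   # illustrative file name
    try
        save(path, data)              # step 1: write JLD to disk
        return read(path)             # step 2: read the file into a Vector{UInt8}
    finally
        rm(dir; recursive=true, force=true)
    end
end

bytes = jld_bytes(Dict("ii" => 55555))  # step 3 would upload `bytes`
```

The returned bytes can then be handed to whatever upload call the storage client provides; that part is not shown, since it depends on the client.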


Failing support for writing to streams, it would be nice if it threw a method error rather than hanging.


This is with:

Julia Version 0.5.0-rc0+150
Commit 389dc1c (2016-08-03 04:22 UTC)
Platform Info:
  System: Linux (x86_64-linux-gnu)
  CPU: AMD Opteron 63xx class CPU
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NOAFFINITY Piledriver)
  LAPACK: libopenblas64
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, bdver2)

simonster commented 8 years ago

This would most likely need to be implemented in a special way, since libhdf5 wants a real file and not an IOBuffer. libhdf5 has in-memory file support that could be used for this, but if you're on Linux, the easier approach to avoiding disk IO may be to put the temporary file in /dev/shm.
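A minimal sketch of the `/dev/shm` idea, using only Base functions. The helper name `ram_tmp_parent` and the file name `scratch.jld` are made up for illustration, and the fallback to `tempdir()` is an assumption about what one would want on systems without `/dev/shm`.

```julia
# Prefer the RAM-backed /dev/shm on Linux; otherwise fall back to the
# default temporary directory (which may live on disk).
ram_tmp_parent() = isdir("/dev/shm") ? "/dev/shm" : tempdir()

dir = mktempdir(ram_tmp_parent())      # fresh temporary directory
path = joinpath(dir, "scratch.jld")    # hypothetical file name to pass to JLD

# ... save to `path` with JLD, then read or upload the file here ...

rm(dir; recursive=true, force=true)    # clean up when done
```

Because libhdf5 still sees an ordinary file path, nothing in JLD itself needs to change for this to work.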

With that said, the code you are trying to run should not fail so spectacularly. On Julia 0.4 with LLVM assertions, I get:

julia> using FileIO

julia> using JLD

julia> ss = Stream(format"JLD",IOBuffer())
FileIO.Stream{FileIO.DataFormat{:JLD},Base.AbstractIOBuffer{Array{UInt8,1}}}(IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1),Nullable{UTF8String}())

julia> JLD.save(ss, Dict("ii"=> 55555))
Assertion failed: (!isAlreadyCodeGenerating && "Error: Recursive compilation detected!"), function runJITOnFunctionUnlocked, file /usr/local/julia-release-0.4/deps/llvm-3.3/lib/ExecutionEngine/JIT/JIT.cpp, line 623.

On Julia 0.5, it crashes and then the unwinder crashes. Judging by what I see when I disable the unwinder and use lldb to produce a backtrace, I would guess it's a stack overflow.

oxinabox commented 8 years ago

Thanks. This is what I ended up with:

function put_jld(conn, container::String, name::String; data...)

    fn = joinpath(mktempdir("/dev/shm"), name)
    # For non-Unix compatibility, drop the "/dev/shm" argument so the temp dir is created in the default location

    save(File(format"JLD", fn), Base.Flatten(((string(name), val) for (name,val) in data))...)

    open(fn, "r") do fp
        etag = conn.put_object(container, name, contents = fp, content_type = "application/x-hdf5");
    end

end

function get_jld(conn, container::String, name::String, data...)

    fn = joinpath(mktempdir("/dev/shm"), name)
    # For non-Unix compatibility, drop the "/dev/shm" argument so the temp dir is created in the default location

    open(fn, "w") do fp
        response_headers, response_data =  conn.get_object(container, name);
        write(fp, response_data)
    end

    JLD.load(File(format"JLD", fn), data...)
end

Which is actually quite fine, because Swift also rather likes to get a "file-like object" as input (though it also accepts strings).