JuliaIO / BSON.jl

Other
158 stars 39 forks source link

Split out Julia serialiser. #58

Open MikeInnes opened 4 years ago

MikeInnes commented 4 years ago

This package is poorly designed (it's ok, I'm allowed to say that) in that it tries to support both plain BSON and Julia data structures in one go. This creates annoying corner cases (do we preserve the type tag in Dict{String,Int} or store it in a compatible way?) and also makes it hard to safely load BSON anyway, because of the obvious security issues around loading Julia data (#50).

BSON.jl should strip out all serialisation utilities and just have a solid reader/writer. Other formats (model weights, Julia data) should be defined on top in separate packages. The new package could be a fork of BSON.jl, or we could just recommend JLSO, which is built on BSON anyway.

I don't know if JLSO supports closures or cycles, which are important to Flux, but it certainly has other nice features, and if its maintainers are open to those things it'd be my preferred option.

oxinabox commented 4 years ago

This is the bit of the docs to read to understand JLSO https://invenia.github.io/JLSO.jl/dev/api/#JLSO.JLSO

It is basically: BSON at the top, but only plain BSON not Julia dependent (in theory). includes metadata, which should at some point be just a Project.toml and a Manifest.toml Then per object it is compressed Julia Serializer, or BSON.jl data. the compression and format (serializer vs BSON) can be configured.

JLSO depends on the full capacity of BSON.jl (with its julia specific extensions) for arbitary serialization if in BSON mode. and on its plain BSON capacity for the top level.

MikeInnes commented 4 years ago

I didn't initially realise that JLSO was reusing BSON.jl's serialisation (optionally, of course, with Julia's serialiser as an alternative).

That makes things easier since in the right mode JLSO supports BSON.jl's current semantics and backwards compatibility semantics. Perhaps we should just move the serialisation stuff to that repo and then clean it up a bunch.

rofinn commented 4 years ago

I don't know if JLSO supports closures or cycles, which are important to Flux, but it certainly has other nice features, and if its maintainers are open to those things it'd be my preferred option.

The gist is that if julia can serialize it then JLSO can save it.

BSON at the top, but only plain BSON not Julia dependent (in theory). includes metadata, which should at some point be just a Project.toml and a Manifest.toml

Yeah, we should probably do that sooner rather than later. I'd vote to save the Project.toml uncompressed for safety and the Manifest.toml compressed.

NOTE: If we move the julia specific extension over to JLSO.jl then we could also handle https://github.com/JuliaIO/BSON.jl/issues/20 there as well.

ToucheSir commented 2 years ago

Now that https://github.com/ancapdev/LightBSON.jl exists, I'd vote for revisiting this and seeing if we can't merge the two packages back together by extracting out all the flaky Julia serialization bits :)