JuliaIO / BSON.jl


Add README message to point users to Serialization stdlib, JLD2, and LightBSON #122

Status: open (ericphanson opened this issue 1 year ago)

ericphanson commented 1 year ago

BSON.jl has many serious bugs, and despite targeting a generic format (BSON), the way it uses Julia internals to serialize arbitrary objects is not very robust. I think we should add a big README warning pointing new users to the Serialization stdlib (for serializing arbitrary objects & code), JLD2 (for arbitrary objects), and LightBSON (for serializing some objects to BSON, e.g. via the StructTypes API).

darsnack commented 1 year ago

Agreed. Perhaps the Flux docs should start pointing users to these libraries too? The most stable way to save a Flux model right now is to save the named tuple representation rather than the struct anyway.

I know Legolas has its own system for saving/loading. Do you have a recommendation on which alternate library has worked best for you?

ericphanson commented 1 year ago

I've been using LegolasFlux, which uses Arrow to serialize the weights (including non-trainable parameters such as BatchNorm means and variances). It is a fairly lightweight package: it traverses the model to collect the arrays, flattens them (since Arrow doesn't handle multidimensional arrays well), and stores their sizes separately; deserialization reverses the process. It does not keep track of the code or layers, and instead suggests storing an architecture_version::Int parameter alongside the weights, so you can keep multiple versions of the architecture in your codebase and load the weights into the appropriate one.
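
For illustration, here is a minimal Julia sketch of the flatten-and-record-sizes idea. It is not LegolasFlux's actual code: flatten_arrays, unflatten_arrays, and the artifact layout are hypothetical, and both writing the artifact to an Arrow table and reassigning the arrays into the matching model layers are omitted.

```julia
# Hypothetical sketch of "flatten the arrays, store sizes separately";
# not LegolasFlux's API. The Arrow read/write step is omitted.

# Flatten each array to a vector (column-major order) and record its original size.
function flatten_arrays(arrays)
    flat = [vec(collect(a)) for a in arrays]
    sizes = [size(a) for a in arrays]
    return flat, sizes
end

# Reverse the process: reshape each flat vector back to its recorded size.
function unflatten_arrays(flat, sizes)
    return [reshape(v, s) for (v, s) in zip(flat, sizes)]
end

# Example arrays: a weight matrix, a bias vector, and a non-trainable
# running mean like the statistics BatchNorm keeps.
W = Float32[1 2 3; 4 5 6]          # 2×3 matrix
b = Float32[0.5, -0.5]             # length-2 vector
running_mean = Float32[0.1, 0.2]   # non-trainable statistic

flat, sizes = flatten_arrays([W, b, running_mean])

# What would be written out: only 1-D vectors plus their sizes, tagged with
# an architecture version so the loader can pick the matching model code.
artifact = (architecture_version = 1, weights = flat, sizes = sizes)

# Loading: rebuild the original arrays from the stored vectors and sizes.
restored = unflatten_arrays(artifact.weights, artifact.sizes)
```

Because the code itself is not stored, the loader has to use architecture_version to choose which model definition to copy the restored arrays into.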

LegolasFlux has been working reliably, but it may not be the best solution for everyone, since you have to keep track of the code yourself fairly carefully rather than storing it in the serialized artifact.

Before that, we used Serialization, which works well unless the layout of one of the constituent structs changes (e.g. fields move around in a layer because you upgraded Flux), which can completely break deserialization. This can be an issue for LegolasFlux as well, but there you can still pull the weights out of the Arrow table without trouble; loading them into the new structure may then require rearranging them, which is still not great. Serialization also has trouble with anonymous functions, so you need to make sure your model doesn't contain any before serializing.

Haven't tried any other options.

ToucheSir commented 1 year ago

To my knowledge, BSON has all the same limitations as Serialization here, so the quickest change we could make is replacing BSON with Serialization in the Flux docs. Updating the Metalhead weights is trickier, though we could bundle that into the v0.8 release. Landing https://github.com/FluxML/Functors.jl/pull/56 would make using different serialization libraries (including Legolas) much easier, but I'm partially responsible for holding it up with some bikeshedding; apologies.

darsnack commented 1 year ago

One thing we can do on the Flux side is to make clear that there are two options for storing models:

  1. The code that runs before loading defines the types, so you only store the named tuple representation. Flux.loadmodel! should then let you handle cases where the struct definition has changed but most of the fields are the same. This option is preferred because it is the most robust to version changes. Any serialization library can be used (though we won't advertise BSON.jl).
  2. The code does not define the types, so the serialized format must store everything. This is not robust to version changes that redefine structs, but it is the most reliable option in a production-like setting where you simply load the model and run it.
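
A minimal Julia sketch of why option 1 tolerates struct changes: if the saved data is keyed by field name rather than by position, it can still be loaded into a struct whose fields were reordered or extended. DenseV1, DenseV2, to_namedtuple, and load_by_name are all hypothetical names; this is not how Flux.loadmodel! is implemented, and passing the named tuple through an actual serialization library is omitted.

```julia
# Hypothetical layer types standing in for two versions of a library's layer.
struct DenseV1
    weight::Matrix{Float32}
    bias::Vector{Float32}
end

# A later version reorders the existing fields and adds a new one.
struct DenseV2
    bias::Vector{Float32}
    weight::Matrix{Float32}
    active::Bool
end

# Convert any struct to a NamedTuple keyed by its field names.
function to_namedtuple(x)
    names = fieldnames(typeof(x))
    return NamedTuple{names}(map(n -> getfield(x, n), names))
end

# Build a T by looking up each of its fields by name in `nt`,
# falling back to `defaults` for fields the saved data lacks.
function load_by_name(::Type{T}, nt::NamedTuple, defaults::NamedTuple) where {T}
    vals = map(fieldnames(T)) do n
        haskey(nt, n) ? nt[n] : defaults[n]
    end
    return T(vals...)
end

old = DenseV1(Float32[1 2; 3 4], Float32[0, 1])
saved = to_namedtuple(old)   # this NamedTuple is what would be serialized
upgraded = load_by_name(DenseV2, saved, (active = true,))
```

Matching by name is what keeps the reordered fields lined up; a positional format like Serialization's would instead depend on the old field layout.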