Open issue, opened by timoxley 7 years ago.
In #28, @Yomguithereal asks:

> The question I am pondering before implementing this is whether this should be an instance or a static method and, if it is an instance method, what it should do if the structure has already been fed some data. Do we just add the serialized data? Do we clear it and then add the serialized data?
My suggestion is to make it static and only for creating new instances, like `.from`. Figuring out how to diff/union with existing data would be fantastically useful, but is perhaps a separate issue. The problem at hand is that using mnemonist structures outside of the current process, e.g. restoring them from disk/db or sending them over the network, currently carries a high cost and requires custom code.
As a first step, I think I will go with a symmetric static `.fromJSON` method that should address most of the cases (Bloom filters may very well be serialized to JSON as an array, even if this is more costly than their byte-array representation). I will leave `serialize` etc. open because, as you said, it leaves room for more complex and efficient serialization strategies.
As a starter, what we can do here is make a list of the different structures and check 1) whether a `.fromJSON` can work and 2) whether we can imagine better serialization schemes.
RadixTree
before this. @timoxley @GeoffreyPlitt, what's your opinion on this?
Sounds like a plan 🍾
I will therefore modify some of the existing `toJSON` methods to take this into account.
Concerning the `BKTree` etc., I guess the signature will be the following:

`BKTree.fromJSON(distance, json);`
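A sketch of why the distance has to be passed back in, assuming a hypothetical `TinyBKTree` (not mnemonist's `BKTree`; its node shape and JSON layout are illustrative). A function such as the Levenshtein metric used here cannot be represented in JSON, so `toJSON` emits only the tree and its size, and `fromJSON(distance, json)` reattaches the metric without re-inserting any items.

```javascript
// Levenshtein edit distance, used as the example metric (single rolling row).
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // D[i-1][j]
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + (a[i - 1] === b[j - 1] ? 0 : 1));
      diag = above;
    }
  }
  return row[b.length];
}

// Hypothetical minimal BK-tree; nodes are plain objects so they survive JSON as-is.
class TinyBKTree {
  constructor(distance) {
    this.distance = distance;
    this.root = null; // node shape: { item, children: { [distance]: node } }
    this.size = 0;
  }

  add(item) {
    if (this.root === null) {
      this.root = { item, children: {} };
      this.size = 1;
      return;
    }
    let node = this.root;
    for (;;) {
      const d = this.distance(item, node.item);
      if (d === 0) return; // already present
      const child = node.children[d];
      if (child === undefined) {
        node.children[d] = { item, children: {} };
        this.size++;
        return;
      }
      node = child;
    }
  }

  // Returns every item within `tolerance` of `query`, pruning with the triangle inequality.
  search(query, tolerance) {
    const found = [];
    const stack = this.root ? [this.root] : [];
    while (stack.length > 0) {
      const node = stack.pop();
      const d = this.distance(query, node.item);
      if (d <= tolerance) found.push(node.item);
      for (const key of Object.keys(node.children)) {
        const k = Number(key);
        if (k >= d - tolerance && k <= d + tolerance) stack.push(node.children[key]);
      }
    }
    return found;
  }

  // Functions cannot be encoded in JSON, so only the data goes out.
  toJSON() {
    return { size: this.size, root: this.root };
  }

  // The metric is supplied again on the way in, hence the (distance, json) signature.
  static fromJSON(distance, json) {
    const tree = new TinyBKTree(distance);
    tree.root = json.root;
    tree.size = json.size;
    return tree;
  }
}

const tree = new TinyBKTree(levenshtein);
["book", "books", "cake", "boo", "cape"].forEach((word) => tree.add(word));
const copy = TinyBKTree.fromJSON(levenshtein, JSON.parse(JSON.stringify(tree)));
console.log(copy.search("bool", 1).sort()); // [ 'boo', 'book' ]
```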
Currently one can serialize mnemonist data structures with `.toJSON`, but there does not appear to be a standard way to deserialize them. I'd like to be able to cache mnemonist structures in the browser or to send pre-processed structures to the client over a network.

To achieve serialization/deserialization at the moment, one has to write custom functions, which often need to re-iterate over the entire data set. Re-iterating may be prohibitively expensive for large structures, i.e. this sucks most in exactly the use cases where mnemonist would be most useful.
For example, there should be a way to do something like:
And ideally, the deserialization process would do minimal reprocessing; it would basically just dump the data into place, something like:
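Both wishes, a plain round trip and a "dump it into place" restore, can be sketched with a hypothetical `TinyMultiSet`. The class, its `counts`/`size` fields and the `{ size, entries }` JSON shape are assumptions for illustration, not mnemonist code. `fromJSON` restores the stored state directly instead of replaying `add()` for every item; it still rebuilds the `Map` from its entries, so this is minimal rather than zero reprocessing.

```javascript
// Hypothetical counting multiset; names are illustrative, not mnemonist's API.
class TinyMultiSet {
  constructor() {
    this.counts = new Map(); // item -> multiplicity
    this.size = 0; // total number of items, duplicates included
  }

  add(item, times = 1) {
    this.counts.set(item, (this.counts.get(item) || 0) + times);
    this.size += times;
  }

  // Plain, JSON-friendly snapshot of the internal state.
  toJSON() {
    return { size: this.size, entries: Array.from(this.counts) };
  }

  // "Dump into place": one pass over the stored entries to rebuild the Map,
  // and the stored size is trusted instead of being recomputed through add().
  static fromJSON(json) {
    const set = new TinyMultiSet();
    set.counts = new Map(json.entries);
    set.size = json.size;
    return set;
  }
}

const bag = new TinyMultiSet();
bag.add("a", 3);
bag.add("b");
const text = JSON.stringify(bag); // e.g. cached in localStorage or sent to a client
const restored = TinyMultiSet.fromJSON(JSON.parse(text));
console.log(restored.size, restored.counts.get("a")); // 4 3
```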
For example, I'd hoped this exact thing would work for `Trie`, except that `toJSON` loses the `size` information, and if you added `.size` to the structure produced by `toJSON`, you'd potentially break any third-party code consuming the current `toJSON` format.

Therefore, you should probably use something other than `toJSON` and instead create a new API pair, e.g. `serialize`/`deserialize`, which produces/consumes a representation whose structure users would consider opaque because:

`toJSON` API.

Perhaps `serialize` would just generate JSON or a JS object for now, but you don't want to be locked into that, nor into the structure it produces.

Related to #28
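A minimal sketch of the proposal, assuming a hypothetical `TinyTrie` (not mnemonist's `Trie`; the `__end` marker, the `v` version field and the payload shape are all illustrative). This toy trie's `toJSON` is left as a bare node tree, so rebuilding from it has to walk every node to recount `size`, while the opaque `serialize`/`deserialize` pair carries `size` and a version number and restores without a traversal.

```javascript
// Hypothetical minimal trie; nodes are plain objects keyed by character.
const END = "__end"; // multi-character key, so it can never clash with a single character
const VERSION = 1; // format version of the opaque payload

class TinyTrie {
  constructor() {
    this.root = {};
    this.size = 0;
  }

  add(word) {
    let node = this.root;
    for (const ch of word) node = node[ch] || (node[ch] = {});
    if (!node[END]) {
      node[END] = true;
      this.size++;
    }
  }

  has(word) {
    let node = this.root;
    for (const ch of word) {
      node = node[ch];
      if (!node) return false;
    }
    return node[END] === true;
  }

  // Public JSON format left untouched (no size), so existing consumers keep working.
  toJSON() {
    return this.root;
  }

  // From toJSON output alone, size can only be recovered by visiting every node.
  static fromJSON(json) {
    const trie = new TinyTrie();
    trie.root = json;
    const stack = [json];
    while (stack.length > 0) {
      const node = stack.pop();
      for (const key of Object.keys(node)) {
        if (key === END) trie.size++;
        else stack.push(node[key]);
      }
    }
    return trie;
  }

  // Opaque pair: callers store or send the string but should not rely on its shape.
  serialize() {
    return JSON.stringify({ v: VERSION, size: this.size, root: this.root });
  }

  static deserialize(payload) {
    const data = JSON.parse(payload);
    if (data.v !== VERSION) throw new Error(`unsupported serialization version ${data.v}`);
    const trie = new TinyTrie();
    trie.root = data.root;
    trie.size = data.size; // no traversal needed
    return trie;
  }
}

const trie = new TinyTrie();
["rat", "rate", "ratio"].forEach((word) => trie.add(word));
const viaJSON = TinyTrie.fromJSON(JSON.parse(JSON.stringify(trie)));
const viaSerialize = TinyTrie.deserialize(trie.serialize());
console.log(viaJSON.size, viaSerialize.size, viaSerialize.has("rate")); // 3 3 true
```

Because callers only ever hand the `serialize` output back to `deserialize`, the payload could later switch to a more compact encoding (bumping the version) without changing the `toJSON` format.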