Yomguithereal / mnemonist

Curated collection of data structures for the JavaScript/TypeScript language.
https://yomguithereal.github.io/mnemonist
MIT License

Standard serialization/deserialization API across all data-structures #37

Open timoxley opened 7 years ago

timoxley commented 7 years ago

Currently one can serialize mnemonist data-structures with .toJSON, but there does not appear to be a standard way to deserialize them. I'd like to be able to cache mnemonist structures in the browser or to send pre-processed structures to the client over a network.

To achieve serialization/deserialization at the moment, one has to write custom functions which often need to re-iterate over the entire data set. Re-iterating may be prohibitively expensive for large structures, i.e. this sucks most in the exact use-cases where mnemonist would be most useful.
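To make that cost concrete, here is a minimal sketch assuming a hypothetical trie stored as plain nested objects (not mnemonist's internal format). If a dumped payload lacks derived state such as the number of stored words, the only way to rebuild it is to walk every node, which is O(total nodes):

```typescript
// Hypothetical node shape, chosen only for illustration.
interface TrieNode {
  end: boolean; // true if a word ends at this node
  children: Record<string, TrieNode>;
}

// Without a stored count, recovering it means visiting every node,
// i.e. the full re-iteration a proper deserializer should avoid.
function countWords(root: TrieNode): number {
  let count = 0;
  const stack: TrieNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    if (node.end) count++;
    for (const key of Object.keys(node.children)) stack.push(node.children[key]);
  }
  return count;
}

// A payload that carries the count restores it in O(1) instead.
interface SerializedTrie {
  size: number;
  root: TrieNode;
}

// Words stored: "a", "ab", "cd".
const root: TrieNode = {
  end: false,
  children: {
    a: { end: true, children: { b: { end: true, children: {} } } },
    c: { end: false, children: { d: { end: true, children: {} } } },
  },
};

const payload: SerializedTrie = { size: countWords(root), root };
console.log(payload.size); // 3
```

Carrying such derived values inside the serialized payload is what lets a restore skip this walk.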

e.g. there should be a way to do something like:

dest.fromJSON(src.toJSON())

and ideally the deserialization process would do minimal reprocessing; it would basically just dump the data into place, something like:

dest.root = src.toJSON()
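A minimal sketch of this "dump into place" idea, assuming a hypothetical `TinyTrie` whose nodes are plain objects (this is not mnemonist's Trie or its JSON format). It is written as a static factory rather than the instance call shown earlier; because the tree is already JSON-shaped, `fromJSON` adopts the parsed root instead of re-inserting every word:

```typescript
// Hypothetical plain-object node: JSON.stringify/JSON.parse round-trip it as-is.
interface TrieNode {
  end: boolean;
  children: Record<string, TrieNode>;
}

class TinyTrie {
  root: TrieNode = { end: false, children: {} };

  add(word: string): void {
    let node = this.root;
    for (const ch of word) {
      if (!(ch in node.children)) node.children[ch] = { end: false, children: {} };
      node = node.children[ch];
    }
    node.end = true;
  }

  has(word: string): boolean {
    let node: TrieNode = this.root;
    for (const ch of word) {
      const next = node.children[ch];
      if (next === undefined) return false;
      node = next;
    }
    return node.end;
  }

  // Called automatically by JSON.stringify.
  toJSON(): TrieNode {
    return this.root;
  }

  // Adopt the parsed tree directly: no per-word re-insertion.
  static fromJSON(root: TrieNode): TinyTrie {
    const trie = new TinyTrie();
    trie.root = root;
    return trie;
  }
}

const src = new TinyTrie();
["cat", "car", "dog"].forEach((w) => src.add(w));

// Round-trip through a string, as when caching or sending over a network.
const dest = TinyTrie.fromJSON(JSON.parse(JSON.stringify(src)));
console.log(dest.has("car"), dest.has("ca")); // true false
```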

For example, I'd hoped this exact thing would work for Trie, except that toJSON loses the size information, and if you added .size to the structure produced by toJSON, you'd potentially break any third-party code consuming the current toJSON format.

Therefore, you should probably use something other than toJSON and instead create a new API pair, e.g. serialize/deserialize, which produces/consumes a representation whose structure users would consider opaque, because:

Perhaps serialize would just generate JSON or a JS object for now, but you don't want to be locked into that, nor into the structure it produces.
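One way such an opaque pair could look, as a sketch only: a hypothetical `Bag` class and `Envelope` type (neither exists in mnemonist) where `toJSON` keeps its current public shape while `serialize` wraps its data in a versioned envelope that only `deserialize` is expected to read:

```typescript
// Hypothetical opaque envelope: callers treat it as a black box and only
// pass it back to `deserialize`. The version tag lets the inner shape
// change later without breaking stored payloads.
interface Envelope {
  readonly v: number;
  readonly d: unknown;
}

const FORMAT_VERSION = 1;

class Bag {
  private items: string[] = [];

  add(item: string): this {
    this.items.push(item);
    return this;
  }

  get size(): number {
    return this.items.length;
  }

  // Public toJSON keeps its existing shape for current consumers.
  toJSON(): string[] {
    return this.items.slice();
  }

  // Opaque form: free to add fields or switch encodings in a later version.
  serialize(): Envelope {
    return { v: FORMAT_VERSION, d: { items: this.items.slice() } };
  }

  static deserialize(envelope: Envelope): Bag {
    if (envelope.v !== FORMAT_VERSION) {
      throw new Error(`Unsupported serialization version: ${envelope.v}`);
    }
    const bag = new Bag();
    bag.items = (envelope.d as { items: string[] }).items.slice();
    return bag;
  }
}

const wire = JSON.stringify(new Bag().add("x").add("y").serialize());
const restored = Bag.deserialize(JSON.parse(wire));
console.log(restored.size); // 2
```

The version check is the piece that keeps the format from being locked in: a future release can bump `FORMAT_VERSION` and branch on older values.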


Related to #28

timoxley commented 7 years ago

In #28 @Yomguithereal asks:

The question I am pondering before implementing this is whether this should be an instance or a static method and, if it is an instance method, what it should do if the structure has already been fed some data. Do we just add the serialized data? Do we clear, then add the serialized data?

My suggestion is to make the thing static and only for creating new instances, like .from. Figuring out how to diff/union with existing data would be fantastically useful, but is perhaps a separate issue. The problem at hand is that there's currently a high cost, plus custom code, required to use mnemonist structures outside of the current process, e.g. restoring from disk/db or sending over a network.
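A sketch of the static approach in TypeScript terms, using a hypothetical `TinyStack` class and `FromJSON` interface (neither is part of mnemonist). Because `fromJSON` lives on the class and always returns a fresh instance, the clear-vs-merge question never arises, and any class with such a factory can be handed to a generic helper:

```typescript
// Hypothetical contract for the static side of a class.
interface FromJSON<T, J> {
  fromJSON(json: J): T;
}

class TinyStack<V> {
  private items: V[] = [];

  push(value: V): this {
    this.items.push(value);
    return this;
  }

  peek(): V | undefined {
    return this.items[this.items.length - 1];
  }

  get size(): number {
    return this.items.length;
  }

  toJSON(): V[] {
    return this.items.slice();
  }

  // Always builds a new instance; existing stacks are never touched.
  static fromJSON<U>(json: U[]): TinyStack<U> {
    const stack = new TinyStack<U>();
    stack.items = json.slice();
    return stack;
  }
}

// Works with any class constructor exposing a static fromJSON.
function restore<T, J>(factory: FromJSON<T, J>, text: string): T {
  return factory.fromJSON(JSON.parse(text) as J);
}

const original = new TinyStack<number>().push(1).push(2).push(3);
const copy = restore<TinyStack<number>, number[]>(TinyStack, JSON.stringify(original));
console.log(copy.size, copy.peek()); // 3 3
```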

Yomguithereal commented 7 years ago

As a first step, I think I will go with a symmetric static .fromJSON method that should address most of the cases (Bloom filters may very well be serialized to JSON as an array, even if this is more costly than their byte-array counterpart). I will leave serialize etc. open because, as you said, that allows for more complex and efficient serialization strategies later.
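To illustrate the Bloom filter trade-off, here is a minimal sketch with a hypothetical `TinyBloom` class (not mnemonist's BloomFilter; its seeded FNV-1a hashing is illustrative only, not a tuned scheme). `toJSON` turns the bit array into a plain number array, which round-trips through `JSON.stringify` but spends two to four characters per byte:

```typescript
class TinyBloom {
  constructor(
    readonly bitCount: number,
    readonly hashCount: number,
    readonly bytes: Uint8Array = new Uint8Array(Math.ceil(bitCount / 8)),
  ) {}

  // 32-bit FNV-1a with the seed mixed into the offset basis.
  private hash(value: string, seed: number): number {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h % this.bitCount;
  }

  add(value: string): void {
    for (let s = 0; s < this.hashCount; s++) {
      const bit = this.hash(value, s);
      this.bytes[bit >> 3] |= 1 << (bit & 7);
    }
  }

  test(value: string): boolean {
    for (let s = 0; s < this.hashCount; s++) {
      const bit = this.hash(value, s);
      if ((this.bytes[bit >> 3] & (1 << (bit & 7))) === 0) return false;
    }
    return true;
  }

  // JSON-friendly but verbose: each byte becomes a decimal number.
  toJSON() {
    return { bitCount: this.bitCount, hashCount: this.hashCount, bytes: Array.from(this.bytes) };
  }

  static fromJSON(json: { bitCount: number; hashCount: number; bytes: number[] }): TinyBloom {
    return new TinyBloom(json.bitCount, json.hashCount, Uint8Array.from(json.bytes));
  }
}

const bloom = new TinyBloom(1024, 3);
["apple", "banana", "cherry"].forEach((v) => bloom.add(v));

const json = JSON.stringify(bloom);
const restored = TinyBloom.fromJSON(JSON.parse(json));
console.log(restored.test("apple")); // true (Bloom filters have no false negatives)
console.log(json.length, restored.bytes.length); // JSON text length vs. 128 raw bytes
```

A binary serializer could write `bytes` as-is (or base64-encode it), which is the kind of more efficient strategy left open here.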

What we can do here, as a starter, is make a list of the different structures and see 1) whether a .fromJSON can work and 2) whether we can imagine better serialization schemes.

Yomguithereal commented 7 years ago

@timoxley @GeoffreyPlitt what's your opinion on this?

timoxley commented 7 years ago

Sounds like a plan 🍾

Yomguithereal commented 7 years ago

I will therefore modify some of the existing toJSON methods to take this into account.

Concerning the BKTree etc., I guess the signature will be the following:

BKTree.fromJSON(distance, json);
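Why `distance` comes first: a function cannot be represented in JSON, so it has to be supplied again at load time. A minimal sketch with a hypothetical `TinyBKTree` (not mnemonist's BKTree) and a plain Levenshtein distance; `toJSON` keeps only the node tree and size, and `fromJSON(distance, json)` mirrors the signature above:

```typescript
type Distance = (a: string, b: string) => number;

interface BKNode {
  item: string;
  children: Record<number, BKNode>; // keyed by distance to `item`
}

function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

class TinyBKTree {
  root: BKNode | null = null;
  size = 0;

  constructor(readonly distance: Distance) {}

  add(item: string): void {
    if (this.root === null) {
      this.root = { item, children: {} };
      this.size++;
      return;
    }
    let node: BKNode = this.root;
    for (;;) {
      const d = this.distance(item, node.item);
      if (d === 0) return; // already present
      const child: BKNode | undefined = node.children[d];
      if (child === undefined) {
        node.children[d] = { item, children: {} };
        this.size++;
        return;
      }
      node = child;
    }
  }

  // Items within `tolerance`; prunes children using the triangle inequality.
  search(query: string, tolerance: number): string[] {
    const found: string[] = [];
    const stack: BKNode[] = this.root ? [this.root] : [];
    while (stack.length > 0) {
      const node = stack.pop()!;
      const d = this.distance(query, node.item);
      if (d <= tolerance) found.push(node.item);
      for (const key of Object.keys(node.children)) {
        const k = Number(key);
        if (k >= d - tolerance && k <= d + tolerance) stack.push(node.children[k]);
      }
    }
    return found;
  }

  // The distance function is deliberately left out: it is not JSON-serializable.
  toJSON(): { size: number; root: BKNode | null } {
    return { size: this.size, root: this.root };
  }

  static fromJSON(distance: Distance, json: { size: number; root: BKNode | null }): TinyBKTree {
    const tree = new TinyBKTree(distance);
    tree.size = json.size;
    tree.root = json.root;
    return tree;
  }
}

const tree = new TinyBKTree(levenshtein);
["book", "books", "cake", "boo", "cape"].forEach((w) => tree.add(w));

const restored = TinyBKTree.fromJSON(levenshtein, JSON.parse(JSON.stringify(tree)));
console.log(restored.search("bool", 1).sort()); // ["boo", "book"]
```

JSON turns the numeric child keys into strings, but property lookup with a number converts it back to the same string key, so the restored tree searches exactly like the original.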