dwavesystems / dimod

A shared API for QUBO/Ising samplers.

https://docs.ocean.dwavesys.com/en/stable/docs_dimod/

Apache License 2.0

ASDF serialization #513

Open randomir opened 5 years ago

randomir commented 5 years ago

ASDF (Advanced Scientific Data Format) looks promising for serialization in Ocean (BQM/SampleSet in dimod's case).

It has the following features:

A hierarchical, human-readable metadata format (implemented using YAML)
Numerical arrays are stored as binary data blocks which can be memory mapped. Data blocks can optionally be compressed.
The structure of the data can be automatically validated using schemas (implemented using JSON Schema)
Native Python data types (numerical types, strings, dicts, lists) are serialized automatically
ASDF can be extended to serialize custom data types

Libraries for Python and C++ are available.

randomir commented 5 years ago

The point here is that, instead of (re)inventing custom serialization schemas, we should use something like this: it works with NumPy out of the box, is fast, supports other custom data types through extensions, and has libraries for the languages we use.

One downside is that libraries exist only for Python and C++. But at least the standard is backed by a rather large org.

arcondello commented 5 years ago

This is pretty cool!

Would the idea be to make asdf a dependency of dimod, e.g.

BQM.to_asdf()

or to just dump to an asdf-compatible format, e.g.

af = asdf.AsdfFile(BQM.to_serializable())

randomir commented 4 years ago

No, actually the idea was for dimod objects to expose a to_dict method which would return a "tree" (in ASDF terminology), i.e. a dict with NumPy objects in it. The dual method, from_dict, would accept the same.

Serialization would be handled at a lower level, closer to the wire, and ASDF would be used only there.

(I've been advocating this approach to serialization since we first talked about it, and I guess to_serializable comes quite close. The notable distinction is that to_serializable still goes to the additional effort of serializing ndarray to list/bytes. With ASDF, dimod doesn't have to get its hands dirty serializing "standard" data types, and I consider numpy.ndarray increasingly "standard" in modern Python.)
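
To make the proposed split concrete, here is a minimal sketch of what it could look like. TinyBQM, its tree layout, and the write_asdf/read_asdf helpers are all hypothetical and exist only for illustration; none of this is dimod's actual API. The model class only converts to and from a dict holding NumPy arrays, and the asdf package is touched only in the two wire-level helpers.

```python
import numpy as np


class TinyBQM:
    """Hypothetical stand-in for a dimod BQM (illustration only)."""

    def __init__(self, linear, quadratic, offset=0.0):
        self.linear = dict(linear)        # {variable: bias}
        self.quadratic = dict(quadratic)  # {(u, v): coupling}
        self.offset = float(offset)

    def to_dict(self):
        """Return a "tree": a plain dict whose numeric data are ndarrays."""
        variables = sorted(self.linear)   # assumes sortable labels
        index = {v: i for i, v in enumerate(variables)}
        edges = sorted(self.quadratic)
        return {
            "variables": variables,
            "offset": self.offset,
            "linear": np.array([self.linear[v] for v in variables], dtype=float),
            "edges": np.array(
                [[index[u], index[v]] for u, v in edges], dtype=np.int64
            ).reshape(-1, 2),
            "couplings": np.array([self.quadratic[e] for e in edges], dtype=float),
        }

    @classmethod
    def from_dict(cls, tree):
        """Rebuild the model from a tree produced by to_dict."""
        variables = list(tree["variables"])
        linear = {v: float(b) for v, b in zip(variables, np.asarray(tree["linear"]))}
        quadratic = {
            (variables[i], variables[j]): float(w)
            for (i, j), w in zip(np.asarray(tree["edges"]), np.asarray(tree["couplings"]))
        }
        return cls(linear, quadratic, tree["offset"])


ARRAY_KEYS = ("linear", "edges", "couplings")
PLAIN_KEYS = ("variables", "offset")


def write_asdf(tree, path):
    """Wire-level helper: the only place that needs the asdf package."""
    import asdf  # third-party; imported lazily on purpose

    asdf.AsdfFile(tree).write_to(path)


def read_asdf(path):
    """Read a tree back, copying arrays out before the file is closed."""
    import asdf

    with asdf.open(path) as af:
        tree = {k: af[k] for k in PLAIN_KEYS}
        tree.update({k: np.array(af[k]) for k in ARRAY_KEYS})
    return tree
```

With this layout, something like write_asdf(bqm.to_dict(), "bqm.asdf") followed by TinyBQM.from_dict(read_asdf("bqm.asdf")) would round-trip a model, while the model class itself never converts an ndarray to lists or bytes.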