awslabs / gluonts

Probabilistic time series modeling in Python
https://ts.gluon.ai
Apache License 2.0
4.58k stars 750 forks

Serde 2.0 #862

Open jaheba opened 4 years ago

jaheba commented 4 years ago

I propose to re-work how we currently serialise and deserialise objects.

Our current design aims at solving two issues: validating constructor arguments, and serialising and deserialising objects.

For example:

from gluonts.core.component import validated


class MyClass:
    @validated()
    def __init__(self, a: int, b: int):
        self.a = a
        self.b = b

# parameters are validated,
# such that passing invalid arguments raises an error
instance = MyClass(a=1, b=2)

# We can simply dump and load objects.
# It's a bit like pickle, but is aimed at producing json output
from gluonts.core import serde
from gluonts.core.component import equals

assert equals(instance, serde.decode(serde.encode(instance)))

There are two problems in my view:

pickle API

We currently use pickle's __getnewargs_ex__ to encode objects:

    if hasattr(v, "__getnewargs_ex__"):
        args, kwargs = v.__getnewargs_ex__()  # mypy: ignore
        return {
            "__kind__": kind_inst,
            "class": fqname_for(v.__class__),
            "args": encode(args),
            "kwargs": encode(kwargs),
        }

I'm unsure whether this is a good idea, since we assume that we can also encode an object's parameters, which might not always be the case. Put differently, implementing __getnewargs_ex__ promises that the returned arguments can be pickled, but not that they can be encoded. If we have our own interface, that contract is more explicit.
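To illustrate the gap between the two contracts, here is a minimal sketch using only the standard library. `Window` is a hypothetical class, and `json.dumps` merely stands in for a JSON-oriented encoder such as serde; it is not the gluonts implementation. The arguments returned by `__getnewargs_ex__` pickle without trouble, yet they cannot be turned into JSON:

```python
import datetime
import json
import pickle


class Window:
    """Hypothetical class, standing in for a @validated component."""

    def __init__(self, length):
        self.length = length

    def __getnewargs_ex__(self):
        # pickle's contract: return (args, kwargs) for reconstruction.
        return (), {"length": self.length}


window = Window(length=datetime.timedelta(hours=1))
args, kwargs = window.__getnewargs_ex__()

# The arguments survive a pickle round trip ...
pickle_ok = pickle.loads(pickle.dumps(kwargs)) == kwargs

# ... but a JSON-based encoder cannot handle them as-is.
try:
    json.dumps(kwargs)
    json_ok = True
except TypeError:
    json_ok = False
```

Here `pickle_ok` ends up true while `json_ok` ends up false, which is the mismatch described above.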

Stateful Objects

We currently don't have a way to encode custom stateful objects.

Looking at pickle, it uses the contents of an object's __dict__ to store the state by default.

Pickle also has __getstate__ and __setstate__ methods, which offer more control:

import random


class MyClass:
    def __init__(self, upper):
        self.number = random.randrange(1, upper)

    def __getstate__(self):
        return self.__dict__

    def __setstate__(self, state):
        self.__dict__ = state
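As a sketch of what "more control" can mean in practice, the hypothetical `Counter` below drops a member that cannot be stored and rebuilds it on load. The last lines imitate, by hand, roughly what pickle does when loading: create the object without calling `__init__`, then pass it the saved state.

```python
class Counter:
    """Hypothetical stateful object with a transient member."""

    def __init__(self, start):
        self.count = start
        self.callback = lambda n: n + 1  # lambdas cannot be pickled

    def __getstate__(self):
        # Copy the state and leave out what cannot be stored.
        state = self.__dict__.copy()
        del state["callback"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.callback = lambda n: n + 1  # rebuild the transient member


original = Counter(start=3)
original.count = original.callback(original.count)

# Imitate loading: no __init__ call, only the saved state is applied.
restored = Counter.__new__(Counter)
restored.__setstate__(original.__getstate__())
```

The restored object carries the mutated `count`, not the value passed to `__init__`, which is exactly the kind of state our current encoding cannot capture.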

Proposal

Protocol

I think we should add behaviour similar to __getstate__/__setstate__.

Further, I think we should use custom method-names to be more explicit and avoid confusion with pickle.
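A rough sketch of what such a protocol could look like. Everything here is an assumption for illustration: the method names `__serde_getstate__` and `__serde_setstate__`, the `REGISTRY` lookup, and the `encode`/`decode` helpers are hypothetical and not part of gluonts.

```python
import json

# Hypothetical registry that maps a name to a class for decoding.
REGISTRY = {}


def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls


def encode(obj):
    # The object itself promises that its state is JSON-encodable.
    return json.dumps(
        {"class": type(obj).__name__, "state": obj.__serde_getstate__()}
    )


def decode(text):
    data = json.loads(text)
    cls = REGISTRY[data["class"]]
    obj = cls.__new__(cls)  # skip __init__, restore the state instead
    obj.__serde_setstate__(data["state"])
    return obj


@register
class RunningMean:
    def __init__(self):
        self.total = 0.0
        self.n = 0

    def update(self, x):
        self.total += x
        self.n += 1

    def __serde_getstate__(self):
        return {"total": self.total, "n": self.n}

    def __serde_setstate__(self, state):
        self.total = state["total"]
        self.n = state["n"]


mean = RunningMean()
mean.update(2.0)
mean.update(4.0)
restored = decode(encode(mean))
```

Because the names differ from pickle's, a class can support pickling and encoding independently, and each contract stays explicit.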

Better Interface

The main reason validated exists in its current form is that object initialisation is often more complicated than just assigning the arguments. For example, when some arguments depend on others:

from typing import Optional


class MyEstimator:
    @validated()
    def __init__(
        self,
        prediction_length: int,
        context_length: Optional[int] = None,
    ):
        self.prediction_length = prediction_length
        if context_length is None:
            self.context_length = 2 * prediction_length
        else:
            self.context_length = context_length

Instead we could use something like this:

class MyEstimator(SerdeObject):
    prediction_length: int
    context_length: Optional[int] = None

    __defaults__ = {
        "context_length": lambda self: 2 * self.prediction_length,
    }
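
One way such a base class might work, as a minimal sketch only: the `SerdeObject` below is a hypothetical implementation that reads the annotated fields, assigns keyword arguments, and then fills any field that is still None from `__defaults__`. It does no type validation, which a real version would need.

```python
from typing import Optional, get_type_hints


class SerdeObject:
    """Hypothetical base class; not an existing gluonts API."""

    __defaults__ = {}

    def __init__(self, **kwargs):
        fields = get_type_hints(type(self))
        unknown = set(kwargs) - set(fields)
        if unknown:
            raise TypeError(f"unexpected arguments: {sorted(unknown)}")
        for name in fields:
            if name in kwargs:
                setattr(self, name, kwargs[name])
            elif not hasattr(type(self), name):
                # No value passed and no class-level default.
                raise TypeError(f"missing argument: {name}")
        # Derive values for fields that are still None.
        for name, factory in self.__defaults__.items():
            if getattr(self, name) is None:
                setattr(self, name, factory(self))


class MyEstimator(SerdeObject):
    prediction_length: int
    context_length: Optional[int] = None

    __defaults__ = {
        "context_length": lambda self: 2 * self.prediction_length,
    }


derived = MyEstimator(prediction_length=12)
explicit = MyEstimator(prediction_length=12, context_length=5)
```

With this sketch, `derived.context_length` is computed from `prediction_length`, while `explicit.context_length` keeps the value that was passed in.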

vafl commented 4 years ago

Thanks @jaheba.

I agree that the current mechanism is a bit strange. However, it does seem to work in practice, and as a user I only have to know that I need to add @validated and things magically work.

Could you give a few more examples of what this would look like for the user with your proposed change?

One thing we should support is serialization/deserialization of classes that are/contain gluon HybridBlocks and their weights.