Fatal1ty / mashumaro

Fast and well tested serialization library
Apache License 2.0
751 stars 44 forks source link

Unserializable field in 3.12 if defined as a Generic TypeVar with mixin bounds #195

Closed stamostz closed 5 months ago

stamostz commented 5 months ago

Description

I am upgrading mashumaro from 3.0.4 to the latest version for pydantic plus other features compatibility. This should be pretty straightforward but it seems like the latest version has some bug around Generic type. Specifically, my fields in 3.12 are unserializable if defined as a Generic TypeVar with mixin bounds

What I Did

I am wrapping any kind of data in a wrapper class. I want it to be serializable and the data I'm wrapping to also be serializable:

from typing import TypeVar, Union, Generic, Optional
from mashumaro.mixins.msgpack import DataClassMessagePackMixin
from mashumaro.mixins.json import DataClassJSONMixin
from dataclasses import dataclass

T = TypeVar("T", bound=Union[DataClassMessagePackMixin, DataClassJSONMixin])

@dataclass
class WrapperCls(Generic[T], DataClassMessagePackMixin, DataClassJSONMixin):
    data: Optional[T] = None

This code works properly on mashumaro 3.0.4 but after I upgraded to mashumaro 3.12 I get the following error:

    raise UnserializableField(
mashumaro.exceptions.UnserializableField: Field "data" of type DataClassMessagePackMixin in WrapperCls is not serializable

The message is also very odd as DataClassMessagePackMixin is exactly the serializable part of my variable (?)

Please let me know if I'm not understanding something properly, if specs of the library have changed or if this is some bug. From what I tried, this issue was introduced in version 3.9

Fatal1ty commented 5 months ago

Hi @stamostz!

This code works properly on mashumaro 3.0.4

It works indeed, but not in the way that can be expected. I installed 3.0.4 (the same will be on 3.8.1) and enabled debug to see what happens in your example. Basically, for field data in dataclass WrappedCls the following method is generated:

__main__.WrapperCls:
@classmethod
def __unpack_union_WrapperCls_data__8f8113bcd6294acd898493f0d316a6f8(cls, value, *, dialect=None):
    try:
        return mashumaro_mixins_msgpack_DataClassMessagePackMixin.from_dict(value)
    except Exception: pass
    try:
        return mashumaro_mixins_json_DataClassJSONMixin.from_dict(value)
    except Exception: pass
    raise InvalidFieldValue('data',typing.Union[mashumaro.mixins.msgpack.DataClassMessagePackMixin, mashumaro.mixins.json.DataClassJSONMixin],value,cls)

The problematic expressions here are mashumaro_mixins_msgpack_DataClassMessagePackMixin.from_dict(value) and mashumaro_mixins_json_DataClassJSONMixin.from_dict(value). They work with no doubts but creating an instance of DataClassMessagePackMixin or DataClassJSONMixin is meaningless. This was addressed and changed in:

I am wrapping any kind of data in a wrapper class. I want it to be serializable and the data I'm wrapping to also be serializable

Good news for you! Starting with version 3.9, you no longer need to inherit mixins for nested dataclasses. Unfortunately, it would be problematic to express any type serializable with mashumaro, so I suggest you just drop bound in your TypeVar:

from datetime import date
from typing import TypeVar, Generic, Optional, List
from mashumaro.mixins.msgpack import DataClassMessagePackMixin
from mashumaro.mixins.json import DataClassJSONMixin
from dataclasses import dataclass

T = TypeVar("T")

@dataclass
class WrapperCls(Generic[T], DataClassMessagePackMixin, DataClassJSONMixin):
    data: Optional[T] = None

@dataclass
class Foo:
    x: date

@dataclass
class FooWrapperCls(WrapperCls[Foo]):
    pass

@dataclass
class DateWrapperCls(WrapperCls[date]):
    pass

@dataclass
class ListOfDatesWrapperCls(WrapperCls[List[date]]):
    pass

assert FooWrapperCls.from_json('{"data": {"x": "2024-03-01"}}').data.x == date(2024, 3, 1)
assert DateWrapperCls.from_json('{"data": "2024-03-01"}').data == date(2024, 3, 1)
assert ListOfDatesWrapperCls.from_json('{"data": ["2024-03-01"]}').data[0] == date(2024, 3, 1)
stamostz commented 5 months ago

Thank @Fatal1ty for the elaborate response.

I can indeed drop bound=Union[DataClassMessagePackMixin, DataClassJSONMixin] and the error is resolved. However, is my TypeVar T now type-safe? From your quote and the appended issue, I understand it is type-safe.

Good news for you! Starting with version 3.9, you no longer need to inherit mixins for nested dataclasses.

However, there are cases I need the nested dataclasses to be mixins. Here's an example:

from dataclasses import dataclass
from mashumaro.mixins.msgpack import DataClassMessagePackMixin
from mashumaro.mixins.json import DataClassJSONMixin
from typing import Dict, Any

@dataclass
class ConfigMessage(DataClassMessagePackMixin, DataClassJSONMixin):
    config: Dict[str, Any]

In this scenario, my WrapperCls is sent over some network with the data field (along with others, metadata, etc) which in this case is a ConfigMessage. Then, in another software block, I want to serialize/deserialize my ConfigMessage for other reasons/usecases. So I need to able to receive data and have it be serializable.

I did some digging and here's what is odd: My WrapperCls is mashumaro serializable for basic examples. I can confirm it with a console:

 In [8]: ResultResponse(data="foo").to_msgpack()
Out[8]: b'\x82\xa4data\xa3foo\xa5error\xc0'

In [9]: ResultResponse(data=4).to_msgpack()
Out[9]: b'\x82\xa4data\x04\xa5error\xc0'

In [10]: ResultResponse(data={"foo": "bar"}).to_msgpack()
Out[10]: b'\x82\xa4data\x81\xa3foo\xa3bar\xa5error\xc0'

My ConfigMessage class, also is mashumaro serializable:

 In [11]: ConfigMessage(config={"foo": "bar"}).to_msgpack()
Out[11]: b'\x81\xa6config\x81\xa3foo\xa3bar'

But, the combination of the two, now isn't:

In [12]: WrapperCls(data=ConfigMessage(config={"foo":"bar"})).to_msgpack()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 WrapperCls(data=ConfigMessage(config={"foo":"bar"})).to_msgpack()

File <string>:9, in __mashumaro_to_msgpack__(self, encoder)

File /opt/conda/lib/python3.8/site-packages/mashumaro/mixins/msgpack.py:29, in default_encoder(data)
     28 def default_encoder(data: Any) -> EncodedData:
---> 29     return msgpack.packb(data, use_bin_type=True)

File /opt/conda/lib/python3.8/site-packages/msgpack/__init__.py:36, in packb(o, **kwargs)
     30 def packb(o, **kwargs):
     31     """
     32     Pack object `o` and return packed bytes
     33 
     34     See :class:`Packer` for options.
     35     """
---> 36     return Packer(**kwargs).pack(o)

File msgpack/_packer.pyx:294, in msgpack._cmsgpack.Packer.pack()

File msgpack/_packer.pyx:300, in msgpack._cmsgpack.Packer.pack()

File msgpack/_packer.pyx:297, in msgpack._cmsgpack.Packer.pack()

File msgpack/_packer.pyx:231, in msgpack._cmsgpack.Packer._pack()

File msgpack/_packer.pyx:291, in msgpack._cmsgpack.Packer._pack()

TypeError: can not serialize 'ConfigMessage' object

What am I not understanding properly?

Fatal1ty commented 5 months ago

But, the combination of the two, now isn't:

In [12]: WrapperCls(data=ConfigMessage(config={"foo":"bar"})).to_msgpack()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 WrapperCls(data=ConfigMessage(config={"foo":"bar"})).to_msgpack()

File <string>:9, in __mashumaro_to_msgpack__(self, encoder)

File /opt/conda/lib/python3.8/site-packages/mashumaro/mixins/msgpack.py:29, in default_encoder(data)
     28 def default_encoder(data: Any) -> EncodedData:
---> 29     return msgpack.packb(data, use_bin_type=True)

File /opt/conda/lib/python3.8/site-packages/msgpack/__init__.py:36, in packb(o, **kwargs)
     30 def packb(o, **kwargs):
     31     """
     32     Pack object `o` and return packed bytes
     33 
     34     See :class:`Packer` for options.
     35     """
---> 36     return Packer(**kwargs).pack(o)

File msgpack/_packer.pyx:294, in msgpack._cmsgpack.Packer.pack()

File msgpack/_packer.pyx:300, in msgpack._cmsgpack.Packer.pack()

File msgpack/_packer.pyx:297, in msgpack._cmsgpack.Packer.pack()

File msgpack/_packer.pyx:231, in msgpack._cmsgpack.Packer._pack()

File msgpack/_packer.pyx:291, in msgpack._cmsgpack.Packer._pack()

TypeError: can not serialize 'ConfigMessage' object

When you call WrapperCls(…).to_msgpack() you call the method to_msgpack compiled for unbound TypeVar which is interpreted as Any and you will get any data passed as is in the end. There is no type introspection at runtime. Moreover, it’s not possible to deserialize something unknown to WrapperCls instance because T can be anything. You either need to create a specific dataclass model with T specified (as I showed in the previous message for FooWrapperCls), or create an encoder for specific data type:

from mashumaro.codecs.msgpack import MessagePackEncoder

config_message_encoder = MessagePackEncoder(WrapperCls[ConfigMessage])
config_message_encoder.encode(WrapperCls(ConfigMessage(config={"foo":"bar"})))

See my answer in the similar issue: https://github.com/Fatal1ty/mashumaro/issues/193#issuecomment-1954059443

You can also play with discriminators, hooks, custom serialization strategies, since only you know better how to distinguish all variety of your data. Just as a silly example, it’s possible to achieve runtime introspection on your side using (de)serialization methods. However, if you care about performance, I encourage you to create specific dataclass models / encoders / decoders instead.

def serialize_t(value):
    if isinstance(value, DataClassDictMixin):
        return value.to_dict()
    else:
        raise NotImplementedError

@dataclass
class WrapperCls(Generic[T], DataClassMessagePackMixin):
    data: T = field(metadata=field_options(serialize=serialize_t))
stamostz commented 5 months ago

Spot on, this worked like a charm.

Finally, I'm overriding from_json and from_msgpack to include the type in my WrapperCls like this:

def serialize_t(value):
    if value is None:
        return None
    if isinstance(value, DataClassDictMixin):
        return value.to_dict()
    else:
        raise NotImplementedError

@dataclass
class WrapperCls(Generic[T], DataClassMessagePackMixin, DataClassJSONMixin):
    data: Optional[T] = field(metadata=field_options(serialize=serialize_t))

    @classmethod
    def from_json(
        cls: "WrapperCls",
        data: EncodedData,
        type: Type[T],
        decoder: Decoder = json.loads,
        **from_dict_kwargs,
    ) -> "WrapperCls[T]":
        decoded_data = decoder(data)

        data_part_maybe = decoded_data.get("data")
        if data_part_maybe is not None:
            # give mashumaro type hint
            return WrapperCls(
                data=type.from_dict(data_part_maybe, **from_dict_kwargs)
            )

        return cls.from_dict(decoder(data), **from_dict_kwargs)

    @classmethod
    def from_msgpack(
        cls: "WrapperCls",
        data: EncodedData,
        type: Type[T],
        decoder: Decoder = default_decoder,
        **from_dict_kwargs,
    ) -> "WrapperCls[T]":
        decoded_data = decoder(data)

        data_part_maybe = decoded_data.get("data")
        if data_part_maybe is not None:
            # give mashumaro type hint
            return WrapperCls(
                data=type.from_dict(
                    data_part_maybe,
                    **dict({"dialect": MessagePackDialect}, **from_dict_kwargs),
                )
            )

        return cls.from_dict(
            decoder(data), **dict({"dialect": MessagePackDialect}, **from_dict_kwargs)
        )

See the examples:

In [5]: j = WrapperCls(data=ConfigMessage(config={"foo":"bar"})).to_json()

In [6]: WrapperCls.from_json(j, ConfigMessage)
Out[6]: ResultResponse(data=ConfigMessage(config={'foo': 'bar'}))

Cheers, this works nicely.

However, when I do the same with from_msgpack:

In [7]: m = WrapperCls(data=ConfigMessage(config={"foo":"bar"})).to_msgpack()

In [8]: WrapperCls.from_msgpack(m, ConfigMessage)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 WrapperCls.from_msgpack(m, ConfigMessage)

File <string>:18, in __mashumaro_from_msgpack__(cls, d, decoder, dialect)

ValueError: Argument for WrapperCls.__mashumaro_from_msgpack__ method should be a dict instance

That's weird, inspecting a bit deeper, we can see that the instance is indeed a dict. That makes me think that the method is not really overriden but some default implementation is taken. We can confirm that by renaming our method:

    @classmethod
    def from_msgpack_typed(
        cls: "WrapperCls",
        data: EncodedData,
        type: Type[T],
        decoder: Decoder = default_decoder,
        **from_dict_kwargs,
    ) -> "WrapperCls[T]":
        ...

Now, the same example:

In [3]: m = WrapperCls(data=ConfigMessage(config={"foo":"bar"})).to_msgpack()

In [4]: WrapperCls.from_msgpack_typed(m, ConfigMessage)
Out[4]: WrapperCls(data=ConfigMessage(config={'foo': 'bar'}))

Yay! This works. Of course, it's the lesser evil but I can rename all from_msgpack instances in my codebase to from_msgpack_typed but really the question here is: Why isn't from_msgpack being overriden?

Fatal1ty commented 5 months ago

Yay! This works. Of course, it's the lesser evil but I can rename all from_msgpack instances in my codebase to from_msgpack_typed but really the question here is: Why isn't from_msgpack being overriden?

Because this is how mixins work. There is no point in using mixins that provide these methods if you define them by yourself. Again, if you need some extra work during (de)serialization, you can use pre/post hooks.

Addition: There is implementation difference between DataClassJSONMixin and DataClassMessagePackMixin. You can override from_json because it’s a simple method that calls from_dict. On the other hand, from_msgpack is decorated with @final decorator for a reason — it’s a warning that you shouldn’t override it in subclasses (mypy will yell if you do that). This method looks like a stub because the specific implementation will be compiled and injected into the dataclass. So, if you break the rules and try to override it, you will be surprised. Probably, it would be nice to call warnings.warn(…) in case someone has it overridden in a subclass.

I see you want “typed” from_* methods that have a T replacement as an argument. It’s different from what mixins from mashumaro provide. I guess, if you like mixins, you can stop using ones from mashumaro and create your own that will be providing methods with different signature. Generic version with codecs under the hood can be something like this:

@classmethod
def from_msgpack(
    cls: "WrapperCls",
    data: EncodedData,
    data_type: Type[T],
    msgpack_decoder: Decoder = default_decoder,
) -> "WrapperCls[T]":
    decoded_data = msgpack_decoder(data)
    data_part = decoded_data.get("data")
    dict_decoder = cls.__dict_decoders__.get(data_type)
    if not dict_decoder:
        dict_decoder = BasicDecoder(data_type)
        cls.__dict_decoders__[data_type] = dict_decoder
    return cls(dict_decoder.decode(data_part))

If you want to be sure that T replacement is serializable through bound, you can do it:

T = TypeVar(“T”, DataClassDictMixin)

@classmethod
def from_msgpack(
    cls: "WrapperCls",
    data: EncodedData,
    data_type: Type[T],
    msgpack_decoder: Decoder = default_decoder,
) -> "WrapperCls[T]":
    decoded_data = msgpack_decoder(data)
    data_part = decoded_data.get("data")
    return cls(data_type.from_dict(data_part))
Fatal1ty commented 5 months ago

I’m going to close this issue as stale. If you still need help, feel free to answer.