Fatal1ty / mashumaro

Fast and well tested serialization library
Apache License 2.0

MessagePackDecoder fails in dataclass complex case #252

Open Future-Outlier opened 2 weeks ago

Future-Outlier commented 2 weeks ago

mashumaro version: 3.13.1
Python version: 3.12.4
Operating System: macOS Sonoma 14.3

Description

I'm a maintainer of Flyte, and I encountered an issue while using mashumaro.codecs.msgpack's MessagePackDecoder and MessagePackEncoder. Specifically, the MessagePackDecoder fails to handle a certain edge case.

What I Did

from mashumaro.codecs.msgpack import MessagePackDecoder, MessagePackEncoder
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union
from enum import Enum
import msgpack

def _default_msgpack_decoder(data: bytes) -> Any:
    return msgpack.unpackb(data, raw=False, strict_map_key=False)

@dataclass
class InnerDC:
    max_depth: int = 10
    max_features: str = "sqrt"
    n_estimators: int = 100

@dataclass
class DC:
    grid: Dict[str, List[Optional[InnerDC]]] = field(default_factory=lambda: {
        'all_types': [InnerDC()],
    })

# Encoding works
encoder = MessagePackEncoder(DC)
msgpack_bytes = encoder.encode(DC())

# Decoding works
decoder = MessagePackDecoder(DC, pre_decoder_func=_default_msgpack_decoder)
python_val = decoder.decode(msgpack_bytes)
assert python_val == DC()

# Another test with Enum
class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class DC:
    grid: Dict[str, List[Optional[Union[int, str, float, bool, Status, InnerDC]]]] = field(default_factory=lambda: {
        'all_types': [InnerDC()],
    })

encoder = MessagePackEncoder(DC)
msgpack_bytes = encoder.encode(DC())

# Decoding fails
decoder = MessagePackDecoder(DC, pre_decoder_func=_default_msgpack_decoder)
python_val = decoder.decode(msgpack_bytes)
print(python_val)
assert python_val == DC()

In both cases encoding works. Decoding works for the first DC, but MessagePackDecoder fails for the second DC, whose Union mixes int, str, float, bool, Status, and InnerDC.
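
A possible source of the ambiguity, sketched with the standard library only. After msgpack unpacking, an InnerDC arrives as a plain dict, so whatever decodes the Union has to pick a member from the wire shape alone. The helpers `decode_by_coercion` and `decode_by_shape` are hypothetical and are not mashumaro APIs; this is an illustration of the general pitfall, not a confirmed diagnosis of what mashumaro does internally.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class InnerDC:
    max_depth: int = 10
    max_features: str = "sqrt"
    n_estimators: int = 100


# What a msgpack round trip hands back for a dataclass: a plain dict
# with no record of the original class.
wire_value = asdict(InnerDC())


def decode_by_coercion(value: Any) -> Any:
    """Hypothetical decoder: return the first Union member whose constructor accepts the value."""
    for member in (int, str, float, bool, Status):
        try:
            return member(value)
        except (TypeError, ValueError):
            continue
    return InnerDC(**value)


def decode_by_shape(value: Any) -> Any:
    """Hypothetical decoder: inspect the wire shape before choosing a member."""
    if isinstance(value, dict):
        return InnerDC(**value)
    return value  # scalars pass through unchanged in this sketch


# int(dict) raises TypeError, but str(dict) succeeds, so the dataclass
# is silently turned into its string representation.
coerced = decode_by_coercion(wire_value)
shaped = decode_by_shape(wire_value)
print(type(coerced).__name__, type(shaped).__name__)
```

In this sketch the order-based decoder never reaches InnerDC because `str` accepts any input, while the shape-based one recovers the dataclass.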

Future-Outlier commented 2 weeks ago

In summary, this is a super edge case.

@dataclass
class DC:
    grid: Dict[str, List[Optional[Union[int, str, float, bool, Status, InnerDC]]]] = field(default_factory=lambda: {
        'all_types': [InnerDC()],
    })

Future-Outlier commented 2 weeks ago

You might be interested in how we use mashumaro to produce bytes, so here is a brief overview.

encode

python val -> msgpack bytes -> protobuf literal

decode

protobuf literal -> msgpack bytes -> python val
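
The two flows above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Flyte's actual code: `json` stands in for msgpack so the block needs no third-party packages, and `BinaryLiteral`, `to_literal`, and `from_literal` are hypothetical names standing in for the protobuf literal and its conversion helpers.

```python
import json
from dataclasses import asdict, dataclass
from typing import Type


@dataclass
class InnerDC:
    max_depth: int = 10
    max_features: str = "sqrt"
    n_estimators: int = 100


@dataclass(frozen=True)
class BinaryLiteral:
    """Hypothetical stand-in for the protobuf literal that carries the bytes."""
    value: bytes


def to_literal(python_val: InnerDC) -> BinaryLiteral:
    # python val -> bytes (json is used here in place of msgpack)
    payload = json.dumps(asdict(python_val)).encode("utf-8")
    # bytes -> literal
    return BinaryLiteral(value=payload)


def from_literal(literal: BinaryLiteral, python_type: Type[InnerDC]) -> InnerDC:
    # literal -> bytes -> python val; the bytes carry no class information,
    # so the caller has to supply the target Python type.
    fields = json.loads(literal.value.decode("utf-8"))
    return python_type(**fields)


literal = to_literal(InnerDC())
restored = from_literal(literal, InnerDC)
print(restored)
```

The point of the sketch is that the decode side only sees bytes, so the Python type passed in is the only thing that tells it what to rebuild.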

How do we create codecs (encoder and decoder)?

We pass the Python type to the encoder or decoder to serialize/deserialize it. Thank you for looking into this, and feel free to ping me to collaborate. I'll try to reply within 1 day <3