Fatal1ty / mashumaro

Fast and well tested serialization library
Apache License 2.0

Different results between standard library json and orjson #225

Open unratito opened 1 month ago

unratito commented 1 month ago

Description

When using mixins to serialize data classes to JSON, standard library json and orjson give different results.

What I Did

This code uses standard library json:

from dataclasses import dataclass
from mashumaro.mixins.json import DataClassJSONMixin
import json

@dataclass
class A(DataClassJSONMixin):
    x: int

@dataclass
class B(A):
    y: str

@dataclass
class W(DataClassJSONMixin):
    inner: A

b = B(5, 'hi')
w = W(b)

print(json.dumps(w.to_dict()))
print(w.to_json())

And it prints these results:

{"inner": {"x": 5, "y": "hi"}}
{"inner": {"x": 5, "y": "hi"}}

While this equivalent code uses orjson:

from dataclasses import dataclass
from mashumaro.mixins.orjson import DataClassORJSONMixin
import orjson

@dataclass
class A(DataClassORJSONMixin):
    x: int

@dataclass
class B(A):
    y: str

@dataclass
class W(DataClassORJSONMixin):
    inner: A

b = B(5, 'hi')
w = W(b)

print(orjson.dumps(w.to_dict()))
print(w.to_json())

And it prints different results:

b'{"inner":{"x":5,"y":"hi"}}'
{"inner":{"x":5}}

Fatal1ty commented 3 weeks ago

I over-optimized the serialization of dataclasses using orjson to such an extent that it led to unpleasant consequences that I overlooked 😅. In short, when we build the serialization code for W, we build the code for turning dataclass A into a dictionary with types supported by orjson, since A is the type specified for inner. At runtime, when inner holds an instance of B, that method is called from the parent class A, so the B-specific field y is left out. I need to think more about what to do, since I don’t yet see any simple solution other than getting rid of this optimization, which I wouldn’t want to do.
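
The effect described above can be reproduced in a few lines of plain Python. This is a minimal sketch, not mashumaro's actual generated code: `build_static_serializer` and `serialize_dynamic` are hypothetical helpers made up for illustration. The first captures the field list once from the declared class, the second looks the fields up from the object it receives.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable, Dict


def build_static_serializer(cls: type) -> Callable[[Any], Dict[str, Any]]:
    # Hypothetical helper: the field names are captured once, from the
    # declared class, before any instance exists.
    names = [f.name for f in fields(cls)]
    return lambda obj: {name: getattr(obj, name) for name in names}


def serialize_dynamic(obj: Any) -> Dict[str, Any]:
    # Hypothetical helper: the field names come from the runtime type.
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


@dataclass
class A:
    x: int


@dataclass
class B(A):
    y: str


# Built for the annotation `A`, as it would be for `inner: A`.
serialize_as_a = build_static_serializer(A)

b = B(5, "hi")
static_result = serialize_as_a(b)
dynamic_result = serialize_dynamic(b)

print(static_result)   # {'x': 5}
print(dynamic_result)  # {'x': 5, 'y': 'hi'}
```

The static variant avoids a per-call type lookup, which is the kind of saving the optimization is after, but it only ever sees the fields of the declared type.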

Fatal1ty commented 3 weeks ago

Here is another example of this issue:

from dataclasses import dataclass
from mashumaro import DataClassDictMixin

@dataclass
class A[T]:
    x: T

@dataclass
class B[T](A):
    y: int

@dataclass
class C(DataClassDictMixin):
    z: A[int]

print(C(B(1, 2)).to_dict())  # {'z': {'x': 1}}

Fatal1ty commented 3 weeks ago

Looks like mashumaro is not the only library that has this behavior 🤔