Fatal1ty / mashumaro

Fast and well tested serialization library
Apache License 2.0
774 stars 45 forks source link

omit_default=True evaluates default_factory value at declaration time which can cause a circular reference #174

Closed m3talstorm closed 1 year ago

m3talstorm commented 1 year ago

Description

In the following example, we had a sequence number that was incremented every time an instance of the Thing class was created, storing it on the class so that at any point you could access the current Thing.SEQUENCE for comparisons, debug etc.

We added the omit_default = True to config to strip out fields from the serialized output.

As the default_factory is called when the dataclass is declared (https://github.com/Fatal1ty/mashumaro/pull/162/commits/79593d8690484d125857d61f20e6a05632d9c0ac#diff-06ae7d9291db25235a16beccb8abc8d7e0af7a4fb21237def9a7fa3e20fae672R1011) and the Thing class hasn't finished being declared, you get a

venv/lib/python3.12/site-packages/mashumaro/mixins/dict.py:25: in __init_subclass__
    compile_mixin_packer(cls, **builder_params["packer"])
venv/lib/python3.12/site-packages/mashumaro/core/meta/mixin.py:29: in compile_mixin_packer
    builder.add_pack_method()
venv/lib/python3.12/site-packages/mashumaro/core/meta/code/builder.py:1138: in add_pack_method
    self._add_pack_method_lines(method_name)
venv/lib/python3.12/site-packages/mashumaro/core/meta/code/builder.py:909: in _add_pack_method_lines
    default = self.get_field_default(
venv/lib/python3.12/site-packages/mashumaro/core/meta/code/builder.py:232: in get_field_default
    return field.default_factory()
tests/player/test_player_create_incr.py:26: in sequence_incr
    Thing.SEQUENCE += 1
E   NameError: name 'Thing' is not defined

Would it be better to do this evaluation at the start of the serialization process(and cache it) rather than class declaration time? I guess there would be a performance hit on the first serialize, maybe add a lazy_omit_default config value? 🤔😅 Or, is there an alternative approach?

What I Did

from __future__ import annotations
from dataclasses import dataclass, field
from typing import ClassVar

from mashumaro.mixins.dict import DataClassDictMixin
from mashumaro.mixins.json import DataClassJSONMixin
from mashumaro.config import BaseConfig

def sequence_incr() -> int:
    """
    """

    try:
        return Thing.SEQUENCE
    finally:
        Thing.SEQUENCE += 1

    return

@dataclass(slots=True, frozen=False)
class Thing(DataClassJSONMixin, DataClassDictMixin):
    """
    """

    class Config(BaseConfig):
        """
        """

        omit_none = True
        omit_default = True

    sequence: int = field(default_factory=sequence_incr)

    SEQUENCE: ClassVar[int] = 0

 thing = Thing()

 print(thing)
Fatal1ty commented 1 year ago

Hi @m3talstorm

I guess there would be a performance hit on the first serialize, maybe add a lazy_omit_default config value?

Yes, there would be a performance hit. I do think that the new config option is not worth it in terms of cost and profit. Nevertheless, I'm sure there are workarounds or alternative approaches. The question is, do you really need to be able to pass an explicit sequence number to the constructor? Depending on the answer, there are some ways to avoid non-idempotent default_factory usage: 1) change sequence type to Optional[int] with default None and set it in post-init:

@dataclass(slots=True, frozen=False)
class Thing(DataClassJSONMixin, DataClassDictMixin):
    class Config(BaseConfig):
        omit_none = True
        omit_default = True

    sequence: Optional[int] = field(default=None)

    SEQUENCE: ClassVar[int] = 0

    def __post_init__(self):
        if self.sequence is None:
            self.sequence = sequence_incr()

2) the same as above but instead of Optional[int] keep it as int with default -1 (I suppose, sequence number must be positive):

@dataclass(slots=True, frozen=False)
class Thing(DataClassJSONMixin, DataClassDictMixin):
    class Config(BaseConfig):
        omit_none = True
        omit_default = True

    sequence: int = field(default=-1)

    SEQUENCE: ClassVar[int] = 0

    def __post_init__(self):
        if self.sequence < 0:
            self.sequence = sequence_incr()

3) remove sequence from the constructor and set it in post-init instead:

@dataclass(slots=True, frozen=False)
class Thing(DataClassJSONMixin, DataClassDictMixin):
    class Config(BaseConfig):
        omit_none = True
        omit_default = True

    sequence: int = field(init=False)

    SEQUENCE: ClassVar[int] = 0

    def __post_init__(self):
        self.sequence = sequence_incr()
m3talstorm commented 1 year ago

@Fatal1ty Thank you for your response, I will try out your proposed solutions.

Fatal1ty commented 1 year ago

@m3talstorm Feel free to reopen this issue if you have further questions