Fatal1ty / mashumaro

Fast and well tested serialization library
Apache License 2.0
758 stars 45 forks source link

Adds support for fields with local dataclass types #180

Closed mishamsk closed 8 months ago

mishamsk commented 9 months ago

Hey,

This PR makes the following work:

def test_local_type():
    @dataclass
    class LocalType:
        pass

    @dataclass
    class DataClassWithLocalType(DataClassDictMixin):
        x: LocalType

    obj = DataClassWithLocalType(LocalType())
    assert obj.to_dict() == {"x": {}}
    assert DataClassWithLocalType.from_dict({"x": {}}) == obj

First why: as you know, I use a custom deserialization base class that maintains a global registry of DataClassDictMixin subclasses allowing discrimination by subclass. I haven't migrated to 3.10, so it is still there.

This in turn means that at any given moment all subclass names must be unique. This is inconvenient for tests, but since so far mashumaro enforces classes in global scope, there is no easy way to create isolated tests, that define classes and then remove them.

Not sure if there are a lot of real life use cases outside of tests, but there may be...

The way I've implemented it is minimally disruptive. It should change nothing unless the type annotation is a locally defined class. Generally speaking, the current code generation can stop relying on fully qualified names of types entirely and use my change that just adds them to locals before executing. But that may have bigger consequences, so I opted no to do that.

Fixes #182

Fatal1ty commented 9 months ago

Hi @mishamsk

Thank you for bringing attention to this problem. I've been placing dataclasses globally in the tests, trying to postpone the inevitable :) In fact, I have already made preparation to solve this problem with local types. There is clean_id function that strips off angle brackets:

https://github.com/Fatal1ty/mashumaro/blob/517d8d50fcded8c980094dc4371e6ef2aa1e1bb3/mashumaro/core/meta/types/common.py#L306-L309

It would be safer to use it because someone might have more than one local type. I've made a quick proof of concept in this commit:

https://github.com/Fatal1ty/mashumaro/blob/5e0b05bb6a6392bfef31238cef085abd89a30fe7/mashumaro/core/meta/code/builder.py#L563-L565

But the most important thing is that <locals> can appear not only in _unpack_method_set_value. For example, when someone is using codecs, the generated by a local encoder code may need a reference to a local type. The safest way to find all potential places is to go from where field_type is being called. I understand this may require a deep dive into non-trivial code and I am willing to do it on my own, but if you want to challenge yourself, I don't mind :)

pep8speaks commented 9 months ago

Hello @mishamsk! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2023-12-23 01:55:42 UTC
mishamsk commented 9 months ago

@Fatal1ty well, I wasn't sure how far you'd want to go with refactoring the code.

First using the existing helpers obviously makes sense - I added a commit to PR, which uses what you've suggested except for two changes:

I get the fact that this won't solve the issue completely, but it doesn't mean it can't be merged (I guess).

I reviewed type_name usage and I'd say, if refactoring to fully support locals is considered, it may worth moving away from module imports + generating code that uses qualnames entirely. Instead, just create unique local names for each type parameter used in packing/unpacking and bind them in globals/local namespace used at compilation. This should theoretically be more resilient to whatever happens to python.

It feels a bit daunting to add a fork in every single place where type_name is used, to either use the old method (import module + refer to types/object via qualname) OR the new one (bind them to a local name).

It is a pretty big change either way. Maybe worth doing it gradually (e.g. for Mixin subclasses only first, then codecs).

What do you think?

Fatal1ty commented 9 months ago
  • I took a liberty to implement a more "robust" clean_id function
  • implemented local type check as a helper based on type directly, instead of the string representing coming from type_name

Yeah, it looks better this way!

I get the fact that this won't solve the issue completely, but it doesn't mean it can't be merged (I guess).

Absolutely. One step at a time is better than nothing.

I reviewed type_name usage and I'd say, if refactoring to fully support locals is considered, it may worth moving away from module imports + generating code that uses qualnames entirely. Instead, just create unique local names for each type parameter used in packing/unpacking and bind them in globals/local namespace used at compilation. This should theoretically be more resilient to whatever happens to python.

This would be a big change that is likely to affect performance. Some time ago I was tempted by eliminating qualnames, thinking that one name instead of "getattr chain" would speed things up, but to my surprise performance was getting the same or even worse. However, I admit that something might have changed since then and it's worth trying again.

It is a pretty big change either way. Maybe worth doing it gradually (e.g. for Mixin subclasses only first, then codecs).

I totally agree. I will review your changes soon.

mishamsk commented 9 months ago

I haven’t made changes across the codebase, wanted your thoughts on it first.

So we are narrowing the scope to mixins only.

Some time ago I was tempted by eliminating qualnames, thinking that one name instead of "getattr chain" would speed things up, but to my surprise performance was getting the same or even worse.

Interesting. I was expecting python to create a closure when it runs exec, so it should not matter how exactly particular object was bound before execution, rather closure will end up having direct pointers to the object…but this may not work this way indeed.

I guess I can start adding bindings for local objects anyways. To switch everything to locals or not may be a later stage decision…

mishamsk commented 9 months ago

so this fixes #182

what I did since the last commit:

I reverted and renamed the is_local_type to check the resolved name, since in some cases (like with Literal) the local type was wrapped in typing construct and thus to identify local type, would mean recreating the logic from type name.

Alternative would be to change type_name return to also produce a flag for local types, but that would yield so many changes in the code, I thought it is not worth it.

I'd vote to merge it this way, as it should cover 99% of real-life cases...but up to you. I am eagerly waiting for this;-)

Fatal1ty commented 8 months ago

Sorry for late answer, the New Year's Eve fuss takes up a lot of my time.

Alternative would be to change type_name return to also produce a flag for local types, but that would yield so many changes in the code, I thought it is not worth it.

You made huge work here but I have a small suggestion about local type names. It seems like it would be practical if we encapsulate checking for the localness and adding to the "globals" in a new type_name (or better get_type_name) method in CodeBuilder (with the same arguments as type_name from mashumaro.core.meta.helpers has). This method will return normal type_name(...)'s result if it's not a local type and a cleaned type name otherwise. It must eliminate all these duplicates of the code that you most likely got bored with while writing.

if it is only for an exception message - do nothing, will include "locals"

Great, we're on the same page here. I also think that there is no need to change it there.

P.S. The next time I will be available in early January. Merry Christmas and a happy New Year, take more time off from the computer! 🎅

mishamsk commented 8 months ago

Sorry for late answer, the New Year's Eve fuss takes up a lot of my time.

not surprised;-)

method in CodeBuilder

not a big fan of methods lately, but sure, that does make sense. applied the change. except that I named it get_type_name_identifier to relate the fact that the goal is to get a valid identifier, not just some name. Given the real usage I opted to not pass all of the type_name args to this method, seems unnecessary.

The next time I will be available in early January. Merry Christmas and a happy New Year

Thanks, same to you!

take more time off from the computer!

sounds like my wife told you something;-)

Fatal1ty commented 8 months ago

I added usage of get_type_name_identifier for fields with Self in their type annotations and for annotated serializable types with Self in the _deserialize / _serialize methods. I guess, the Discriminator's code may still have problems with local types but let's leave it to the first case.

Thank you for this pull request. You did it exactly the way I would have done it.

Fatal1ty commented 7 months ago

I guess, the Discriminator's code may still have problems with local types but let's leave it to the first case.

Discriminators are fixed in #189.