ProtixIT / dataclass-binder

Python library to bind TOML data to dataclasses in a type-safe way
MIT License

Add function to construct TOML dictionary for dataclass instance #12

Open mthuurne opened 1 year ago

mthuurne commented 1 year ago

The Binder.bind() method creates a dataclass instance from the data in a TOML dictionary. It would be useful to support the opposite conversion as well, where we accept a dataclass instance and return the corresponding TOML dictionary.

We already support conversion from a dataclass instance to a textual TOML representation with format_template(). For parsing, it turned out to be useful to support dictionary input in addition to textual input, so for generating TOML it's probably also useful to support dictionary output in addition to textual output.

This could either be a standalone function or a method on an instanced Binder. I think it would be more efficient to use an instanced Binder in the implementation, both to avoid code duplication and to avoid redundant checks on the dataclass definition. However, as we already have a binder cache, we could have a standalone function forward the request to an instanced Binder, if that simplifies the interface.

Note that dataclasses.asdict() offers similar functionality, but it does not handle some conversions like timedelta, modules and dashes in key names. Perhaps we can use asdict() with a custom dictionary factory, but probably not, as there is no accompanying list factory.
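
To illustrate the limitation, here is a minimal sketch using only the standard library. The `Config` class and `toml_dict_factory` are hypothetical names made up for this example, and converting a timedelta to a number of seconds is just an assumed stand-in for whatever TOML representation we'd actually pick.

```python
from dataclasses import asdict, dataclass
from datetime import timedelta


@dataclass
class Config:
    poll_interval: timedelta
    retry_delays: list[timedelta]


def toml_dict_factory(items):
    # Hypothetical factory: rename keys and convert timedelta values
    # that are direct field values. Seconds as a float is an assumption.
    return {
        key.replace("_", "-"): (
            value.total_seconds() if isinstance(value, timedelta) else value
        )
        for key, value in items
    }


config = Config(timedelta(seconds=30), [timedelta(seconds=1), timedelta(seconds=5)])

# Plain asdict() keeps underscores in keys and leaves timedelta values alone.
plain = asdict(config)

# The factory fixes keys and direct values, but asdict() builds the list
# itself, so the timedelta objects inside the list are never passed through it.
with_factory = asdict(config, dict_factory=toml_dict_factory)
```

The factory could of course walk into lists itself, but at that point it is doing the recursion that asdict() was supposed to do for us.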

mthuurne commented 1 year ago

Now that Binder can be constructed from instances as well, maybe a method makes more sense than a function.

mthuurne commented 12 months ago

> Now that Binder can be constructed from instances as well, maybe a method makes more sense than a function.

Although, if it is a method, calling that method when Binder was constructed from a class wouldn't work.

Maybe the original specialization syntax wasn't so bad after all: if Binder[DC] returns a specialized Binder class and Binder(data) returns an instanced Binder, then asdict()/to_dict() could be an instance method, such that type checkers know it can only be called on an instance.

You could also do things like type(data).parse_toml("other.toml") to parse a TOML file in the same format but without using the existing data as a default.

We'd have to check whether overloading a method (like parse_toml()) with a class and instance variant actually works both in Python itself and in mypy. I'm pretty confident that it can work in Python itself: even if it doesn't work directly, we could use descriptors instead. But I'm less confident about mypy.
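
For the runtime side, a descriptor-based approach could look roughly like the sketch below. Everything here is hypothetical: `class_or_instance`, `Settings` and `parse()` are illustration names rather than anything from dataclass-binder, and `parse()` takes a plain dictionary instead of a TOML file to keep the example self-contained. It says nothing about whether mypy can type this.

```python
from dataclasses import dataclass, replace


class class_or_instance:
    """Hypothetical descriptor that passes both the class and the instance (or None)."""

    def __init__(self, func):
        self._func = func

    def __get__(self, instance, owner=None):
        cls = owner if owner is not None else type(instance)

        def bound(*args, **kwargs):
            return self._func(cls, instance, *args, **kwargs)

        return bound


@dataclass
class Settings:
    name: str = "default"
    verbose: bool = False

    @class_or_instance
    def parse(cls, existing, data):
        # Called on the class: parse from scratch.
        if existing is None:
            return cls(**data)
        # Called on an instance: use its values as defaults.
        return replace(existing, **data)


from_class = Settings.parse({"verbose": True})
from_instance = Settings(name="custom").parse({"verbose": True})
```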

Perhaps having two differently named methods is better than overloading. One would be a class method that parses from scratch and the other an instance method that parses with existing data as defaults. That would fit better with the convention in Python that you can call class methods on instances as well.
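
A sketch of the two-method variant, again with made-up names (`Settings`, `parse()`, `parse_with_defaults()`) and a dictionary in place of a TOML file:

```python
from dataclasses import dataclass, replace


@dataclass
class Settings:
    name: str = "default"
    verbose: bool = False

    @classmethod
    def parse(cls, data):
        # Parse from scratch, ignoring any existing instance.
        return cls(**data)

    def parse_with_defaults(self, data):
        # Parse with this instance's values as defaults.
        return replace(self, **data)


existing = Settings(name="custom")

# The class method can be called on an instance too, with the same result
# as calling it on the class.
scratch = existing.parse({"verbose": True})
merged = existing.parse_with_defaults({"verbose": True})
```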

mthuurne commented 12 months ago

> Note that dataclasses.asdict() offers similar functionality, but it does not handle some conversions like timedelta, modules and dashes in key names. Perhaps we can use asdict() with a custom dictionary factory, but probably not, as there is no accompanying list factory.

In theory we could post-process the dictionary returned by asdict() recursively and replace any custom types by native TOML types. However, if we're going to recursively process the TOML data, does using asdict() provide any benefits over recursively generating the TOML data ourselves?

When using a custom dictionary factory, the post-processing would see nested dataclasses multiple times: once when the nested dataclass is processed itself and once for every parent level. We can't just skip recursion into nested dictionaries because, unlike dictionaries that were created from dataclasses, dictionaries created from mapping types do need recursive post-processing. Therefore, if we use asdict() at all, it would be more efficient to post-process the top-level asdict() output once.
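
The repeated visits are easy to demonstrate with a small experiment. The classes and the `count_visits()` helper below are hypothetical; the post-processor does no real conversion, it only records the keys of each dictionary it walks.

```python
from dataclasses import asdict, dataclass


@dataclass
class Inner:
    value: int


@dataclass
class Middle:
    inner: Inner


@dataclass
class Outer:
    middle: Middle


def count_visits(use_factory):
    """Return the keys of every dictionary seen by a recursive post-processor."""
    visits = []

    def post_process(obj):
        if isinstance(obj, dict):
            visits.append(tuple(obj))
            return {key: post_process(item) for key, item in obj.items()}
        return obj

    data = Outer(Middle(Inner(1)))
    if use_factory:
        # Post-process inside the factory: runs once per dataclass level.
        asdict(data, dict_factory=lambda items: post_process(dict(items)))
    else:
        # Post-process the finished top-level dictionary once.
        post_process(asdict(data))
    return visits


factory_visits = count_visits(use_factory=True)
single_pass_visits = count_visits(use_factory=False)
```

With the factory, the innermost dictionary is walked three times (once for itself and once for each of its two parents), while the single pass walks each dictionary exactly once.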