Hi @timoguin! Great catch on this! Thank you for the comprehensive issue report! Can you try this branch/PR and see if it resolves your issues? https://github.com/JakobGM/patito/pull/87
I still need to add a few tests, preferably covering both the `validation_alias` and `serialization_alias` behaviors. If you'd like to help with that, I'd really appreciate it!
I'd love to if I can find the time, but I doubt I'll be able to any time soon.
But I can confirm that this does indeed fix the issue I was having. It's much appreciated! Thanks for providing the library. It's been nice to work with. 😄
To supply some context around why I ran into this, I had a nested field that was ending up in Polars with one of the struct fields specified as a `null` type. This is fine in Polars apparently but was causing an exception when attempting to write the dataframe to a Delta Lake table. I said "no prob, I got this Patito..." and tried to use it to pass the dtypes explicitly. But the dtypes were a lie!
Problem
When using Pydantic alias generators, nested types are not serializing properly for dtypes.
My use case is that I have data coming from an API that is in camelCase. I validate against that format using the `to_camel()` alias generator. Serialized data should always be in snake case.
The bug can be reproduced with the following code:
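As a minimal sketch of the setup described, assuming plain Pydantic v2 `BaseModel` classes instead of Patito models so the example stays self-contained: the field names and the `CamelModel` base and `property_names` helper are hypothetical, and the idea that dtype inference reads column names from the model's JSON schema is an assumption. It only shows how the nested property names differ depending on whether aliases are used when the schema is generated.

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class CamelModel(BaseModel):
    # Hypothetical shared base: camelCase aliases, snake_case field names.
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)


class NestedModel(CamelModel):
    nested_field: int  # hypothetical field name


class ParentModel(CamelModel):
    parent_field: int  # hypothetical field name
    nested_model: NestedModel


def property_names(schema: dict) -> dict:
    """Collect property names for the top-level model and every nested definition."""
    names = {"ParentModel": sorted(schema["properties"])}
    for def_name, def_schema in schema.get("$defs", {}).items():
        names[def_name] = sorted(def_schema["properties"])
    return names


# Names as they appear when the schema is generated with aliases (camelCase).
by_alias = property_names(ParentModel.model_json_schema(by_alias=True))
# Names as they appear when the schema uses the Python field names (snake_case).
by_name = property_names(ParentModel.model_json_schema(by_alias=False))

print(by_alias)
print(by_name)
```

If column names for nested structs are taken from a schema generated with aliases while top-level names come from the field names, the two casings end up mixed in the resulting dtypes.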
When calling dtypes on `NestedModel`, things are serialized properly:
However, when calling dtypes on `ParentModel`, the columns for `NestedModel` are back to camelCase:
Serialization works as expected (can be initialized with camelCase):
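To illustrate that last point, here is a rough sketch using plain Pydantic v2 (not Patito), with hypothetical field names and a hypothetical `CamelModel` base class. It initializes the parent model from camelCase input and dumps it back out in snake case.

```python
from pydantic import BaseModel, ConfigDict
from pydantic.alias_generators import to_camel


class CamelModel(BaseModel):
    # Hypothetical shared base: accept camelCase aliases and snake_case names.
    model_config = ConfigDict(alias_generator=to_camel, populate_by_name=True)


class NestedModel(CamelModel):
    nested_field: int  # hypothetical field name


class ParentModel(CamelModel):
    parent_field: int  # hypothetical field name
    nested_model: NestedModel


# Input shaped like the camelCase API payload.
parent = ParentModel.model_validate(
    {"parentField": 1, "nestedModel": {"nestedField": 2}}
)

# model_dump() uses field names unless by_alias=True is passed.
dumped = parent.model_dump()
print(dumped)

# With populate_by_name=True, snake_case input validates as well.
same = ParentModel.model_validate(
    {"parent_field": 1, "nested_model": {"nested_field": 2}}
)
```

So the models themselves round-trip correctly here, nested fields included; the casing problem only shows up in the dtypes.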
Solution
I've recently updated all my dependencies and am not sure if this is a new issue or one that already existed. I have a branch where I've added the above code as an initial test and have played with the `mode="serialization"` flag for `model_dump_json()`, but so far I haven't figured out the issue.
That branch is linked below.
It's worth noting that, without `populate_by_name=True` set on the model config, camelCase fields will fail validation. I think this is a newer flag, as well as the `mode` option for model dumping.
References
`mode="serialization"`: https://github.com/timoguin/patito/blob/fix/dtype-casing-bug-when-using-aliases/src/patito/_pydantic/schema.py#L30