JakobGM / patito

A data modelling layer built on top of polars and pydantic
MIT License
252 stars, 23 forks

Pydantic V2 Support #33

Closed: thomasaarholt closed this 5 months ago

thomasaarholt commented 9 months ago

Hello!

First off, apologies that this has taken so long. Put briefly, Jakob wrote patito, but has since moved on from our company. We have quite a bit of data science code that relies on it, so I am motivated to help maintain the project. Unfortunately, @JakobGM and I both have a lot on our plates, and patito has so far fallen lower on our priority list.

That's hopefully changing now. @JakobGM still has too much to do, but I've got some time.

This PR is still a draft, but I decided it was high time to publish our changes, compare them with the absolutely awesome work @brendancooley has been doing, and get a new version out.

As of publishing this draft, core pydantic 2 support is implemented for polars but not for duckdb. If you depend on patito's duckdb functionality, I encourage you to speak up, since I'm considering removing support for it if it turns out to be hard to support. So far we've only disabled those tests locally.

@brendancooley, I'm particularly eyeing up your DataFrameValidationError. That was one of the pain points we had when upgrading this.

Tests aren't fully passing yet, nor is all typing complete.

brendancooley commented 9 months ago

Happy to help however I can!

brendancooley commented 9 months ago

Some progress on nullable columns, typing, and numeric bounds checking in #32, thanks to the helpful feedback from @ion-elgreco.
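For readers less familiar with what nullable-column and numeric-bounds checks involve, here is a minimal, library-free sketch. It assumes a frame is represented as a plain dict mapping column names to lists of values. `ColumnSpec`, `FrameValidationError` and `validate_columns` are hypothetical names made up for illustration; they are not patito's API, and this is not how patito or #32 implement the checks. The sketch collects every failing column into one aggregate error, in the spirit of the DataFrameValidationError mentioned earlier in the thread.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical per-column constraints, loosely mirroring pydantic's
    Optional[...] (nullable) and Field(ge=..., le=...) (numeric bounds)."""
    nullable: bool = False
    ge: Optional[float] = None  # inclusive lower bound, if any
    le: Optional[float] = None  # inclusive upper bound, if any


class FrameValidationError(ValueError):
    """Hypothetical aggregate error: holds every failure found in the frame
    instead of stopping at the first bad value."""

    def __init__(self, errors: List[str]) -> None:
        self.errors = errors
        super().__init__(f"{len(errors)} validation error(s):\n" + "\n".join(errors))


def validate_columns(columns: Dict[str, List[Any]], specs: Dict[str, ColumnSpec]) -> None:
    """Check nullability and bounds column by column; raise once with all errors."""
    errors: List[str] = []
    for name, spec in specs.items():
        if name not in columns:
            errors.append(f"{name}: missing column")
            continue
        values = columns[name]
        # Nullability: count None values and reject them if the column is not nullable.
        nulls = sum(v is None for v in values)
        if nulls and not spec.nullable:
            errors.append(f"{name}: {nulls} null value(s) in non-nullable column")
        # Bounds are only checked on non-null values.
        present = [v for v in values if v is not None]
        if spec.ge is not None:
            below = sum(v < spec.ge for v in present)
            if below:
                errors.append(f"{name}: {below} value(s) below {spec.ge}")
        if spec.le is not None:
            above = sum(v > spec.le for v in present)
            if above:
                errors.append(f"{name}: {above} value(s) above {spec.le}")
    if errors:
        raise FrameValidationError(errors)


# Usage: a non-nullable id column and an optional discount bounded to [0, 1].
specs = {
    "product_id": ColumnSpec(nullable=False, ge=0),
    "discount": ColumnSpec(nullable=True, ge=0, le=1),
}
validate_columns({"product_id": [1, 2], "discount": [None, 0.5]}, specs)  # passes

try:
    validate_columns({"product_id": [1, None], "discount": [1.5, 0.2]}, specs)
except FrameValidationError as exc:
    print(exc.errors)  # one entry for product_id, one for discount
```

Collecting all failures before raising, rather than stopping at the first bad value, is what makes a single aggregate error useful when validating a whole table at once.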

ion-elgreco commented 9 months ago

Does it make sense to port things from this PR to @brendancooley's PR, since it's more complete?

thomasaarholt commented 9 months ago

Yep! I actually started doing this yesterday! It wasn't fast going, though. I started by going through the tests and making sure they were mostly equivalent. I'll push the branch and make a PR tomorrow.