JakobGM / patito

A data modelling layer built on top of polars and pydantic
MIT License
272 stars 23 forks

Modify metaclass to allow DataFrame[Foo] type propagation #99

Open thomasaarholt opened 1 month ago

thomasaarholt commented 1 month ago

The main goal of this PR is to let pyright infer the model type for both of the following dataframes:

import patito as pt
import polars as pl

class Foo(pt.Model):
    name: str

df1 = Foo.DataFrame({"name": ["Bean"]})
df2 = pt.DataFrame({"name": ["Bean"]}).set_model(Foo)

Before

Pyright reports the same for both df1 and df2: Type of "df" is "DataFrame[Unknown]"

After

Pyright reports the same for both df1 and df2: Type of "df" is "DataFrame[Foo]"
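For readers unfamiliar with how a type checker can carry the model through, here is a minimal, self-contained sketch of the general idea. It is not patito's implementation: the `Model` and `DataFrame` classes below are hypothetical stand-ins, and `Model.frame` is an assumed classmethod used in place of the metaclass-based `Foo.DataFrame` that this PR modifies.

```python
from __future__ import annotations

from typing import Any, Dict, Generic, List, Optional, Type, TypeVar

ModelT = TypeVar("ModelT", bound="Model")
OtherT = TypeVar("OtherT", bound="Model")


class Model:
    """Hypothetical stand-in for pt.Model (no pydantic involved)."""

    @classmethod
    def frame(cls: Type[ModelT], data: Dict[str, List[Any]]) -> DataFrame[ModelT]:
        # Assumed helper: plays the role of Foo.DataFrame(...) in this sketch.
        return DataFrame(data).set_model(cls)


class DataFrame(Generic[ModelT]):
    """Hypothetical stand-in for pt.DataFrame, generic over its model."""

    def __init__(self, data: Dict[str, List[Any]]) -> None:
        self.data = data
        self.model: Optional[Type[ModelT]] = None

    def set_model(self, model: Type[OtherT]) -> DataFrame[OtherT]:
        # The return annotation is what lets the checker infer DataFrame[Foo]
        # from set_model(Foo); no validation happens here.
        new: DataFrame[OtherT] = DataFrame(self.data)
        new.model = model
        return new


class Foo(Model):
    name: str


def names(df: DataFrame[Foo]) -> List[Any]:
    # A function can now require a frame that is tagged with Foo.
    return df.data["name"]


df1 = Foo.frame({"name": ["Bean"]})
df2 = DataFrame({"name": ["Bean"]}).set_model(Foo)
```

With these annotations a checker such as pyright should treat both `df1` and `df2` as `DataFrame[Foo]`, so either can be passed to `names`. Note that the type parameter only records which model was attached, not that the data has been validated against it.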

Discussion

This allows us to robustly pass around dataframes that have the patito Model embedded in the type annotation. I am still a bit unsure how I feel about a df being typed as DataFrame[Foo] after .set_model() but before .validate(). @dsgibbons and @JakobGM, any thoughts on this?

dsgibbons commented 4 weeks ago

If Foo.DataFrame and set_model are supposed to be analogous to creating a Pydantic model, shouldn't we run validate before returning the DataFrame? This would make patito behave more like Pydantic and, as a bonus, we would then know that the resulting table is actually of type Foo. I guess the main issue is that validate may be quite expensive, but wouldn't everyone using this library want to run validate anyway?
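To make the trade-off concrete, here is a rough sketch of what "validate on construction, with an opt-out for the expensive case" could look like. Everything in it is hypothetical: `validate`, `construct`, and the `check` flag are illustrative names and not part of patito, and plain dicts of lists stand in for real dataframes.

```python
from typing import Any, Dict, List, get_type_hints


class Model:
    """Hypothetical stand-in for pt.Model; fields are plain class annotations."""


class Foo(Model):
    name: str


def validate(model: type, data: Dict[str, List[Any]]) -> None:
    """Raise ValueError unless columns and value types match the annotations."""
    hints = get_type_hints(model)
    if set(data) != set(hints):
        raise ValueError(f"expected columns {sorted(hints)}, got {sorted(data)}")
    for column, expected in hints.items():
        for value in data[column]:
            if not isinstance(value, expected):
                raise ValueError(
                    f"column {column!r}: {value!r} is not {expected.__name__}"
                )


def construct(
    model: type, data: Dict[str, List[Any]], *, check: bool = True
) -> Dict[str, List[Any]]:
    """Return data tagged for `model`, validating first unless check=False."""
    if check:
        # The potentially expensive step discussed above.
        validate(model, data)
    return data
```

Under this assumed design, `construct(Foo, {"name": ["Bean"]})` succeeds and `construct(Foo, {"name": [1]})` raises `ValueError`, while `check=False` skips the cost and gives up the guarantee that the result really conforms to `Foo`.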