Open thomasaarholt opened 1 month ago
If `Foo.DataFrame` and `set_model` are supposed to be analogous to creating a Pydantic model, shouldn't we be running `validate` before returning the `DataFrame`? This would make patito function more like Pydantic, and, as a bonus, we then know that the resulting table is actually of type `Foo`. I guess the main issue is that `validate` may be quite expensive, but wouldn't everyone using this library want to run `validate` anyway?
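To make the trade-off concrete, here is a minimal, stdlib-only sketch of "validate when the model is attached". It is not patito's implementation: the toy `Model` and `DataFrame` classes, the list-of-dicts storage, and the `validate` keyword argument are all assumptions made for illustration.

```python
from typing import Any, Optional, Type, get_type_hints


class Model:
    """Toy stand-in for a schema class (hypothetical, not patito's Model)."""

    @classmethod
    def validate(cls, rows: list[dict[str, Any]]) -> None:
        # Check every row against the annotations declared on the subclass.
        hints = get_type_hints(cls)
        for index, row in enumerate(rows):
            for name, expected in hints.items():
                if name not in row:
                    raise ValueError(f"row {index}: missing column {name!r}")
                if not isinstance(row[name], expected):
                    raise ValueError(
                        f"row {index}: column {name!r} is not {expected.__name__}"
                    )


class DataFrame:
    """Toy frame that stores rows as a list of dicts (assumption)."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = rows
        self.model: Optional[Type[Model]] = None

    def set_model(self, model: Type[Model], validate: bool = True) -> "DataFrame":
        # Validate before attaching the model, so a frame that carries a
        # model is known to conform to it. The hypothetical `validate`
        # flag lets callers opt out when the check is too expensive.
        if validate:
            model.validate(self.rows)
        self.model = model
        return self


class Foo(Model):
    a: int
    b: str


good = DataFrame([{"a": 1, "b": "x"}]).set_model(Foo)
unchecked = DataFrame([{"a": "oops", "b": "x"}]).set_model(Foo, validate=False)
```

With a default of `validate=True`, the cost is paid once at the point where the model is attached, and callers who know their data is clean can skip it explicitly.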
The main goal of this was to enable the following pyright type annotations:
Before
`df1` and `df2` report: `# Type of "df" is "DataFrame[Unknown]"`
After
`df1` and `df2` report: `# Type of "df" is "DataFrame[Foo]"`
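For readers unfamiliar with how a type checker can arrive at `DataFrame[Foo]`, the general pattern can be sketched with plain `typing` generics. This is an illustrative sketch only, not the code from this change: the toy classes and the `Model.frame` classmethod (standing in for `Foo.DataFrame`) are hypothetical names.

```python
from typing import Any, Generic, Optional, Type, TypeVar

ModelT = TypeVar("ModelT", bound="Model")
OtherT = TypeVar("OtherT", bound="Model")


class DataFrame(Generic[ModelT]):
    """Toy generic frame; the type parameter records the attached model."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.rows = rows
        self.model: Optional[type] = None

    def set_model(self, model: Type[OtherT]) -> "DataFrame[OtherT]":
        # The return annotation ties the frame's type parameter to the
        # class passed in, so the checker can carry it forward.
        typed: "DataFrame[OtherT]" = DataFrame(self.rows)
        typed.model = model
        return typed


class Model:
    @classmethod
    def frame(cls: Type[ModelT], rows: list[dict[str, Any]]) -> DataFrame[ModelT]:
        # Hypothetical stand-in for `Foo.DataFrame(...)`: annotating `cls`
        # lets the checker bind ModelT to the concrete subclass.
        df: DataFrame[ModelT] = DataFrame(rows)
        df.model = cls
        return df


class Foo(Model):
    a: int


df1 = Foo.frame([{"a": 1}])                  # expected to be inferred as DataFrame[Foo]
df2 = DataFrame([{"a": 1}]).set_model(Foo)   # expected to be inferred as DataFrame[Foo]
```

Note that nothing here checks the data at runtime: the annotation says `DataFrame[Foo]` purely because of how the frame was constructed.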
Discussion
This allows us to robustly pass around dataframes that have the patito Model embedded in the type annotation. I am still a bit unsure how I feel about a `df` being a `DataFrame[Foo]` before `.validate()` but after `.set_model()`. @dsgibbons and @JakobGM, any thoughts on this?