Open thomasaarholt opened 4 months ago
Certainly agree that this interface could use a refactor. A few concerns about replicating pydantic args onto ColumnInfo
(or another patito-side interface):
max_items
) are scheduled for deprecation. How do we intend to handle these? Ge(0)
-> pl.col("foo") >= 0
). How should patito handle strict
?Overall, being able to take an existing pydantic model and quickly it into a patito model while retaining object-level validation from pydantic is a very nice feature. But not having our own Field
definition makes us reactive to changes on the pydantic side.
Maybe we should start by defining more concretely what constitutes a patito Field
(i.e. which elements are required to perform tabular validation and schema specification), and then we can work on the conversion/serialization of a pydantic Field
to a patito Field
.
pydantic's FieldInfo
, for reference: https://github.com/pydantic/pydantic/blob/8aeac1a4c61b084ebecf61b38bb8d3e80884dc33/pydantic/fields.py#L89
My one year old is napping and giving me a chance to catch up on all of this great work and thinking! :)
@brendancooley (and others), I want to refactor the Field function in order to:
gt
,dtype
,constraints
etc rather than use args/kwargsThe suggestion
I'm considering encoding all the relevant
Field
parameters insideColumnInfo
, and ensuring thatColumnInfo
can serialize (usingfield_serializer
) and deserialize (through validators) all parameters.This means that we can type-safely pass all these arguments to pydantic's
Field
usingpydantic.fields.Field(json_schema_extra=column_info.model_dump())
.Then, at validation time, we would reconstruct the
ColumnInfo
object for each column usingColumnInfo.model_validate(some_patito_model.model_fields["some_field"])
, and be able to relatively easily use these objects for validation.However
The only things that I'm a bit unsure about:
ColumnInfo
for fields likegt
, which already exist in pydantic'sField
? Or do as we currently do and pop them off? I'm leaning towards keeping them within theColumnInfo
just to have all the logic in one place.ColumnInfo
like I suggest in the previous point, are there cases where we shouldn't do that?Let me know if this seems unclear! Writing this while alternating entertaining a 2.5 year old and a 3 month old 😅