JakobGM / patito

A data modelling layer built on top of polars and pydantic
MIT License

bug: pt.Field applying validation checks also on None values #11

Closed: ion-elgreco closed this 6 months ago

ion-elgreco commented 1 year ago

edit: I see this is inherited from pydantic, so the root issue is probably there.

I have a df with the following field:

import patito as pt

class Table(pt.Model):
    score: float | None = pt.Field(ge=0, le=1)  # nullable, but bounded to [0, 1] when set

The score can be None, but if it's a float it should be between 0 and 1. However, patito applies these validation checks to all rows, completely ignoring the fact that I also allow the value to be None. This results in the following error:

ValidationError: 1 validation errors for Table
score
  13183196 rows with out of bound values. (type=value_error.rowvalue)

ion-elgreco commented 1 year ago

Solved it by using a polars expression as the constraint instead:

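import patito as pt
import polars as pl
# Apply the 0..1 bound only to non-null values; null rows always pass.
constrained_field = pt.Field(
    constraints=pl.when(pt.field.is_not_null()).then((pt.field >= 0) & (pt.field <= 1)).otherwise(True)
)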
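
For anyone reading along, the row-level rule that the expression above encodes can be modelled in plain Python. This is only a sketch of the intended semantics, not how patito or polars evaluate it internally; the helper names `null_aware_in_range` and `count_out_of_bounds` and the sample `scores` list are made up for illustration.

```python
from typing import Iterable, Optional


def null_aware_in_range(
    value: Optional[float], lower: float = 0.0, upper: float = 1.0
) -> bool:
    """Return True when value is None or lies within [lower, upper]."""
    if value is None:
        # Mirrors `.otherwise(True)`: missing values are always accepted.
        return True
    # Mirrors `(pt.field >= 0) & (pt.field <= 1)` for non-null values.
    return lower <= value <= upper


def count_out_of_bounds(
    values: Iterable[Optional[float]], lower: float = 0.0, upper: float = 1.0
) -> int:
    """Count the values that violate the null-aware bound check."""
    return sum(not null_aware_in_range(v, lower, upper) for v in values)


# Hypothetical sample column: two None values, two genuine violations.
scores = [0.2, None, 1.0, 1.5, None, -0.1]
print(count_out_of_bounds(scores))  # 2
```

Only 1.5 and -0.1 are counted here. The None entries pass, which is the behaviour I expected from `float | None` combined with `ge`/`le`.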