JakobGM / patito

A data modelling layer built on top of polars and pydantic
MIT License
252 stars, 23 forks

Bug: Multiple constraints are incorrectly evaluated with OR, not AND #45

Closed: dsgibbons closed this issue 4 months ago

dsgibbons commented 5 months ago

When reading the pt.Field documentation for constraints, I assumed that multiple constraints would be combined with AND. However, the actual behavior seems to be OR: validation passes as long as any one of the constraints holds. I think AND is more intuitive, but if OR is intended, the documentation should say so explicitly.

Here is an example:

import patito as pt
import polars as pl

class Line(pt.Model):
    """A point 'x' between 0 and 1, with some 'width' centred on it.

    The span [x - width/2, x + width/2] should not extend beyond the [0, 1] interval.
    """
    x: float = pt.Field(ge=0, le=1)
    width: float = pt.Field(
        constraints=[
            (pt.col("x") - 0.5 * pt.col("width")) >= 0,
            (pt.col("x") + 0.5 * pt.col("width")) <= 1,
        ]
    )

Line.validate(pl.DataFrame({"x": [0.5], "width": [1.0]}))  # passes as expected, since 0.5 - 0.5 >= 0 and 0.5 + 0.5 <= 1
Line.validate(pl.DataFrame({"x": [0.4], "width": [1.0]}))  # passes, but should fail: 0.4 - 0.5 < 0 violates the first constraint
Line.validate(pl.DataFrame({"x": [0.5], "width": [1.1]}))  # fails as expected, since both constraints are violated: 0.5 - 0.55 < 0 and 0.5 + 0.55 > 1
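
To make the difference concrete, here is a minimal plain-Python sketch of the two readings applied to the same three rows. It does not use patito or polars and says nothing about how patito evaluates constraints internally; the helper names (`left_edge_ok`, `right_edge_ok`, `passes_with_or`, `passes_with_and`) are made up for illustration only.

```python
# Plain-Python illustration only: no patito or polars involved.
# The two predicates mirror the two constraints on 'width' in the Line model.

def left_edge_ok(x: float, width: float) -> bool:
    """First constraint: the left edge must not fall below 0."""
    return x - 0.5 * width >= 0


def right_edge_ok(x: float, width: float) -> bool:
    """Second constraint: the right edge must not exceed 1."""
    return x + 0.5 * width <= 1


CONSTRAINTS = [left_edge_ok, right_edge_ok]


def passes_with_or(x: float, width: float) -> bool:
    """OR reading: a row passes if at least one constraint holds."""
    return any(check(x, width) for check in CONSTRAINTS)


def passes_with_and(x: float, width: float) -> bool:
    """AND reading: a row passes only if every constraint holds."""
    return all(check(x, width) for check in CONSTRAINTS)


# The same three rows as in the validate calls.
rows = [(0.5, 1.0), (0.4, 1.0), (0.5, 1.1)]
for x, width in rows:
    print(f"x={x}, width={width}: "
          f"OR={passes_with_or(x, width)}, AND={passes_with_and(x, width)}")
```

Under the OR reading the three rows come out as pass, pass, fail, which matches what the validate calls show. Under the AND reading they would be pass, fail, fail, which is what I expected.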

thomasaarholt commented 5 months ago

Thanks, this is a bug!