JakobGM / patito

A data modelling layer built on top of polars and pydantic
MIT License

__annotations__ not inherited when inheriting a model #23

Open ion-elgreco opened 11 months ago

ion-elgreco commented 11 months ago

A model that inherits from another model does not inherit its __annotations__. This raises a KeyError when you validate a list column:

Reproducible example:

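import patito as pt

class Test(pt.Model):
    col: list[str]

class InhTest(Test):
    pass

# The parent has the annotation, the subclass does not.
print(Test.__annotations__)
print(InhTest.__annotations__)

df = InhTest.examples({
    "col": [["Hello"]]
})
InhTest.validate(df)  # raises KeyError: 'col'

Output:

{'col': list[str]}
{}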
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[21], line 4
      1 df = InhTest.examples({
      2     "col":[['Hello']]
      3 })
----> 4 InhTest.validate(df)

File ~/<redacted>/.venv/lib/python3.10/site-packages/patito/pydantic.py:707, in Model.validate(cls, dataframe)
    662 @classmethod
    663 def validate(
    664     cls,
    665     dataframe: Union["pd.DataFrame", pl.DataFrame],
    666 ) -> None:
    667     """
    668     Validate the schema and content of the given dataframe.
    669 
   (...)
    705           Rows with invalid values: {'oven'}. (type=value_error.rowvalue)
    706     """
--> 707     validate(dataframe=dataframe, schema=cls)

File ~/<redacted>/.venv/lib/python3.10/site-packages/patito/validators.py:316, in validate(dataframe, schema)
    313 else:
    314     polars_dataframe = cast(pl.DataFrame, dataframe)
--> 316 errors = _find_errors(dataframe=polars_dataframe, schema=schema)
    317 if errors:
    318     raise ValidationError(errors=errors, model=schema)

File ~/<redacted>/.venv/lib/python3.10/site-packages/patito/validators.py:153, in _find_errors(dataframe, schema)
    150 if not isinstance(dtype, pl.List):
    151     continue
--> 153 annotation = schema.__annotations__[column]  # type: ignore[unreachable]
    155 # Retrieve the annotation of the list itself,
    156 # dewrapping any potential Optional[...]
    157 list_type = _dewrap_optional(annotation)

KeyError: 'col'
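
For illustration, the lookup that fails in the traceback above can be reproduced without patito. The sketch below uses plain Python classes (`Parent`, `Child`, and the helper `annotation_for` are hypothetical names, not part of patito) and assumes Python 3.10 or newer, where a subclass that declares no fields of its own has an empty annotations dict. `typing.get_type_hints` merges annotations along the MRO, so it is one possible way to look up an inherited field. This is not necessarily how patito resolved the issue.

```python
from typing import get_type_hints


class Parent:
    col: list[str]


class Child(Parent):
    pass


# Annotations declared in Child's own class body: there are none.
own = vars(Child).get("__annotations__", {})

# get_type_hints walks the MRO and merges annotations from base classes,
# so the inherited field is included.
merged = get_type_hints(Child)


def annotation_for(cls: type, column: str):
    """Hypothetical helper: look up a field's annotation, including inherited fields."""
    return get_type_hints(cls)[column]


print(own)                           # {}
print(merged)                        # {'col': list[str]}
print(annotation_for(Child, "col"))  # list[str]
```

Indexing the subclass's own annotations with "col" would raise the same KeyError as in the traceback, while the merged lookup succeeds.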

nameloCmaS commented 3 months ago

I have just tried this on Patito 0.6.1 and it works fine:

>>> import patito as pt
>>> class Test(pt.Model):
...     col: list[str]
... 
>>> class InhTest(Test):
...     pass
... 
>>> df = InhTest.examples({"col": [["Hello"]]})
>>> df
shape: (1, 1)
┌───────────┐
│ col       │
│ ---       │
│ list[str] │
╞═══════════╡
│ ["Hello"] │
└───────────┘
>>> print(Test.__annotations__)
{'col': list[str]}
>>> print(InhTest.__annotations__)
{'col': list[str]}
>>> InhTest.validate(df)
>>> df2 = InhTest.examples({"col": [["Hello", "Bye", 1]]})
>>> df2
shape: (1, 1)
┌────────────────────────┐
│ col                    │
│ ---                    │
│ list[str]              │
╞════════════════════════╡
│ ["Hello", "Bye", null] │
└────────────────────────┘
>>> InhTest.validate(df2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sam/Documents/GitHub/filebasedbim/venv/lib/python3.12/site-packages/patito/pydantic.py", line 498, in validate
    validate(dataframe=dataframe, columns=columns, schema=cls, **kwargs)
  File "/Users/sam/Documents/GitHub/filebasedbim/venv/lib/python3.12/site-packages/patito/validators.py", line 342, in validate
    raise DataFrameValidationError(errors=errors, model=schema)
patito.exceptions.DataFrameValidationError: 1 validation error for InhTest
col
  1 missing value in lists (type=value_error.missingvalues)
>>> Test.validate(df2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sam/Documents/GitHub/filebasedbim/venv/lib/python3.12/site-packages/patito/pydantic.py", line 498, in validate
    validate(dataframe=dataframe, columns=columns, schema=cls, **kwargs)
  File "/Users/sam/Documents/GitHub/filebasedbim/venv/lib/python3.12/site-packages/patito/validators.py", line 342, in validate
    raise DataFrameValidationError(errors=errors, model=schema)
patito.exceptions.DataFrameValidationError: 1 validation error for Test
col
  1 missing value in lists (type=value_error.missingvalues)

I assume this can be closed?