JakobGM / patito

A data modelling layer built on top of polars and pydantic
MIT License

pt.Field does not respect alias= argument #75

Open adamsardar opened 1 month ago

adamsardar commented 1 month ago

Perhaps I misunderstand how Field(alias='name') is supposed to work in patito, but I was surprised by these errors:

>>> from typing import Literal
>>> 
>>> import patito as pt
>>> import polars as pl
>>> 
>>> class Product(pt.Model):
...     product_id: int = pt.Field(unique=True, alias='prod')
...     name: str
...     temperature_zone: Literal["dry", "cold", "frozen"]
...     demand_percentage: float
...
>>> valid_product_df = pl.DataFrame(
...     {
...         "product_id": [64, 11],
...         "name": ["Pizza", "Cereal"],
...         "temperature_zone": ["frozen", "dry"],
...         "demand_percentage": [0.07, 0.16],
...     }
... )
>>>
>>> Product.validate(valid_product_df) # No errors
>>>
>>> also_valid_product_df = pl.DataFrame(
...     {
...         "prod": [64, 11],
...         "name": ["Pizza", "Cereal"],
...         "temperature_zone": ["frozen", "dry"],
...         "demand_percentage": [0.07, 0.16],
...     }
... )
>>>
>>> Product.validate(also_valid_product_df) #Surprise!
Traceback (most recent call last):
[...]
patito.exceptions.DataFrameValidationError: 2 validation errors for Product
product_id
  Missing column (type=type_error.missingcolumns)
prod
  Superfluous column (type=type_error.superfluouscolumns)

Given how field aliases work in pydantic, I expected the 'prod' column to be interpreted as 'product_id'. Is this the expected behaviour? If so, perhaps a line in the docs clarifying that aliased columns must be renamed before validation would help.
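In the meantime, a workaround that seems to fit the current behaviour is to rename aliased columns back to their field names before calling validate. The sketch below uses a plain dict of columns, shaped like the one passed to pl.DataFrame above, so it runs without polars. The helper rename_aliased_columns and its field-to-alias mapping are hypothetical illustrations, not part of patito's API.

```python
from typing import Dict, List, Mapping


# Hypothetical helper (not part of patito): maps each aliased column name
# (e.g. "prod") back to the model field name (e.g. "product_id").
def rename_aliased_columns(
    columns: Mapping[str, List[object]],
    field_to_alias: Mapping[str, str],
) -> Dict[str, List[object]]:
    # Invert {field_name: alias} into {alias: field_name} for lookups.
    alias_to_field = {alias: field for field, alias in field_to_alias.items()}
    # Columns without an alias keep their original name; order is preserved.
    return {alias_to_field.get(name, name): values for name, values in columns.items()}


also_valid_columns = {
    "prod": [64, 11],
    "name": ["Pizza", "Cereal"],
    "temperature_zone": ["frozen", "dry"],
    "demand_percentage": [0.07, 0.16],
}

renamed = rename_aliased_columns(also_valid_columns, {"product_id": "prod"})
print(list(renamed))
# ['product_id', 'name', 'temperature_zone', 'demand_percentage']
```

With polars itself, the equivalent step would be also_valid_product_df.rename({"prod": "product_id"}) before Product.validate; whether patito should apply the alias automatically is the open question here.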

Thanks for your work on patito - it's shaping up great! I really like it, and it's massively helping my projects!

Environment details

Python 3.9.19, patito 0.6.1, polars 0.20.31, pydantic 2.7.4