This topic probably belongs in a discussion forum but I couldn't find one for patito. Please let me know if there is a better place to ask this.
I would like to use patito to validate a dataframe with a categorical column with known categories where the order of the categories is important. What I have done so far is as follows:
from typing import Literal, get_args

import patito as pt
import polars as pl


class MyModel(pt.Model):
    my_col: Literal["a", "b"]


# Build an Enum dtype whose category order matches the Literal annotation.
my_dtype = pl.Enum([*get_args(MyModel.model_fields["my_col"].annotation)])

good_df = pl.DataFrame({"my_col": pl.Series(["b", "a"], dtype=my_dtype)})
# Same categories, but declared in the opposite order.
bad_df = pl.DataFrame(
    {"my_col": pl.Series(["b", "a"], dtype=pl.Enum(["b", "a"]))}
)

MyModel.validate(good_df)  # passes
MyModel.validate(bad_df)  # fails
This passes for good_df and fails for bad_df, as expected. However, I'm not 100% sure that this is the intended use of Literal in a patito model, and it was a little awkward to get the correctly ordered categories to put in my custom dtype. So I thought I'd ask whether there's a better (or just different) way to do this.
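To make the awkward part concrete, the category extraction can be isolated in a small helper. This is only a sketch using the standard library: literal_categories and MyColType are hypothetical names I made up for illustration, not part of patito or polars, and the alias stands in for the annotation of my_col.

```python
from typing import Literal, get_args


def literal_categories(annotation) -> list[str]:
    """Return the values of a Literal annotation in declaration order."""
    # get_args preserves the order in which the Literal values were written.
    return list(get_args(annotation))


# Hypothetical alias standing in for the annotation of MyModel.my_col.
MyColType = Literal["a", "b"]

categories = literal_categories(MyColType)
```

The resulting list could then be passed to pl.Enum(...) in the same way as in my snippet above, but I don't know whether patito already offers something equivalent.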