bentoml / BentoML

The easiest way to serve AI apps and models - Build reliable Inference APIs, LLM apps, Multi-model chains, RAG service, and much more!
https://bentoml.com
Apache License 2.0

feature: Support pandera.SchemaModel in bentoml.io.PandasDataFrame #3230

Open cosmicBboy opened 1 year ago

cosmicBboy commented 1 year ago

Feature request

Just as a pydantic_model can be specified in the bentoml.io.JSON descriptor, an option to specify a pandera.SchemaModel in the PandasDataFrame descriptor would let users validate incoming and outgoing dataframes in their service APIs with more complex, potentially custom, validation checks defined in pandera SchemaModels.

Something like:

import bentoml
import pandas as pd
import pandera as pa
from bentoml.io import PandasDataFrame
from pandera.typing import Series

svc = bentoml.Service("features_service")  # hypothetical service name

class Features(pa.SchemaModel):
    feature1: Series[int] = pa.Field(gt=0)
    feature2: Series[str] = pa.Field(isin=["A", "B", "C"])
    feature3: Series[float] = pa.Field(in_range={"min_value": -1000, "max_value": 1000})

    @pa.check("feature3")
    def custom_check(cls, series):
        # dataframe-level statistic: the column mean must stay near zero
        return -10 <= series.mean() <= 10

    class Config:
        coerce = True  # coerce dtypes automatically

@svc.api(
    input=PandasDataFrame(
        pandera_model=Features,  # proposed argument, not yet part of PandasDataFrame
        enforce_dtype=True),
    output=PandasDataFrame()
)
def predict(input_df: pd.DataFrame) -> pd.DataFrame:
    ...

Motivation

The current PandasDataFrame IO descriptor supports dtype enforcement, but it's up to the user to implement other statistical validation checks, such as value ranges, allowed values, and potentially more complex checks that can be expressed with pandera (see here and here).
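To make the point concrete, here is roughly what hand-written equivalents of the `Features` checks look like in plain pandas today. `check_features` is a hypothetical helper written for illustration; note that it does not coerce dtypes and raises `KeyError` if a column is missing, both of which a pandera schema would handle.

```python
import pandas as pd


def check_features(df: pd.DataFrame) -> list[str]:
    """Hypothetical hand-written version of the Features schema checks.

    Returns human-readable failure messages; an empty list means the frame passed.
    """
    errors: list[str] = []
    if (df["feature1"] <= 0).any():
        errors.append("feature1: all values must be > 0")
    if not df["feature2"].isin(["A", "B", "C"]).all():
        errors.append("feature2: values must be one of A, B, C")
    # Series.between is inclusive on both ends by default, matching in_range.
    if not df["feature3"].between(-1000, 1000).all():
        errors.append("feature3: values must lie in [-1000, 1000]")
    # The custom dataframe-level statistic from the schema.
    if not -10 <= df["feature3"].mean() <= 10:
        errors.append("feature3: column mean must lie in [-10, 10]")
    return errors
```

Every new constraint means another branch like these, repeated in each service, which is the duplication a declarative schema on the descriptor would remove.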

With this feature, BentoML API services could validate these statistical properties automatically.

Other

I'm the author of pandera 👋 and love the bentoml project!

tkaraouzene commented 1 year ago

Hi! Any update on this issue? I'm using pandera to validate dataframes and would like to deploy services with BentoML. It would be wonderful to be able to validate dataframe schemas with pandera. This is somewhat related to an issue I opened, https://github.com/bentoml/BentoML/issues/3652, since it would at least allow fine-grained validation and exceptions for inputs.