Validate data via FlexMeasures

nhoening commented 3 years ago

We cannot validate DataFrame data with the Marshmallow code we wrote for the API parts of FlexMeasures.

I believe we should validate this data separately, since the CLI functions will often handle such data. That said, the architecture discussion is still ongoing, see here.

Ahmad-Wahid commented 1 year ago

I think we can use pydantic and pandera to validate dataframes.

Ahmad-Wahid commented 1 year ago

Check this simple example:

import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series

class OutputSchema(pa.SchemaModel):
    """Schema for testing dataframe."""
    column1: Series[int] = pa.Field(nullable=False)
    column2: Series[str] = pa.Field(nullable=False)

    class Config:
        """Consider columns that are given in schema."""
        strict = True

@pa.check_types(lazy=True)
def validate_dataframe(data: DataFrame) -> DataFrame[OutputSchema]:
    return data

# data example (the last value in each column has the wrong type on purpose)
original_data = {
    "column1": [1, 2, 3, 4, "5"],
    "column2": ["1", "2", "4", "5", 1],
}

# create dataframe
df = pd.DataFrame(original_data)

try:
    dataframe = validate_dataframe(df)
    print(dataframe)
# with lazy=True, pandera collects all failures and raises SchemaErrors (plural)
except pa.errors.SchemaErrors as error:
    print(error)

@nhoening @Flix6x

SeitaBV / flexmeasures-entsoe

Validate data via FlexMeasures #3