SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

Add support for nan values in PandasCodec #1747

Open Pappol opened 3 months ago

Pappol commented 3 months ago

PandasCodec does not handle null values well at all; the can_encode method is misleading because it only checks whether the input is a DataFrame.

sp1thas commented 2 months ago

Thanks for raising this @Pappol

JSON serialization is wrong as well:

import pandas as pd
from mlserver.codecs.pandas import PandasCodec

df = pd.DataFrame({'foo': [None, 1.0]})
PandasCodec.encode_request(df).json()

serialized request:

{
    "parameters": {
        "content_type": "pd"
    },
    "inputs": [
        {
            "name": "foo",
            "shape": [
                2,
                1
            ],
            "datatype": "FP64",
            "data": [
                NaN,
                1.0
            ]
        }
    ]
}
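The payload above is not valid JSON: a bare NaN token is accepted by Python's json module by default, but it is rejected by the JSON spec and by strict parsers. A quick stdlib check:

```python
import json

# Python's json module happily emits a bare NaN by default...
print(json.dumps(float("nan")))  # NaN  (not valid per RFC 8259)

# ...but refuses when asked to be spec-compliant:
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError as exc:
    print(exc)  # Out of range float values are not JSON compliant
```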

btw, in case anyone else is facing the same issue, this is my quick-n-dirty way to handle it:

import pandas as pd
from mlserver.types import InferenceRequest

def replace_nan_with_none(inference_request: InferenceRequest) -> InferenceRequest:
    # Walk every input tensor and swap NaN entries for None,
    # so the request serializes to valid JSON (None becomes null).
    for i, _input in enumerate(inference_request.inputs):
        for ii, v in enumerate(_input.data.__root__):
            if pd.isna(v):
                inference_request.inputs[i].data.__root__[ii] = None
    return inference_request
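For reference, the same NaN-to-None replacement can be sketched with only the standard library (the names here are illustrative, not part of MLServer's API):

```python
import json
import math

def nan_to_none(data):
    # Swap float NaN entries for None so json.dumps emits valid JSON nulls.
    return [None if isinstance(v, float) and math.isnan(v) else v for v in data]

payload = {"name": "foo", "datatype": "FP64", "data": nan_to_none([float("nan"), 1.0])}
print(json.dumps(payload, allow_nan=False))
# {"name": "foo", "datatype": "FP64", "data": [null, 1.0]}
```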

Pappol commented 2 months ago

The main issue is with date data types.
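A minimal sketch of what a reproduction might look like, assuming the date problem stems from pd.NaT in datetime columns (pd.isna detects NaT, but it is neither a float NaN nor JSON-serializable):

```python
import pandas as pd

# Datetime columns represent missing values as pd.NaT rather than float NaN,
# so a codec that only handles numeric NaN would miss them.
df = pd.DataFrame({"when": pd.to_datetime([None, "2024-01-01"])})
print(df["when"].dtype)             # datetime64[ns]
print(pd.isna(df["when"].iloc[0]))  # True: the missing entry is pd.NaT
```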

ramonpzg commented 1 month ago

Hi @sp1thas -- Thanks for bringing this up and for showing your workaround. I will assign this to myself and have a look at what exactly is causing this.

@Pappol -- Do you have a reproducible example of the behaviour you are experiencing?

sp1thas commented 5 days ago

Hey @ramonpzg, I've also taken a look in the meantime and opened #1893. Could you review it? Looking forward to your input.