SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

PandasCodec.encode_request cannot handle missing values #1804

Closed: sp1thas closed this issue 3 months ago

sp1thas commented 3 months ago

While trying to handle missing values, I noticed that the JSON serialization is not correct: a missing value is emitted as the bare token NaN, which is not valid JSON.

import pandas as pd
from mlserver.codecs.pandas import PandasCodec

df = pd.DataFrame({'foo': [None, 1.0]})
PandasCodec.encode_request(df).json()

Serialized request:

{
    "parameters": {
        "content_type": "pd"
    },
    "inputs": [
        {
            "name": "foo",
            "shape": [
                2,
                1
            ],
            "datatype": "FP64",
            "data": [
                NaN,
                1.0
            ]
        }
    ]
}
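The snippet below is a minimal, MLServer-independent sketch of where the NaN comes from and why it is a problem. pandas stores the None in a numeric column as a float NaN, and Python's standard json module writes that as the non-standard token NaN unless strict output is requested with allow_nan=False. It does not claim that MLServer calls json.dumps internally; it only shows the general behaviour.

```python
import json
import math

import pandas as pd

# pandas stores a missing value in a numeric column as float NaN, not None
df = pd.DataFrame({"foo": [None, 1.0]})
values = df["foo"].tolist()
print(df["foo"].dtype)         # float64
print(values)                  # [nan, 1.0]
print(math.isnan(values[0]))   # True

# Python's json module emits the non-standard token NaN by default...
print(json.dumps(values))      # [NaN, 1.0]

# ...and refuses to serialize it when strict JSON is requested
try:
    json.dumps(values, allow_nan=False)
except ValueError as exc:
    print("strict JSON rejected the value:", exc)
```

Strict JSON parsers (and many non-Python clients) will reject a document containing NaN, which is why the serialized request above is problematic.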

In case anyone else is facing the same issue, this is my quick-and-dirty workaround:

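import pandas as pd
from mlserver.types import InferenceRequest

def replace_nan_with_none(inference_request: InferenceRequest) -> InferenceRequest:
    # Replace NaN entries with None so they serialize as JSON null (mutates the request in place)
    for i, _input in enumerate(inference_request.inputs):
        # `data.__root__` holds the raw values (pydantic v1 custom root type); assumes flat data
        for ii, v in enumerate(_input.data.__root__):
            if pd.isna(v):
                inference_request.inputs[i].data.__root__[ii] = None
    return inference_request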
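As a variant of the workaround above, the sketch below sanitizes the plain-dict form of a payload instead of mutating the request object. It walks nested dicts and lists, so it also covers nested tensor data that a single flat loop would miss. How you obtain that dict from an InferenceRequest (for example `.dict()` on pydantic v1 models) depends on the MLServer version and is an assumption here. The helper name `nan_to_none` and the example payload are hypothetical.

```python
import json
import math
from typing import Any


def nan_to_none(obj: Any) -> Any:
    """Return a copy of obj with every float NaN replaced by None (JSON null).

    Recurses into dicts, lists and tuples; tuples come back as lists,
    which is how JSON represents them anyway.
    """
    # numpy.float64 subclasses float, so NumPy scalars are caught here too
    if isinstance(obj, float) and math.isnan(obj):
        return None
    if isinstance(obj, dict):
        return {key: nan_to_none(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_none(item) for item in obj]
    return obj


# Hypothetical payload shaped like the serialized request in this issue
payload = {
    "parameters": {"content_type": "pd"},
    "inputs": [
        {"name": "foo", "shape": [2, 1], "datatype": "FP64", "data": [float("nan"), 1.0]}
    ],
}

clean = nan_to_none(payload)
# allow_nan=False makes json.dumps fail loudly if any NaN slipped through
print(json.dumps(clean, allow_nan=False))
```

Returning a copy leaves the original payload untouched. Whether the server-side codec then decodes null back into a missing value is a separate question not covered here.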

If this is a simple fix that a newcomer like me could handle, I would be interested in working on it.

sp1thas commented 3 months ago

Duplicate of #1747