Support 'object' schema fields

autotraderuk / fastapi-mlflow

Deploy mlflow models as JSON APIs with minimal new code

Apache License 2.0

Support 'object' schema fields #11

Closed Andrew-Crosby closed 1 year ago

Andrew-Crosby commented 1 year ago

Models returning a string field (e.g. classification models) may have the following output schema in mlflow:

Tensor (dtype: object, shape: [-1])

We should map 'object' schema types to the Python 'str' type: https://github.com/autotraderuk/fastapi-mlflow/blob/98701a0f5af535ceedda78c07759f5a3b6b62b8e/fastapi_mlflow/_mlflow_types.py#L10-L23
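A minimal sketch of the kind of lookup this proposal implies, using only the standard library. The names `DTYPE_TO_PYTHON_TYPE` and `python_type_for`, and the particular dtype names listed, are illustrative assumptions and are not taken from `_mlflow_types.py`. Treating 'object' as `str` is the proposal in this issue; an object array can in principle hold other Python values.

```python
from typing import Dict

# Hypothetical lookup table from schema dtype names to Python types.
# The "object" entry is the mapping proposed in this issue.
DTYPE_TO_PYTHON_TYPE: Dict[str, type] = {
    "int64": int,
    "float64": float,
    "bool": bool,
    "object": str,
}


def python_type_for(dtype_name: str) -> type:
    """Return the Python type for a dtype name, or raise ValueError if unknown."""
    try:
        return DTYPE_TO_PYTHON_TYPE[dtype_name]
    except KeyError:
        raise ValueError(f"Unsupported dtype: {dtype_name!r}") from None
```

With a table like this, an unknown dtype fails loudly instead of silently producing an untyped field.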

bloomonkey commented 1 year ago

Thanks for reporting @Andrew-Crosby

Do you have an example of how such an mlflow schema could be created? I would have hoped that tests against this fake model would have covered string/object types, but maybe this bug only manifests when the return type is an array/tensor?

Andrew-Crosby commented 1 year ago

Scikit-learn classifiers can produce an output array of dtype 'object', for example when fitted on a pandas Series of string labels.

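This example code:

import mlflow.models
import pandas as pd
from sklearn.linear_model import LogisticRegression

X = pd.DataFrame({"feature": [1, 2]})
y = pd.Series(["yes", "no"])

# Predictions for string labels come back as an array of dtype 'object'
preds = LogisticRegression().fit(X, y).predict(X)

print(mlflow.models.infer_signature(X, preds))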

produces the following signature:

inputs: 
  ['feature': long]
outputs: 
  [Tensor('object', (-1,))]

bloomonkey commented 1 year ago

I'm having a hard time reproducing this without introducing sklearn as a dependency, i.e. the test suite still passes in full after adding the following model:

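import numpy as np
import numpy.typing as npt
import pandas as pd
from mlflow.pyfunc import PythonModel, PythonModelContext

class StrModel(PythonModel):
    def predict(
        self, context: PythonModelContext, model_input: pd.DataFrame
    ) -> npt.ArrayLike:
        return np.array(["42"] * len(model_input))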

bloomonkey commented 1 year ago

Found a way to test this: a schema can be created from a dict or from a JSON string :)

from mlflow.types import Schema
Schema.from_json('[{"type": "tensor", "tensor-spec": {"dtype": "object", "shape": [-1]}}]')
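For anyone wanting to check such a fixture without mlflow installed, here is a rough sketch that reads the same JSON layout directly with the standard library. The helper name `tensor_dtypes` is hypothetical, and it relies only on the key names visible in the JSON string above, not on any mlflow API.

```python
import json
from typing import List

# The same schema JSON as in the comment above.
SCHEMA_JSON = '[{"type": "tensor", "tensor-spec": {"dtype": "object", "shape": [-1]}}]'


def tensor_dtypes(schema_json: str) -> List[str]:
    """Return the dtype name of every tensor column in a schema JSON string."""
    columns = json.loads(schema_json)
    return [
        column["tensor-spec"]["dtype"]
        for column in columns
        if column.get("type") == "tensor"
    ]
```

A regression test could then assert that the fixture really does contain an 'object' tensor before exercising the type mapping against it.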

bloomonkey commented 1 year ago

Fixed. Will release along with a few other fixes as 0.4.1 within the next week.