Status: Closed (Andrew-Crosby closed this issue 1 year ago)
Thanks for reporting, @Andrew-Crosby.

Do you have an example of how such an mlflow schema could be created? I would have hoped that tests against the fake model would have covered string/object types, but maybe this bug only manifests when the return type is an array/tensor?
Scikit-learn classifiers produce an output array of dtype 'object'. This example code

```python
import mlflow
import pandas as pd
from sklearn.linear_model import LogisticRegression

X = pd.DataFrame({"feature": [1, 2]})
y = pd.Series(["yes", "no"])
preds = LogisticRegression().fit(X, y).predict(X)
mlflow.models.infer_signature(X, preds)
```
produces the following signature:

```
inputs:
  ['feature': long]
outputs:
  [Tensor('object', (-1,))]
```
I'm having a hard time reproducing this without introducing sklearn as a dependency, i.e. adding the following model to the test suite continues to pass all tests:

```python
import numpy as np
import numpy.typing as npt
import pandas as pd
from mlflow.pyfunc import PythonModel, PythonModelContext


class StrModel(PythonModel):
    def predict(
        self, context: PythonModelContext, model_input: pd.DataFrame
    ) -> npt.ArrayLike:
        return np.array(["42"] * len(model_input))
```
Found a way to test this: a schema can be created from a dict or from a JSON string :)

```python
Schema.from_json('[{"type": "tensor", "tensor-spec": {"dtype": "object", "shape": [-1]}}]')
```
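For reference, here is a minimal illustration of what that JSON encodes, using only the standard library and independent of mlflow's `Schema` class:

```python
import json

# The MLflow tensor schema JSON from the comment above. Parsing it with the
# standard-library json module shows the dtype the library needs to map.
schema_json = '[{"type": "tensor", "tensor-spec": {"dtype": "object", "shape": [-1]}}]'

specs = json.loads(schema_json)
for spec in specs:
    tensor_spec = spec["tensor-spec"]
    # A single output: a tensor of dtype 'object' with one variable-length dimension.
    print(spec["type"], tensor_spec["dtype"], tensor_spec["shape"])
```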
Fixed. Will release this along with a few other fixes as 0.4.1 within the next week.
Models that return a string field (e.g. classification models) may produce an output schema like the one above in mlflow. We should map 'object' schema types to the Python 'str' type: https://github.com/autotraderuk/fastapi-mlflow/blob/98701a0f5af535ceedda78c07759f5a3b6b62b8e/fastapi_mlflow/_mlflow_types.py#L10-L23
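The fix can be sketched roughly as follows. The dictionary and helper names here are illustrative, not fastapi-mlflow's actual API; the real mapping lives in `fastapi_mlflow/_mlflow_types.py` linked above.

```python
# Hypothetical sketch of the dtype-to-Python-type mapping described above.
# The names are illustrative; the actual table is in fastapi_mlflow/_mlflow_types.py.
MLFLOW_DTYPE_TO_PYTHON = {
    "long": int,
    "integer": int,
    "double": float,
    "float": float,
    "boolean": bool,
    "string": str,
    "object": str,  # the fix: treat numpy 'object' tensors as strings
}


def python_type_for(dtype: str) -> type:
    """Return the Python type for an MLflow schema dtype name."""
    return MLFLOW_DTYPE_TO_PYTHON[dtype]
```

With this entry in place, a `Tensor('object', (-1,))` output maps to `str`, so the generated response model validates classification labels instead of failing on the unknown 'object' dtype.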