autotraderuk / fastapi-mlflow

Deploy mlflow models as JSON APIs with minimal new code
Apache License 2.0
19 stars 4 forks source link

Allows string type in prediction results #6

Closed dym-ok closed 1 year ago

dym-ok commented 1 year ago

Thank you for such a great library!

I noticed that if a model returns an array of strings ("str") we'd be getting the following error.

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/fastapi_mlflow/predictors.py", line 74, in predictor
    results.fillna(np.nan)
AttributeError: 'numpy.ndarray' object has no attribute 'fillna'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
...........................
  File "/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/opt/conda/envs/mlflow-env/lib/python3.9/site-packages/fastapi_mlflow/predictors.py", line 83, in predictor
    value = row if not np.isnan(row) else None
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I suggest this change to allow returning strings from a predictor, however, there might be a more clever one :)

dym-ok commented 1 year ago

Hey @bloomonkey Sorry, my PR is quite raw since I lack some understanding of core ideas behind your code. I realise that they have some context šŸ˜„

In this part you have some logic for processing predictors output

    def predictor(request: Request) -> Response:
        results = model.predict(request_to_dataframe(request))
        try:
            response_data = (
                results.fillna(np.nan)
                .replace([np.nan], [None])
                .to_dict(orient="records")
            )
        except (AttributeError, TypeError):
            # Return type is probably a simple array-like
            # Replace NaN with None
            response_data = []
            for row in results:
                value = row if not np.isnan(row) else None
                response_data.append({"prediction": value})

        return Response(data=response_data)

This code expect NumPy data types and I'm thinking how to add support for non-NumPy data types. As an example, in our case predictor returns array of strings, but there could be other things, so instead of my things it would be nice to have a more generalised handling.

You're getting rid of NaN so that they could be serialised to JSON properly or could you explain your idea here?

Thanks!

bloomonkey commented 1 year ago

Hi @1inuxoid

No need to apologise - I appreciate your engagement, and I think you have identified a bug! šŸ›

You're getting rid of NaN so that they could be serialised to JSON properly or could you explain your idea here?

Yes, that's intention of this part of the code. The Response is a subclass of pydantic.BaseModel; the instances of this class gets serialised to JSON for the response, which fails if any of the values are pandas or numpy specific types. So this part of the code converts them to native Python types - I might be able to make this intention clearer with a small refactor...

fastapi-mlflow is intended to support the MLflow python_function flavour, the docs for which say that the return value for model.predict() could be: [numpy.ndarray | pandas.(Series | DataFrame) | List]. I think that fastapi-mlflow fails to support List - it certainly doesn't have a test for it!

I'll try to craft a test for this which will hopefully confirm that patch fixes this :)

bloomonkey commented 1 year ago

Hrm, appears that MLflow's infer_schema doesn't support List...

bloomonkey commented 1 year ago

Refactoring discussed, already on the main branch: a3ad12056278b66da0346566ab21e478c5b859ee

A failing test that reproduces the identified bug in a branch: 099e5241b52a2fd2dddb0205c19b1356fc29e1e4

bloomonkey commented 1 year ago

Also found this SO post which has an alternative suggestion :) https://stackoverflow.com/questions/18689512/efficiently-checking-if-arbitrary-object-is-nan-in-python-numpy-pandas

bloomonkey commented 1 year ago

Fixed in 52e4f6c6d6a05a9546d4d714f990058ba22d4742 Released in 0.3.2