Closed dym-ok closed 1 year ago
Hey @bloomonkey Sorry, my PR is quite raw since I lack some understanding of core ideas behind your code. I realise that they have some context š
In this part you have some logic for processing predictors output
def predictor(request: Request) -> Response:
results = model.predict(request_to_dataframe(request))
try:
response_data = (
results.fillna(np.nan)
.replace([np.nan], [None])
.to_dict(orient="records")
)
except (AttributeError, TypeError):
# Return type is probably a simple array-like
# Replace NaN with None
response_data = []
for row in results:
value = row if not np.isnan(row) else None
response_data.append({"prediction": value})
return Response(data=response_data)
This code expect NumPy data types and I'm thinking how to add support for non-NumPy data types. As an example, in our case predictor returns array of strings, but there could be other things, so instead of my things it would be nice to have a more generalised handling.
You're getting rid of NaN so that they could be serialised to JSON properly or could you explain your idea here?
Thanks!
Hi @1inuxoid
No need to apologise - I appreciate your engagement, and I think you have identified a bug! š
You're getting rid of NaN so that they could be serialised to JSON properly or could you explain your idea here?
Yes, that's intention of this part of the code. The Response
is a subclass of pydantic.BaseModel
; the instances of this class gets serialised to JSON for the response, which fails if any of the values are pandas or numpy specific types. So this part of the code converts them to native Python types - I might be able to make this intention clearer with a small refactor...
fastapi-mlflow is intended to support the MLflow python_function
flavour, the docs for which say that the return value for model.predict()
could be: [numpy.ndarray | pandas.(Series | DataFrame) | List]
. I think that fastapi-mlflow fails to support List
- it certainly doesn't have a test for it!
I'll try to craft a test for this which will hopefully confirm that patch fixes this :)
Hrm, appears that MLflow's infer_schema
doesn't support List
...
Refactoring discussed, already on the main
branch:
a3ad12056278b66da0346566ab21e478c5b859ee
A failing test that reproduces the identified bug in a branch: 099e5241b52a2fd2dddb0205c19b1356fc29e1e4
Also found this SO post which has an alternative suggestion :) https://stackoverflow.com/questions/18689512/efficiently-checking-if-arbitrary-object-is-nan-in-python-numpy-pandas
Fixed in 52e4f6c6d6a05a9546d4d714f990058ba22d4742 Released in 0.3.2
Thank you for such a great library!
I noticed that if a model returns an array of strings ("str") we'd be getting the following error.
I suggest this change to allow returning strings from a predictor, however, there might be a more clever one :)