SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

[Bug] REST request gets parsed to String if any of the elements in the NumPy array is a String #745

Closed: lennon310 closed this issue 5 years ago

lennon310 commented 5 years ago

If I send a request to /predict with the payload:

'json={"data": {"names": ["sepal_length", "sepal_width", "petal_length", "petal_width"], "ndarray": [[7.233, 4.652, 7.39, 0.324]]}}'

the parsed NumPy array is [[7.233, 4.652, 7.39, 0.324]].

However, if the payload is

'json={"data": {"names": ["sepal_length", "sepal_width", "petal_length", "petal_width"], "ndarray": [["str", 4.652, 7.39, 0.324]]}}'

the parsed NumPy array is [["str", "4.652", "7.39", "0.324"]] (note that all the values are converted to strings instead of the last three being kept as floats).

ukclivecox commented 5 years ago

I think this is probably expected: the Python wrapper passes the data to the model as a NumPy array, and NumPy gives the whole array a single dtype, which here would likely be object or string.
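A quick way to reproduce the behaviour described above, assuming the wrapper turns the "ndarray" field into an array with a plain np.array(...) call (an assumption; the wrapper's conversion code isn't shown in this thread):

```python
import numpy as np

# The two "ndarray" payloads from the issue, as decoded from JSON
numeric_rows = [[7.233, 4.652, 7.39, 0.324]]
mixed_rows = [["str", 4.652, 7.39, 0.324]]

# With only floats, NumPy infers a float64 array
numeric = np.array(numeric_rows)
print(numeric.dtype)  # float64

# With one string present, NumPy picks a single Unicode string dtype for the
# whole array and converts every float to its string form
mixed = np.array(mixed_rows)
print(mixed.dtype.kind)  # U (Unicode string)
print(mixed[0, 1])  # 4.652 (now a string, not a float)
```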

lennon310 commented 5 years ago

Thanks @cliveseldon. I was wondering whether NumPy should have some smarter dtype handling. For example, in pandas we can set the dtype to object and then use infer_objects to cast each column to its actual type. I don't know whether NumPy has something similar. Otherwise, is the user expected to handle the casting in the predict function themselves?
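A small sketch of both ideas for comparison. On the NumPy side, dtype=object is the closest built-in option I'm aware of: it keeps each element's original Python type, at the cost of losing a vectorised numeric dtype. On the pandas side, an object-dtype DataFrame followed by DataFrame.infer_objects re-infers a dtype per column:

```python
import numpy as np
import pandas as pd

names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
rows = [["str", 4.652, 7.39, 0.324]]

# NumPy: dtype=object stores the original Python objects instead of
# converting everything to one common type
obj_arr = np.array(rows, dtype=object)
print(type(obj_arr[0, 0]).__name__, type(obj_arr[0, 1]).__name__)  # str float

# pandas: start from object columns, then let infer_objects pick a better
# dtype for each column independently
df = pd.DataFrame(rows, columns=names, dtype=object).infer_objects()
print(df.dtypes)
```

Because pandas infers per column, the three numeric columns come back as float64 while sepal_length stays non-numeric; NumPy has no per-column dtype, which is why it falls back to a single type.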

ukclivecox commented 5 years ago

Yes. For mixed-type data you currently need to cast to the correct types yourself, since NumPy assigns a single dtype to the whole array. Alternatively, you could pass the data via the binData, strData or jsonData fields we also provide; each has its own pros and cons.
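A minimal sketch of doing that casting inside predict. The class name, the NUMERIC_COLUMNS schema and the placeholder "prediction" are all hypothetical; only the predict(self, X, features_names) shape is meant to mirror the Seldon Python wrapper convention, and the input array is assumed to arrive as the all-string array from the issue:

```python
import numpy as np


class IrisModel:
    """Hypothetical model; only the predict(self, X, features_names) shape is
    meant to mirror the Seldon Python wrapper convention."""

    # Hypothetical schema: the columns this model needs as floats
    NUMERIC_COLUMNS = ["sepal_width", "petal_length", "petal_width"]

    def predict(self, X, features_names):
        X = np.asarray(X)
        col = {name: i for i, name in enumerate(features_names)}
        # X arrives with one dtype (here, strings), so cast the numeric
        # columns back to float64 before using them
        numeric = X[:, [col[n] for n in self.NUMERIC_COLUMNS]].astype(np.float64)
        # Placeholder "prediction": row-wise sum of the numeric features
        return numeric.sum(axis=1)


# The mixed-type request from the issue, as the wrapper is assumed to pass it
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X = np.array([["str", 4.652, 7.39, 0.324]])
scores = IrisModel().predict(X, names)
```

Looking columns up by name rather than by position keeps the casting correct even if the client reorders the features.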

lennon310 commented 5 years ago

OK, I will try converting the pandas DataFrame to binary; hopefully deserialization will preserve the original data types. Thanks @cliveseldon
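One way to do that conversion is a sketch using pickle, which round-trips a DataFrame together with its per-column dtypes. Two assumptions to flag: pickle is only safe between trusted parties, because unpickling can execute arbitrary code; and in a JSON REST body the bytes have to travel as text, so this sketch base64-encodes them (the usual JSON encoding for protobuf bytes fields such as binData):

```python
import base64
import pickle

import pandas as pd

names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame([["str", 4.652, 7.39, 0.324]], columns=names)

# Client side: serialize the DataFrame (dtypes included) to bytes, then
# base64-encode them so they can be sent in a JSON body
bin_data = base64.b64encode(pickle.dumps(df)).decode("ascii")

# Server side: reverse both steps. Only unpickle data from trusted senders,
# since unpickling can execute arbitrary code.
restored = pickle.loads(base64.b64decode(bin_data))

print(restored.dtypes.equals(df.dtypes))  # True
```

Formats such as Parquet or Arrow would also preserve dtypes without pickle's security caveat, at the cost of an extra dependency.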

lennon310 commented 5 years ago

Can someone show me what the Model class would look like if binary data is passed to the predict function? It looks like I need to override predict_raw instead of predict, but when I run seldon-core-microservice <MODEL> REST, I can't curl 0.0.0.0:5000/predict_raw, which suggests I still have to pass a numpy.ndarray to predict. Did I miss something? It would be great if there were an example showing how to do that.

ukclivecox commented 5 years ago

The internal container endpoint is the same, 0.0.0.0:5000/predict. If you define predict_raw, the wrapper will choose it when it exists.

https://docs.seldon.io/projects/seldon-core/en/latest/python/python_component.html#low-level-methods
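Based on the description above (the wrapper calls predict_raw in place of predict whenever the class defines it), here is a minimal sketch of a class for a binData request. Assumptions, none of them confirmed by the wrapper's code: the REST message arrives as a dict, binData holds a base64 string, the payload is a pickled DataFrame, and returning a plain dict is acceptable. Later in this thread the wrapper is said to expect a SeldonMessage proto back, so check what your seldon-core version requires. The class name and the placeholder "prediction" are hypothetical:

```python
import base64
import pickle

import pandas as pd


class IrisRawModel:
    """Hypothetical model that implements predict_raw instead of predict."""

    def predict_raw(self, request):
        # Assumption: for REST, request is the whole SeldonMessage as a dict,
        # e.g. {"binData": "<base64 string>"}
        if "binData" not in request:
            raise ValueError("expected a binData payload")
        # Assumption: the client sent a base64-encoded pickled DataFrame.
        # Only unpickle data from trusted senders.
        df = pickle.loads(base64.b64decode(request["binData"]))
        # Columns keep their own dtypes, so no manual casting is needed
        numeric = df.select_dtypes(include="number")
        # Placeholder "prediction": row-wise sum of the numeric columns
        scores = numeric.sum(axis=1).tolist()
        # Plain dict response; your seldon-core version may expect a
        # SeldonMessage proto here instead
        return {"jsonData": {"scores": scores}}


# Example request, built the same way a pickle-based client would build it
names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
df = pd.DataFrame([["str", 4.652, 7.39, 0.324]], columns=names)
request = {"binData": base64.b64encode(pickle.dumps(df)).decode("ascii")}
response = IrisRawModel().predict_raw(request)
```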

lennon310 commented 5 years ago

Yes, I noticed that reference and thought it should work that way. However, in my local test the wrapper did not seem to choose predict_raw: I got an "Empty json parameter in data" error, which I assume means it was looking for the NumPy array for predict.

ukclivecox commented 5 years ago

That error implies it is not finding the payload in the REST request.

Can you try with the microservice API tester: https://docs.seldon.io/projects/seldon-core/en/latest/workflow/api-testing.html#microservice-api-tester

lennon310 commented 5 years ago

I was able to get predict_raw called by sending the request from SeldonClient instead of using curl.
Unlike predict, to which we can pass both the data and the feature names, predict_raw seems to accept only the data (binData, strData, jsonData, etc.). So should I wrap the data and the list of feature names into one object before converting it to bytes on the client side, and then deserialize it on the server side (in predict_raw) to extract the data and the list?
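One way to bundle both without converting to bytes at all is a sketch that puts the feature names and the rows into a single jsonData object. JSON keeps strings and numbers distinct, so the mixed types survive the trip. The inner keys "names" and "rows" are a hypothetical schema chosen for this example; jsonData is assumed to carry arbitrary JSON, so the layout is up to you:

```python
import json

names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
rows = [["str", 4.652, 7.39, 0.324]]

# Client side: one message whose jsonData bundles names and rows together
# ("names"/"rows" are a hypothetical schema of our own choosing)
message = {"jsonData": {"names": names, "rows": rows}}
body = json.dumps(message)

# Server side (e.g. inside predict_raw): parse and split the payload again
received = json.loads(body)
payload = received["jsonData"]
features_names, data = payload["names"], payload["rows"]

# Types survive the JSON round trip: the string stays a str, floats stay floats
print([type(v).__name__ for v in data[0]])  # ['str', 'float', 'float', 'float']
```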

ukclivecox commented 5 years ago

predict_raw simply passes the message that was received to the user_model: https://github.com/SeldonIO/seldon-core/blob/7e847349a9f15d05a5ad7756aa63a935c0e52cac/python/seldon_core/seldon_methods.py#L38

So it receives the whole SeldonMessage, as a dict or a proto.

It presently expects a SeldonMessage proto to be returned.
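Since predict_raw gets the whole message, it has to find the payload itself. A minimal sketch of a hypothetical helper for the dict (REST) case, assuming a message carries one of the four payload fields named in this thread and that binData is base64-encoded text, as in the JSON form. It does not cover the proto case or building the response:

```python
import base64


def extract_payload(message):
    """Hypothetical helper: return (field, value) for whichever payload field
    a SeldonMessage dict carries. Assumes REST-style dicts, where binData is a
    base64 string."""
    for field in ("data", "binData", "strData", "jsonData"):
        if field in message:
            value = message[field]
            if field == "binData":
                # Turn the base64 text back into the raw bytes the client sent
                value = base64.b64decode(value)
            return field, value
    raise ValueError("message has no data, binData, strData or jsonData field")


field, value = extract_payload({"strData": "hello"})
```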

ukclivecox commented 5 years ago

@lennon310 Any update on whether this is still an issue?

lennon310 commented 5 years ago

Thank you @cliveseldon. I was trying to serialize the NumPy array (or JSON) to bytes and send it with SeldonClient. It looks like I should add them to a SeldonMessage as you suggested. I haven't worked on this part yet, but I think this issue can be closed, since it was originally about data types in REST. I will reopen it or create a new issue if I have problems using predict_raw in the future. Thanks