SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

Java wrapper returns different format than Python wrapper #3551

Closed: yaliqin closed this issue 2 years ago

yaliqin commented 3 years ago

Describe the bug

We have a model trained with H2O and saved as a MOJO file. When we use the Python wrapper to load this model with the H2O Python library, the model inference result includes a meta data field, and the names field in the "data" part is ["t:0", "t:1"]. When we use the Java wrapper following the steps in https://docs.seldon.io/projects/seldon-core/en/stable/java-jni/README.html, the response has no meta data and the names field is ["label.0", "label.1"]. We use the Seldon batch_processor.py to make the batch inference requests. We have different use cases; other than this H2O-trained model, most of our models are trained in Python. We want to use the Java wrapper for the H2O model to increase inference speed (it is much faster than the Python wrapper version) and need to use the Python wrapper for the other cases. But we need a unified output format, because our pipeline cannot easily adapt its post-processing steps to a different format per use case.
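A minimal sketch of how such a request can be made with seldon_client.py to inspect the raw response; the deployment name, namespace, gateway endpoint, and input row below are placeholders, not our actual configuration:

```python
import numpy as np
from seldon_core.seldon_client import SeldonClient

# Placeholder deployment details -- adjust to the actual cluster setup.
sc = SeldonClient(
    deployment_name="h2o-seldon-serving",  # hypothetical deployment name
    namespace="models",                    # hypothetical namespace
    gateway="istio",
    gateway_endpoint="istio-ingressgateway.istio-system:80",
    transport="rest",
)

# The same row is sent to the Java-wrapped and the Python-wrapped
# deployment so the two response formats can be compared directly.
result = sc.predict(data=np.array([[5.1, 3.5, 1.4, 0.2]]))
print(result.response)  # includes (or lacks) the "meta" block depending on the wrapper
```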

To reproduce

1. Train a model with H2O DAI which predicts the probability of two classes.
2. Follow the instructions in https://docs.seldon.io/projects/seldon-core/en/stable/java-jni/README.html to wrap the model and deploy it with seldon_core. The command to build the image is: s2i build . seldonio/s2i-java-jni-build:0.5.1 --copy --volume "$HOME/.m2":/root/.m2 --runtime-image seldonio/s2i-java-jni-runtime:0.5.1 seldon-h2o-jni:0.42
3. Make inference requests with seldon_client.py or batch_processor.py.
4. Wrap the model with the Python wrapper using the H2O Python library and deploy it with seldon_core (a rough sketch of such a wrapper follows after this list).
5. Make inference requests with seldon_client.py or batch_processor.py.
6. Compare the output of step 3 and step 5.
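To make step 4 concrete, the Python wrapper class is roughly as follows. This is a sketch only: the file and class names, the MOJO path, and the use of h2o.import_mojo are assumptions (an H2O DAI MOJO pipeline may instead need the daimojo scoring runtime), and the exact column slicing depends on the model.

```python
# H2oMojoModel.py -- illustrative Seldon Python wrapper for the H2O MOJO.
import h2o


class H2oMojoModel:
    def __init__(self, mojo_path: str = "/microservice/pipeline.mojo"):
        h2o.init()
        # import_mojo is available in recent h2o releases; a DAI MOJO
        # pipeline may instead require the daimojo scoring runtime.
        self._model = h2o.import_mojo(mojo_path)
        # No class_names attribute is set here, which is presumably why the
        # Python wrapper falls back to the default ["t:0", "t:1"] names.

    def predict(self, X, features_names=None):
        frame = h2o.H2OFrame(X, column_names=features_names)
        preds = self._model.predict(frame)
        # Return only the two per-class probability columns as a 2D array.
        return preds.as_data_frame().iloc[:, -2:].to_numpy()
```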

Expected behaviour

The output of the Java-wrapped model is in the format below:

{"data": {"names": ["label.0", "label.1"], "ndarray": [[0.31483936309814453, 0.6851606369018555]]}}

while the output of the Python-wrapped model is in the format below:

{"data": {"names": ["t:0", "t:1"], "ndarray": [[0.31483936309814453, 0.6851606369018555]]}, "meta": {"requestPath": {"classifier": "h2o-seldon-python-serving:0.2"}, "tags": {"tags": {"batch_id": "test_batch"}, "batch_index": 27, "batch_instance_id": "cd4eb292-05e2-11ec-9c5e-acde48001122"}}}
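Until the two wrappers agree, we normalise the names in our post-processing step; a rough sketch of that workaround follows (choosing the "t:N" convention as the target format is our own decision, not something Seldon prescribes):

```python
def normalise_names(resp: dict) -> dict:
    """Map the Java wrapper's "label.N" names onto the Python wrapper's
    default "t:N" convention so downstream post-processing sees one format."""
    data = resp.setdefault("data", {})
    data["names"] = [f"t:{i}" for i in range(len(data.get("names", [])))]
    return resp


# Example: the Java-wrapper response from above, after normalisation.
java_resp = {"data": {"names": ["label.0", "label.1"],
                      "ndarray": [[0.31483936309814453, 0.6851606369018555]]}}
print(normalise_names(java_resp))
# {'data': {'names': ['t:0', 't:1'], 'ndarray': [[0.31483936309814453, 0.6851606369018555]]}}
```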

Environment

Openshift cluster

ukclivecox commented 3 years ago

The Java wrapper is incubating and may indeed be lagging behind the Python wrapper in functional parity. Our long-term vision is to use a single server based on the MLServer project, with runtimes for Java and R.

So in the medium term we would appreciate external contributions to update the Java wrapper if this is urgent.

indranilr commented 2 years ago

@cliveseldon I had similar issues with meta.requestPath in the Java wrapper and have put a temporary patch in the application logic to populate it. Are you open to a PR? Could you please point to any guidelines (other than what is stated in the contribution guidelines) in this regard?
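For anyone hitting the same gap in the meantime, a temporary application-side patch of the kind described here might look roughly like the sketch below; the predictor name and image tag have to be supplied from deployment configuration the application already knows, since the Java wrapper does not report them:

```python
def ensure_request_path(resp: dict, predictor: str, image: str) -> dict:
    """Backfill meta.requestPath on responses from the Java wrapper.

    predictor and image come from deployment config known to the caller,
    e.g. ("classifier", "seldon-h2o-jni:0.42"); they are placeholders here.
    """
    meta = resp.setdefault("meta", {})
    meta.setdefault("requestPath", {}).setdefault(predictor, image)
    return resp
```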

ukclivecox commented 2 years ago

Yes, a PR would be great @indranilr

ukclivecox commented 2 years ago

Closing. We will investigate Java via the MLServer roadmap. Please reopen if this is still an issue or a PR is available.