SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

Shape in custom json example incorrect #1718

Open patrickhuy opened 4 months ago

patrickhuy commented 4 months ago

I played around with the custom-json example together with adaptive batching and found that adaptive batching only behaves correctly if the shape of the input is the number of data elements (here: one byte array/string, i.e. `[1]`). When the shape is set to `[len(bytes)]`, as the example does (here: https://github.com/SeldonIO/MLServer/blob/ce6c0de2fde54026bb9d5fe65ab1291927512266/docs/examples/custom-json/README.ipynb?short_path=475bbb5#L203C16-L203C17 ), adaptive batching fails and unbatches the responses incorrectly.

I extended the custom-json example with adaptive batching.

My modified code can be found here: https://gist.github.com/patrickhuy/3fbdf9c4f4d483826f838aac859ebbbb
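For context, adaptive batching is enabled through the model's `model-settings.json`. A minimal sketch (the batch size/time values are illustrative, and the `implementation` path assumes the module/class layout of the custom-json example):

```json
{
  "name": "json-hello-world",
  "implementation": "jsonmodels.JsonHelloWorldModel",
  "max_batch_size": 10,
  "max_batch_time": 0.5
}
```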

My client sends 2 requests:

// Request 1
{"inputs": [{"name": "echo_request", "shape": [59], "datatype": "BYTES", "data": ["{\"name\": \"Foo Bar\", \"message\": \"Hello from Client (REST)!\"}"]}]}
// Request 2
{"inputs": [{"name": "echo_request", "shape": [63], "datatype": "BYTES", "data": ["{\"name\": \"Foo Bar 2\", \"message\": \"Hello from Client (REST) 2!\"}"]}]}

These are batched and my custom model component receives a request in the form of

{
  "inputs": [
    {
      "name": "echo_request",
      "shape": [
        122
      ],
      "datatype": "BYTES",
      "data": [
        "{\"name\": \"Foo Bar\", \"message\": \"Hello from Client (REST)!\"}",
        "{\"name\": \"Foo Bar 2\", \"message\": \"Hello from Client (REST) 2!\"}"
      ]
    }
  ]
}

and replies with

{
  "model_name": "json-hello-world",
  "outputs": [
    {
      "name": "echo_response",
      "shape": [
        274
      ],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "data": [
        "{\"request\": {\"name\": \"Foo Bar\", \"message\": \"Hello from Client (REST)!\"}, \"server_response\": \"Got your request. Hello from the server.\"}",
        "{\"request\": {\"name\": \"Foo Bar 2\", \"message\": \"Hello from Client (REST) 2!\"}, \"server_response\": \"Got your request. Hello from the server.\"}"
      ]
    }
  ]
}

The client then receives 2 responses like this:

{
  "model_name": "json-hello-world",
  "id": "495bdd96-6345-4a43-9192-45eb7fab7809",
  "parameters": {},
  "outputs":
    [
      {
        "name": "echo_response",
        "shape": [59],
        "datatype": "BYTES",
        "parameters": { "content_type": "str" },
        "data":
          [
            '{"request": {"name": "Foo Bar", "message": "Hello from Client (REST)!"}, "server_response": "Got your request. Hello from the server."}',
            '{"request": {"name": "Foo Bar 2", "message": "Hello from Client (REST) 2!"}, "server_response": "Got your request. Hello from the server."}',
          ],
      },
    ],
}
{
  "model_name": "json-hello-world",
  "id": "c21a20ee-1ef5-4e90-bec1-b372199d118c",
  "parameters": {},
  "outputs": [
    {
      "name": "echo_response",
      "shape": [
        63
      ],
      "datatype": "BYTES",
      "parameters": {
        "content_type": "str"
      },
      "data": []
    }
  ]
}

The second response has the unbatched shape but is missing its data; the first response has the data for both responses but only the shape of the first response.

This can be resolved by setting the shape of the requests to `[1]` instead of `[len(bytes)]`. Could you check whether the number of string elements is the correct shape and, if so, adapt the example accordingly?
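To make the proposed fix concrete, here is a minimal sketch of building the request payload with the shape set to the number of string elements rather than the byte length (the helper name `build_bytes_input` is mine, not part of the example):

```python
import json


def build_bytes_input(name, strings):
    # The shape should be the number of elements in `data`, not the byte
    # length of the encoded string: adaptive batching splits responses
    # along the first shape dimension, so a byte-length shape makes the
    # unbatcher slice the data list at the wrong offsets.
    return {
        "name": name,
        "shape": [len(strings)],
        "datatype": "BYTES",
        "data": strings,
    }


payload = {
    "inputs": [
        build_bytes_input(
            "echo_request",
            [json.dumps({"name": "Foo Bar", "message": "Hello from Client (REST)!"})],
        )
    ]
}

# One string element -> shape [1], even though the string is 59 bytes long.
print(payload["inputs"][0]["shape"])
```

With this shape, the batched request would carry shape `[2]` for the two merged strings, and both client responses would get their own data element back.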