SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

Adaptive batching leads to parameters being cut off #1541

Open tobbber opened 10 months ago

tobbber commented 10 months ago

Hi, I observed some weird behavior when using the REST API with adaptive batching enabled. When sending a single request to the v2 REST endpoint /v2/models/<MODEL>/infer, the parameters within the ResponseOutput are cut off. If a parameter is not iterable, a TypeError is raised instead, e.g. TypeError: 'int' object is not iterable

Note that this only happens when:

  1. Adaptive batching is enabled
  2. Only a single request is sent within the max_batch_time window

How to Reproduce:

# model.py
from mlserver import MLModel
from mlserver.types import InferenceResponse, ResponseOutput, InferenceRequest

class EchoModel(MLModel):
    async def load(self):
        return True

    async def predict(self, payload: InferenceRequest):
        request_input = payload.inputs[0]
        # return the payload input as output
        output = ResponseOutput(**request_input.dict())
        return InferenceResponse(model_name=self.name, outputs=[output])
// model-settings.json
{
    "name": "echoModel",
    "max_batch_time": 2,
    "max_batch_size": 32,
    "implementation": "model.EchoModel"
}

Request Body:

// POST to localhost:8080/v2/models/echoModel/infer
{
    "inputs": [{
        "name": "docs",
        "shape": [2],
        "datatype": "INT32",
        "parameters": {
            "id": "123"
        },
        "data": [10,11]
    }]
}
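
For reference, a minimal Python sketch for sending this request (assuming the third-party requests package and MLServer's default HTTP port 8080):

# send_request.py -- sketch, assumes `pip install requests` and a running server
import requests

response = requests.post(
    "http://localhost:8080/v2/models/echoModel/infer",
    json={
        "inputs": [{
            "name": "docs",
            "shape": [2],
            "datatype": "INT32",
            "parameters": {"id": "123"},
            "data": [10, 11],
        }]
    },
)
print(response.json())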

Expected behavior: EchoModel returns the RequestInput as Output.
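
Concretely, since EchoModel copies the RequestInput straight into the ResponseOutput, the response to the request above should look roughly like this (a sketch; top-level fields such as id omitted):

// Expected response (sketch): input echoed back, parameters intact
{
    "model_name": "echoModel",
    "outputs": [{
        "name": "docs",
        "shape": [2],
        "datatype": "INT32",
        "parameters": {
            "id": "123"
        },
        "data": [10, 11]
    }]
}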

Actual behavior: parameters in the output are cut off, or a TypeError is raised.

It seems like the Parameters are unbatched even if they were never batched in the first place.
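
For illustration, here is a minimal sketch of the kind of naive unbatching that would produce both symptoms. This is a hypothetical helper, not MLServer's actual code: if each parameter value is sliced as though it held one entry per batched request, a scalar string like "123" gets cut down to "1", and an int raises exactly the TypeError reported above.

# unbatch_sketch.py -- hypothetical illustration of the symptom, not MLServer's code
def unbatch_parameter(value, batch_size: int):
    # Naively treat the value as a sequence holding one entry per batched
    # request -- wrong when the value was never batched to begin with.
    return list(value)[:batch_size]

print(unbatch_parameter("123", 1))  # ['1']  -> the "cut off" parameter
print(unbatch_parameter(123, 1))    # TypeError: 'int' object is not iterable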

yaliqin commented 9 months ago

Hi @tobbber Can you share the Dockerfile you used? I tried to wrap up my code in a similar way and set up the batch settings, but then I hit a prometheus_client error:

  File "/opt/conda/lib/python3.8/site-packages/prometheus_client/metrics.py", line 121, in __init__
    registry.register(self)
  File "/opt/conda/lib/python3.8/site-packages/prometheus_client/registry.py", line 29, in register
    raise ValueError(
ValueError: Duplicated timeseries in CollectorRegistry: {'batch_request_queue_count', 'batch_request_queue_bucket', 'batch_request_queue_created', 'batch_request_queue_sum'}

I used mlserver build, and the generated Dockerfile uses seldonio/mlserver:1.3.5-slim.
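
As an aside, that ValueError is prometheus_client refusing to register the same metric name twice in one process; the four suffixes in the message are what a single Histogram named batch_request_queue expands to. A minimal sketch reproducing it in isolation (assumes prometheus_client is installed; the metric name is taken from the error message, not from MLServer's source):

# duplicate_metric_sketch.py -- reproduces the ValueError above in isolation
from prometheus_client import Histogram

# First registration creates batch_request_queue_{bucket,count,sum,created}
Histogram("batch_request_queue", "queue size")
# Registering the same name again in the default registry raises:
# ValueError: Duplicated timeseries in CollectorRegistry: {...}
Histogram("batch_request_queue", "queue size")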

tobbber commented 9 months ago

Hi @yaliqin, I used the MLServer CLI directly with mlserver start mlserver_example/, where the directory has this structure:

mlserver_example/
├── model-settings.json
└── model.py

To install MLServer I used pip install mlserver==1.3.5

yaliqin commented 9 months ago

Thank you very much! Which Python version are you using?

tobbber commented 9 months ago

I am using Python 3.11.6 on an arm64 machine (M1 Mac)

yaliqin commented 9 months ago

Thanks @tobbber. mlserver start . worked, but the docker run failed. Will check the difference.