SeldonIO / seldon-core

An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models
https://www.seldon.io/tech/products/core/

Length of custom field or response from MLServer causes missing payload with lack of helpful logs #4625

Open dtpryce opened 1 year ago

dtpryce commented 1 year ago

Describe the bug

We created a custom MLServer model that takes a JSON string as input and returns a JSON dict as output. When called via curl through Seldon Core, the length of one of the custom fields causes Seldon Core to return status 200 but no payload. Testing directly against MLServer everything works fine; once the field is longer than 18 characters, this behaviour appears, and only in Seldon Core.

To reproduce

  1. Create a custom JSON in/out model similar to https://mlserver.readthedocs.io/en/stable/examples/custom-json/README.html, ensuring there is at least one string field
  2. Test curl against the MLServer infer endpoint with the string field at lengths 18, 19, 20 and more (all should be fine)
  3. Deploy within Seldon Core (V1; not tried in V2)
  4. Test curl against the Seldon Core infer endpoint with the string field at lengths 18, 19, 20 and more (lengths of 20 or greater return 200 but no payload)
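The payloads used in steps 2 and 4 can be sketched with stdlib `json` alone. This builds a V2 inference protocol request body for string fields of different lengths and prints the resulting body sizes; the input name `payload` and the `custom_field` key are hypothetical stand-ins for whatever the custom model actually expects.

```python
import json

def make_infer_payload(text: str) -> bytes:
    """Build a V2 inference protocol request body carrying a JSON string.

    The "payload" input name and "custom_field" key are hypothetical,
    standing in for the custom model's real schema.
    """
    body = {
        "inputs": [
            {
                "name": "payload",  # hypothetical input name
                "shape": [1],
                "datatype": "BYTES",
                # The custom field is JSON-encoded and wrapped as a BYTES tensor.
                "data": [json.dumps({"custom_field": text})],
            }
        ]
    }
    return json.dumps(body).encode("utf-8")

# Request bodies for string fields at the lengths used in the repro steps.
for n in (18, 19, 20, 21):
    print(n, len(make_infer_payload("x" * n)))
```

Each body can then be POSTed with curl to the MLServer and Seldon Core infer endpoints in turn to compare responses at each length.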

Expected behaviour

See above, plus the screenshots below.

One more thing to note: the rest of the entries in the data dictionary were fixed during testing, but they might be contributing to the overall payload length. In the screenshots you can see the content-length of successful responses getting close to 500 characters before failure. If this is hard to reproduce, try making the response dict almost 500 characters long and then varying one field up to that threshold (it might be total content size?).
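The total-content-size hypothesis above can be probed locally before involving Seldon at all. A minimal sketch, assuming hypothetical fixed response fields, finds the smallest variable-field length at which the serialised body reaches 500 bytes:

```python
import json

# Hypothetical fixed response entries; only "variable_field" grows.
FIXED_FIELDS = {
    "model_name": "custom-json-model",  # hypothetical values
    "id": "abc-123",
    "score": 0.93,
}

def response_size(field_len: int) -> int:
    """Byte length of the serialised response for a given field length."""
    doc = dict(FIXED_FIELDS, variable_field="x" * field_len)
    return len(json.dumps(doc).encode("utf-8"))

# Smallest variable-field length at which the body reaches 500 bytes.
threshold = next(n for n in range(2000) if response_size(n) >= 500)
print(threshold, response_size(threshold))
```

If the bug tracks total body size rather than the single field's length, varying the fixed fields should shift the failing field length by the same number of bytes.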

Environment

Model Details

The following screenshots show the MLServer response working fine, then attempts at curling Seldon with ResponseOutput fields of various lengths, producing the missing payload with no helpful logging.

Screenshots: mlserver-direct-good, seldon-core-bad-debug, seldon-core-length-discover

dtpryce commented 1 year ago

@cliveseldon would this behaviour be different in V2? We are about to look at upgrading

ukclivecox commented 1 year ago

It would be a bug on either version. We have not had a chance to replicate it on V1 yet. Trying the same model on V2 would be useful.

dtpryce commented 1 year ago

Ok, well, we should be trying some V2 stuff out soon, but would love to hear about a fix; we are seeing this more and more now with our JSON output models.

dtpryce commented 1 year ago

We have also discovered that the gRPC endpoint does not suffer from this bug. So this is only relevant to REST, and it appears on both long JSON string inputs and outputs.

adriangonz commented 1 year ago

Hey @dtpryce ,

So far we haven't been able to replicate this one internally. Could you share more details about your environment that could help us replicate it (e.g. what ingress are you using, is there an LB in front of that, etc.)?

Also, could you try testing this with the latest SC?

dtpryce commented 1 year ago

Hey @adriangonz, sorry for the slow response. I will let @Kolajik comment more on the infrastructure, but I do know that we were going through a slightly different ingress.

We are also about to deploy V2 and will keep this issue in mind before rolling out.