bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
https://bentoml.com
Apache License 2.0
7.17k stars 792 forks

JSON not parsed when passed to server #2003

Closed: yuhongsun96 closed this issue 2 years ago

yuhongsun96 commented 3 years ago

Describe the bug: When passing a JsonInput through service.api(input) directly, the input reaches the API function as the correct, parsed type. When the same input is sent through the requests library or the GUI, it is left as the raw JSON string and is not parsed before being passed into the function (called api in this case).

I'd be happy to fix this for you guys if confirmed that this is a bug that you'd like fixed!

To Reproduce: Steps to reproduce the issue (a minimal sketch follows this list):

  1. Create a BentoService object with a function wrapped by @api which takes a JsonInput.
  2. Call object.api directly through Python with a JSON payload that is a list.
  3. Start the server (or a containerized server) and call the same API through requests or the GUI.
  4. Note that the input to the wrapped API function is not the same in these two cases.
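For concreteness, here is a minimal sketch of the setup the steps describe, written against the legacy 0.13-style BentoService API; the class name, method name, and serve commands are illustrative, not taken from the report:

```python
import bentoml
from bentoml.adapters import JsonInput


class EncoderService(bentoml.BentoService):

    # batch=True tells BentoML the function accepts a list of request
    # payloads and returns a list of outputs (one per payload).
    @bentoml.api(input=JsonInput(), batch=True)
    def encode(self, parsed_json_list):
        # Echo each payload back so the direct call and the HTTP call
        # can be compared easily.
        return [payload for payload in parsed_json_list]


if __name__ == "__main__":
    service = EncoderService()
    # Step 2: calling the API directly from Python with a JSON list
    print(service.encode([["test", "string"]]))
    # Step 3 (sketch): save and serve the service, e.g. service.save()
    # followed by `bentoml serve EncoderService:latest`, then POST the
    # same JSON list over HTTP and compare what the function receives.
```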

Expected behavior: Calling the same API directly as a function or through the server should give the same result.

Screenshots/Logs

(screenshot attached)

Environment:

parano commented 3 years ago

Hi @yuhongsun96 - this is because you used the .text attribute of the requests response object. Simply changing your HTTP client code to requests.post(..).json() will give you the expected result. The BentoML API server itself is only responsible for returning a response with the right Content-Type HTTP header; the HTTP client needs to decide how to parse the response body.
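As an illustration of this client-side point (the URL and port below are assumptions for a local dev server, not from the thread):

```python
import requests

resp = requests.post("http://127.0.0.1:5000/encode", json=["test", "string"])

print(resp.text)    # raw response body, still a JSON string
print(resp.json())  # the same body parsed back into a Python object
```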

yuhongsun96 commented 3 years ago

@parano Sorry, I guess it was unclear in the way I showed it. What I'm trying to say is that the encode method receives the whole JSON as a single unparsed input string. I can update the example to show this more clearly.

Basically the result is that if you encode ["test", "string"] through the GUI, for example, it only gives back one encoding (for the whole list).

yuhongsun96 commented 3 years ago

(screenshot attached)

yuhongsun96 commented 3 years ago

@parano I've included a clearer demonstration of the issue; please see the attached screenshot. If this is verified to be an issue, I can fix it for you guys, by the way. I'd like to contribute.

parano commented 3 years ago

@yuhongsun96 I see - I think you misunderstood how batch=True works here; if you just change it to batch=False, it will probably work as expected. BentoML does adaptive server-side batching when an API has batch=True: client requests are grouped into batches dynamically on the server side in real time, in order to get higher throughput and better hardware utilization when running ML model serving workloads.

In this example, the client called requests.post(..., json=['test', 'string']), which is considered just one prediction request, so in the service API function the input received will be raw=[ ['test', 'string'] ]. If you were to run this client concurrently and send lots of requests like this at the same time, the API function on the server side would receive inputs that look like raw=[ ['test', 'string'], ['test', 'string'], ['test', 'string'] ... ]. Note that when an API function is set to batch=True, it is supposed to take in a list of prediction requests and produce a list of outputs. BentoML groups requests on the server side for the API function and splits its output into separate HTTP responses, one per client.
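A sketch of the distinction, again in the legacy 0.13-style API; the service and method names are illustrative:

```python
import bentoml
from bentoml.adapters import JsonInput


class ExampleService(bentoml.BentoService):

    # batch=False: one HTTP request per call; parsed_json is the client's
    # payload itself, e.g. ['test', 'string'].
    @bentoml.api(input=JsonInput(), batch=False)
    def encode_single(self, parsed_json):
        return [str(item) for item in parsed_json]

    # batch=True: BentoML may group concurrent requests; the function
    # receives a list of payloads and must return one output per payload.
    # A single client posting ['test', 'string'] arrives as
    # [['test', 'string']].
    @bentoml.api(input=JsonInput(), batch=True)
    def encode_batch(self, parsed_json_list):
        return [[str(item) for item in payload] for payload in parsed_json_list]
```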

yuhongsun96 commented 3 years ago

@parano I see, I thought batching would work something like this: If you have 2 requests with [a, b] and [c], batching would pass [a, b, c] to the service and then return the results to the two requests as [processed_a, processed_b], [processed_c].

This would make it easier to use, no? It's just a simple change of flattening inputs and splitting outputs (a rough sketch is below).
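To make the proposal concrete, here is a rough plain-Python sketch of that flatten-and-split scheme; this is only an illustration of the idea, not how BentoML actually behaves:

```python
def flatten_and_split(request_payloads, process_item):
    # request_payloads: one list of items per client request,
    # e.g. [['a', 'b'], ['c']]
    flat = [item for payload in request_payloads for item in payload]
    flat_results = [process_item(item) for item in flat]

    # Split the results back out to the original requests.
    results, start = [], 0
    for payload in request_payloads:
        results.append(flat_results[start:start + len(payload)])
        start += len(payload)
    return results


# Two client requests, [a, b] and [c]:
print(flatten_and_split([["a", "b"], ["c"]], str.upper))
# -> [['A', 'B'], ['C']]
```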

It took me a while to understand why my code wasn't working, since I tried calling the API from Python (instead of running the server) as service.api([input_list]), and this does work with batching set to true.

larme commented 3 years ago

@yuhongsun96 Thanks for your detailed description. I think the documentation of this behavior should be clearer. I will track this issue and improve the situation in the new release.

yuhongsun96 commented 3 years ago

If the functionality I described is good by you, I'm down to implement it.

As I see it, making this change just makes the usage more intuitive, with no real downside, right? Please let me know your thoughts!

parano commented 3 years ago

@yuhongsun96 No need to work on the implementation yet; the challenge is more in the design of the API. We thought about this proposed approach, and the challenge with it is that the user's API client would need to specify whether a request is a "single item" or a "list of items", which may require setting an additional flag in the HTTP header and makes it even harder to use for newcomers. We have actually addressed this issue in the upcoming new version with a very different approach that simplifies the API IO adapter design quite a lot. I will share the new beta with you as discussed and would love to hear your feedback!

parano commented 2 years ago

Hi @yuhongsun96, thank you again for the feedback. Here's an update for this issue in BentoML version 1.0:

In 1.0, we separated the API endpoint definition from the dynamic batching behavior. The Service API is only responsible for describing the mapping between the incoming HTTP request and outgoing HTTP response types, and for converting them to/from Python objects. Batching logic is limited to the Runner level, where we recommend users put IO- or compute-intensive code. More details can be found in the 1.0 guide: http://docs.bentoml.org
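A hedged sketch of what that separation looks like in 1.0; the model tag encoder_model:latest, the service name, and the "features" field are placeholders, not from this thread:

```python
import bentoml
from bentoml.io import JSON

# Dynamic batching is configured on the Runner, not on the API function.
runner = bentoml.sklearn.get("encoder_model:latest").to_runner()

svc = bentoml.Service("encoder", runners=[runner])


@svc.api(input=JSON(), output=JSON())
def encode(parsed_json):
    # The API function receives one parsed Python object per HTTP request;
    # JSON() handles (de)serialization and the content-type header.
    features = parsed_json["features"]
    return {"result": runner.predict.run([features]).tolist()}
```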