Hi @yuhongsun96 - this is because you used the .text method on the requests response object. Simply changing your HTTP client code to requests.post(...).json() will give you the expected result. The BentoML API server itself is only responsible for returning a response with the right content-type HTTP header; the HTTP client decides how to parse the response body.
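For illustration, a minimal client-side sketch of the difference (the host, port, and /encode endpoint name are placeholders for whatever the deployed service exposes):

```python
import requests

# Hypothetical local BentoML server and endpoint name; adjust to your deployment.
resp = requests.post("http://127.0.0.1:5000/encode", json=["test", "string"])

print(resp.text)    # raw response body as a Python string
print(resp.json())  # response body parsed into Python lists/dicts
```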
@parano Sorry, I guess my example was unclear. What I'm trying to say is that inside the encode method, the whole JSON body is received as a single unparsed input string. I can update the example to show this more clearly.
The practical result is that if you encode ["test", "string"] through the GUI, for example, you only get back one encoding (for the whole list).
@parano I've included a clearer demonstration of the issue; please see the attached screenshot. If this is verified to be an issue, I can fix it for you guys, btw. I'd like to contribute.
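For context, a minimal sketch of the kind of service being discussed, assuming the pre-1.0 BentoService API (the class name and the stand-in encode logic are placeholders; no model artifact is wired in):

```python
from typing import List

import bentoml
from bentoml.adapters import JsonInput
from bentoml.types import JsonSerializable


class EncoderService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def encode(self, parsed_json_list: List[JsonSerializable]):
        # With batch=True, each element of parsed_json_list is one whole client
        # request, so a single POST body of ["test", "string"] arrives here as
        # [["test", "string"]] and produces only a single output.
        return [len(request) for request in parsed_json_list]  # placeholder "encoding"
```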
@yuhongsun96 I see, I think you misunderstood how batch=True works here; if you change it to batch=False, it will probably work as expected. BentoML does adaptive server-side batching when an API has batch=True: client requests are grouped into batches dynamically on the server side in real time, in order to get higher throughput and better hardware utilization when running ML model serving workloads.
In this example, the client called requests.post(..., json=['test', 'string']), which is considered a single prediction request, so in the service API function the input received will be raw=[ ['test', 'string'] ]. If you were to run this client concurrently and send many requests like this at the same time, the API function on the server side would receive inputs that look like raw=[ ['test', 'string'], ['test', 'string'], ['test', 'string'], ... ]. Note that when an API function is set to batch=True, it is supposed to take in a list of prediction requests and produce a list of outputs. BentoML groups requests on the server side for the API function and splits its output into separate HTTP responses for each client.
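A minimal sketch of the batch=False variant suggested above, again assuming the pre-1.0 BentoService API (class name and encode logic are placeholders):

```python
import bentoml
from bentoml.adapters import JsonInput


class EncoderService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=False)
    def encode(self, parsed_json):
        # With batch=False the function is called once per HTTP request, so
        # posting ["test", "string"] arrives here as ["test", "string"] and
        # each element can be encoded individually.
        return [len(item) for item in parsed_json]  # placeholder "encoding"
```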
@parano I see, I thought batching would work something like this: if you have two requests, [a, b] and [c], batching would pass [a, b, c] to the service and then return the results to the two clients as [processed_a, processed_b] and [processed_c] respectively.
Wouldn't that make it easier to use? It's just a matter of concatenating the inputs and splitting the outputs.
It took me a while to understand why my code wasn't working, since calling the API directly from Python (instead of going through the server) as service.api([input_list]) does work with batching set to true.
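For clarity, the semantics described above, sketched in plain Python (process is a stand-in for a per-item model call; this is not how BentoML 0.13 actually behaves):

```python
def process(x):
    # stand-in for running the model on a single item
    return f"processed_{x}"

request_a = ["a", "b"]  # from client 1
request_b = ["c"]       # from client 2

# server side: concatenate the incoming requests into one batch
merged = request_a + request_b            # ["a", "b", "c"]
results = [process(x) for x in merged]

# split the results back out by the original request sizes
reply_a = results[:len(request_a)]        # ["processed_a", "processed_b"]
reply_b = results[len(request_a):]        # ["processed_c"]
```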
@yuhongsun96 Thanks for your detailed description. I think the documentation of this behavior should be clearer. I will track this issue and improve the situation in the new release.
If the functionality I described works for you, I'm down to implement it.
As I see it, this change just makes the usage more intuitive with no real downside, right? Please let me know your thoughts!
@yuhongsun96 No need to work on the implementation yet; the challenge is more in the design of the API. We considered this proposed approach, and the problem with it is that the user's API client would need to specify whether a request is a "single item" or a "list of items", which may require setting an additional flag in the HTTP header and makes it even harder to use for newcomers. We have actually addressed this issue in the upcoming new version with a very different approach, and simplified the API IO adapter design quite a lot. I will share the new beta with you as discussed and would love to hear your feedback!
Hi @yuhongsun96, thank you again for the feedback. Here's an update on this issue in BentoML version 1.0:
In 1.0, we separated the API endpoint definition from the dynamic batching behavior. A Service API is only responsible for describing the mapping between the incoming HTTP request and the outgoing HTTP response types and converting them to/from Python objects. Batching logic now lives at the Runner level, where we recommend users put IO- or compute-intensive code. More details can be found in the 1.0 guide: http://docs.bentoml.org
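As a rough sketch of what this looks like in 1.0, assuming a scikit-learn model saved under the placeholder name "encoder" with a batchable predict signature (the names and framework are illustrative, not from this thread):

```python
import bentoml
from bentoml.io import JSON

# Assumes the model was saved earlier with a batchable signature, e.g.:
#   bentoml.sklearn.save_model(
#       "encoder", model,
#       signatures={"predict": {"batchable": True, "batch_dim": 0}},
#   )
encoder_runner = bentoml.sklearn.get("encoder:latest").to_runner()

svc = bentoml.Service("encoder_service", runners=[encoder_runner])


@svc.api(input=JSON(), output=JSON())
async def encode(parsed_json):
    # The API function always receives exactly one parsed request body.
    # Adaptive batching happens inside the runner, across concurrent requests.
    result = await encoder_runner.predict.async_run(parsed_json)
    return result.tolist()  # numpy output converted to a JSON-serializable list
```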
Describe the bug When passing a JsonInput as service.api(input), the input is passed to the API function as the correct type. When passing the input through the requests library or the GUI, it is left as the raw JSON string and not parsed before being passed into the function (called api in this case).
I'd be happy to fix this for you guys if confirmed that this is a bug that you'd like fixed!
To Reproduce Steps to reproduce the issue:
Expected behavior Calling the same API through the function directly or through the server should have the same result.
Screenshots/Logs
Environment: