deepjavalibrary / djl-serving

A universal scalable machine learning model deployment solution
Apache License 2.0

[python] refactor input parser to support Request #2145

Closed sindhuvahinis closed 1 week ago

sindhuvahinis commented 1 week ago

Description

This PR refactors the input parser to produce Request objects directly. Before this PR, we looped through the list of requests to build separate lists of input_text, input_size, parameters, etc., and then in rolling batch we looped through these parsed lists again to convert them back into a list of Request objects. This PR aims to avoid that duplicate work.
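The single-pass idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Request` dataclass here is a hypothetical stand-in for the real class, and `parse_inputs` and the payload keys `inputs` and `parameters` are assumed names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Request:
    # Hypothetical stand-in for the real Request class; fields are assumptions.
    id: int
    input_text: str
    parameters: Dict[str, Any] = field(default_factory=dict)


def parse_inputs(payloads: List[Dict[str, Any]]) -> List[Request]:
    """Build one Request per payload in a single pass.

    No intermediate lists of input_text or parameters are created, so
    rolling batch can consume the returned Requests as they are.
    """
    requests = []
    for index, payload in enumerate(payloads):
        requests.append(
            Request(
                id=index,
                input_text=payload["inputs"],
                parameters=payload.get("parameters", {}),
            )
        )
    return requests
```

With this shape, the rolling batch side would receive the list of Request objects and would not need to zip parallel lists back together.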

Assumptions

  1. Before this PR, we duplicated the parameters for client-side batching. This is no longer needed because we no longer maintain them as a list; we maintain a list of Requests instead.
  2. In this PR, we assume that if adapter_registry is non-empty, adapters need to be looked up in the requests. If adapter_registry is empty, we don't look for adapters.
  3. This PR introduces server_parameters, which holds the server-modified parameters. Built-in handlers should read from it when they need to modify parameters or send them to backend engines like vllm.
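Assumptions 2 and 3 could look something like the sketch below. The function names `resolve_adapter` and `build_server_parameters`, the payload key `adapter`, and the choice to raise on an unknown adapter name are all hypothetical, chosen only to illustrate the gating and the separation between client and server parameters.

```python
from typing import Any, Dict, Optional


def resolve_adapter(payload: Dict[str, Any],
                    adapter_registry: Dict[str, Any]) -> Optional[Any]:
    """Look up the adapter for a request only when adapters are registered."""
    if not adapter_registry:
        # Empty registry: skip adapter handling entirely (assumption 2).
        return None
    name = payload.get("adapter")  # hypothetical key name
    if name is None:
        return None
    if name not in adapter_registry:
        # Error handling here is an assumption, not taken from the PR.
        raise ValueError(f"unknown adapter: {name}")
    return adapter_registry[name]


def build_server_parameters(client_parameters: Dict[str, Any],
                            server_overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a server-side copy of the parameters (assumption 3).

    The client's dictionary is left untouched so handlers can still see
    what was originally requested.
    """
    server_parameters = dict(client_parameters)
    server_parameters.update(server_overrides)
    return server_parameters
```

Keeping the server-side copy separate means a handler can adjust values before passing them to a backend engine without mutating what the client sent.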

After this PR, for future improvements

  1. For the new standard of input_formatter, we would want request_input to be part of the input_formatter. This refactor makes that easier.
  2. For multimodal parsing, this should also make things easier.
  3. output_formatter could easily be used by dynamic batching use cases as well. This will unify the API UX for rolling batch and dynamic batching.

Testing

P.S. I could not divide this into multiple PRs, sorry about that. All these changes have to go in one PR.