Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

On dynamic batching #160

Closed varunsh-xilinx closed 1 year ago

varunsh-xilinx commented 1 year ago

Originally posted by varunsh-xilinx September 13, 2022

  1. If there are two workers with different batch sizes, can the server dynamically route requests to the appropriate one based on some criterion?
  2. Can a single worker accept a range of batch sizes?
  3. The batcher's timeout is currently statically configured. If the worker is busy, for example, the batcher could hold on to the batch and keep accumulating requests to improve throughput (see the sketch after this list).
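
For context, here is a minimal sketch of the kind of size-or-timeout dynamic batching loop these questions are about. This is not the inference server's actual implementation; `Batcher`, `Request`, and all parameters are hypothetical. An adaptive variant (question 3) could extend the deadline while the downstream worker is busy instead of using a fixed `timeout`.

```cpp
#include <algorithm>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical request type; the real server's request class differs.
struct Request { int id; };

// Minimal dynamic batcher: collects requests until either max_batch_size
// requests have arrived or a timeout elapses, then emits the batch.
class Batcher {
 public:
  Batcher(std::size_t max_batch_size, std::chrono::milliseconds timeout)
      : max_batch_size_(max_batch_size), timeout_(timeout) {}

  void enqueue(Request r) {
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(r);
    cv_.notify_one();
  }

  // Blocks until a batch is ready: full, or timed out with >= 1 request.
  std::vector<Request> next_batch() {
    std::unique_lock<std::mutex> lock(mutex_);
    // Wait indefinitely for the first request.
    cv_.wait(lock, [&] { return !queue_.empty(); });
    // Start the timeout window once the first request arrives. An adaptive
    // batcher could push this deadline back while the worker is still busy.
    auto deadline = std::chrono::steady_clock::now() + timeout_;
    while (queue_.size() < max_batch_size_) {
      if (cv_.wait_until(lock, deadline) == std::cv_status::timeout) break;
    }
    std::size_t n = std::min(queue_.size(), max_batch_size_);
    std::vector<Request> batch(queue_.begin(), queue_.begin() + n);
    queue_.erase(queue_.begin(), queue_.begin() + n);
    return batch;
  }

 private:
  std::size_t max_batch_size_;
  std::chrono::milliseconds timeout_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<Request> queue_;
};

int main() {
  // Batch size 4, 50 ms timeout: with requests arriving every 10 ms, the
  // first batch fills to 4 and the second flushes early on timeout.
  Batcher batcher(4, std::chrono::milliseconds(50));
  std::thread producer([&] {
    for (int i = 0; i < 6; ++i) {
      batcher.enqueue({i});
      std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
  });
  for (int b = 0; b < 2; ++b) {
    auto batch = batcher.next_batch();
    std::cout << "batch of " << batch.size() << "\n";
  }
  producer.join();
  return 0;
}
```

With this framing, question 1 amounts to choosing which worker's queue to enqueue into, and question 2 to letting the worker consume whatever batch size `next_batch()` returns rather than a single fixed size.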