Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

Use memory manager for workers #148

Closed varunsh-xilinx closed 1 year ago

varunsh-xilinx commented 1 year ago

Currently, each worker allocates its own memory and contributes it to a pool shared by its worker group. This causes a few problems:

  1. Doesn't work well for non-constant input sizes
  2. Potentially a lot of extra memory is kept around
  3. Can't easily use hardware-pinned memory

One solution is a central memory manager.

Goals:

  1. Allow for dynamic batching (#131) by allowing the batcher to return unused memory
  2. Allow for hardware-backed memory as well as CPU-based memory
  3. Allow for chaining
  4. Allocate more memory if needed
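
A rough sketch of how such a manager could behave, written in Python for brevity. This is not the server's actual API: `MemoryManager`, `Buffer`, `register`, `get` and `put` are hypothetical names, and a `bytearray` stands in for a real allocation. It covers goals 1, 2 and 4: buffers are requested per backend, returned when unused, and the pool grows when nothing idle fits.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Buffer:
    kind: str        # backend name, e.g. "cpu" (hypothetical label)
    data: bytearray  # stand-in for the real allocation


class MemoryManager:
    """Central pool that hands out buffers per backend and takes them back."""

    def __init__(self):
        self._allocators = {}           # kind -> callable(size) -> bytearray
        self._free = defaultdict(list)  # kind -> idle buffers
        self.allocations = 0            # count of fresh allocations

    def register(self, kind, allocator):
        # A hardware backend would register its own allocator here.
        self._allocators[kind] = allocator

    def get(self, kind, size):
        idle = self._free[kind]
        # Reuse the smallest idle buffer that is large enough.
        fits = [i for i, b in enumerate(idle) if len(b.data) >= size]
        if fits:
            best = min(fits, key=lambda i: len(idle[i].data))
            return idle.pop(best)
        # Nothing suitable is idle, so allocate more memory.
        self.allocations += 1
        return Buffer(kind, self._allocators[kind](size))

    def put(self, buffer):
        # Called by a batcher or worker to return memory it did not use.
        self._free[buffer.kind].append(buffer)


manager = MemoryManager()
manager.register("cpu", bytearray)  # bytearray(n) gives n zeroed bytes

a = manager.get("cpu", 1024)
b = manager.get("cpu", 4096)
manager.put(b)                # e.g. the batcher returns an unused buffer
c = manager.get("cpu", 2048)  # served from the idle 4096-byte buffer
```

Because requests go through one place, variable input sizes only change which buffer is picked, and the idle lists give a single point for trimming memory that is no longer needed.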