Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

Change memory buffering for requests #116

Closed varunsh-xilinx closed 1 year ago

varunsh-xilinx commented 1 year ago

Currently, each worker allocates its own set of buffers. Because these buffers can't be shared between workers, total memory use can be higher than needed. This scheme also handles dynamically sized input data poorly. A new system, perhaps a global memory manager, could hand out memory on demand so that adding more workers doesn't immediately consume a lot of memory.
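
To make the idea concrete, here is a minimal sketch of what a shared, on-demand pool might look like. Everything in it is an assumption for illustration: `BufferPool`, `acquire`, `release` and `allocations` are hypothetical names, not part of the existing inference-server code, and a real design would also need size limits, alignment and a policy for trimming idle buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch: a process-wide pool that workers borrow buffers from
// instead of each worker pre-allocating its own set.
class BufferPool {
 public:
  using Buffer = std::vector<std::uint8_t>;

  // Return a buffer of exactly `size` bytes, reusing a free buffer whose
  // capacity is large enough when one exists.
  std::unique_ptr<Buffer> acquire(std::size_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    // Smallest free buffer with capacity >= size.
    auto it = free_.lower_bound(size);
    if (it != free_.end()) {
      std::unique_ptr<Buffer> buffer = std::move(it->second);
      free_.erase(it);
      buffer->resize(size);  // size <= capacity, so no reallocation happens
      return buffer;
    }
    ++allocations_;
    return std::make_unique<Buffer>(size);
  }

  // Give a buffer back so that any worker can reuse it later.
  void release(std::unique_ptr<Buffer> buffer) {
    if (!buffer) {
      return;
    }
    std::lock_guard<std::mutex> lock(mutex_);
    const std::size_t capacity = buffer->capacity();
    free_.emplace(capacity, std::move(buffer));
  }

  // Number of fresh allocations made so far (buffers not served from reuse).
  std::size_t allocations() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return allocations_;
  }

 private:
  mutable std::mutex mutex_;
  // Free buffers keyed by capacity, so lookups can find a best fit.
  std::multimap<std::size_t, std::unique_ptr<Buffer>> free_;
  std::size_t allocations_ = 0;
};
```

In this sketch a worker would call `acquire` with the size of the incoming request and `release` once the response has been sent, so memory use would follow the number of in-flight requests rather than the number of workers. Keying the free list by capacity is one possible way to cope with dynamically sized inputs; other strategies (size classes, a single arena) may fit better.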

Another option to consider is using hardware-based buffers, if available, but this needs more investigation first.