Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

Using KServe with ZenDNN #27

Closed · varunsh-xilinx closed 2 years ago

varunsh-xilinx commented 2 years ago

Summary of Changes

Motivation

Using the Inference Server with KServe was complicated because KServe addresses requests to models rather than to backends. To support this use case, the inference server now handles model-addressed requests directly instead of relying on the special naming tricks used before. In addition, directory monitoring for loading new models is needed to support KServe's single-model serving case.

Implementation

A definition of a model repository is added to the inference server. It knows how to traverse the assumed directory structure to load a model after a call to modelLoad¹ (see the layout sketch below). The server can also be configured to monitor a particular directory, making any models placed there available automatically. For the directory monitoring, I'm using an open-source library².
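For reference, the standard model format mentioned above commonly follows the KServe/Triton convention of one directory per model with numbered version subdirectories. The tree and the resolver sketch below are illustrative assumptions rather than this PR's actual code; the paths, file names, and class names are all hypothetical.

```
model_repository/
└── mnist/
    ├── config.pbtxt        # or an equivalent metadata file
    └── 1/
        └── saved_model.pb  # framework-specific artifact
```

A minimal C++ sketch of how a repository like this could be traversed to resolve a model name to its newest version directory:

```cpp
// Sketch only: a toy model-repository resolver assuming a
// KServe/Triton-style layout of <repo>/<model>/<version>/<artifact>.
// The class and method names are illustrative, not the server's API.
#include <exception>
#include <filesystem>
#include <iostream>
#include <optional>
#include <string>

namespace fs = std::filesystem;

class ModelRepository {
 public:
  explicit ModelRepository(fs::path root) : root_(std::move(root)) {}

  // Returns the highest-numbered version directory of a model,
  // or std::nullopt if the model is not in the repository.
  std::optional<fs::path> resolve(const std::string& model) const {
    const fs::path model_dir = root_ / model;
    if (!fs::is_directory(model_dir)) {
      return std::nullopt;
    }
    std::optional<fs::path> best;
    int best_version = -1;
    for (const auto& entry : fs::directory_iterator(model_dir)) {
      if (!entry.is_directory()) {
        continue;  // skip metadata files such as config.pbtxt
      }
      try {
        const int version = std::stoi(entry.path().filename().string());
        if (version > best_version) {
          best_version = version;
          best = entry.path();
        }
      } catch (const std::exception&) {
        // Skip subdirectories whose names are not version numbers.
      }
    }
    return best;
  }

 private:
  fs::path root_;
};

int main() {
  const ModelRepository repo{"/srv/model_repository"};  // hypothetical path
  if (const auto dir = repo.resolve("mnist")) {
    std::cout << "loading artifacts from " << *dir << '\n';
  } else {
    std::cout << "model not found\n";
  }
}
```

A real implementation would additionally parse the metadata file to decide which backend worker should serve the model.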

¹ The previous modelLoad API has been renamed to workerLoad and has no other behavior changes. The new modelLoad API expects artifacts in the standard model repository format used by other inference servers.
² The robustness of this library needs to be verified.
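As a rough illustration of the directory-monitoring behavior, the polling loop below watches a directory and reports new model subdirectories as they appear. The PR uses a dedicated filesystem-watching library rather than polling, so this sketch only conveys the intended effect; the path and model names are hypothetical.

```cpp
// Rough illustration of directory monitoring via polling. The actual
// implementation uses an event-based filesystem-watching library; a
// real server would call its modelLoad path for each new directory.
#include <chrono>
#include <filesystem>
#include <iostream>
#include <set>
#include <string>
#include <thread>

namespace fs = std::filesystem;

int main() {
  const fs::path monitored = "/srv/model_repository";  // hypothetical path
  std::set<std::string> known;

  while (true) {
    if (fs::is_directory(monitored)) {
      for (const auto& entry : fs::directory_iterator(monitored)) {
        if (!entry.is_directory()) {
          continue;
        }
        const auto name = entry.path().filename().string();
        if (known.insert(name).second) {
          // A new model directory appeared; make it available.
          std::cout << "discovered new model: " << name << '\n';
        }
      }
    }
    std::this_thread::sleep_for(std::chrono::seconds(5));
  }
}
```

An event-based watcher avoids the latency and wasted wakeups of polling, which is presumably the motivation for using a library here; its robustness is what footnote ² flags.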