Enable using the Inference Server with ZenDNN on KServe
Add directory monitoring for dynamic model hosting
Motivation
Using the Inference Server with KServe was complicated because KServe addresses requests to models rather than to backends. To support a KServe-like use case, that behavior is now supported directly in the inference server instead of relying on special naming tricks as before. In addition, directory monitoring for loading new models is needed to support KServe's single-model serving case.
Implementation
A definition of a model repository is added to the inference server. It knows how to traverse the assumed directory structure to load a model after a call to modelLoad¹. The server can also be configured to monitor a particular directory and will automatically make any models placed in that directory available. For directory monitoring, I'm using an open-source library².
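As a rough illustration of the traversal, the sketch below scans a repository assuming the common `<root>/<model>/<version>/` layout used by several inference servers; the exact layout and the function name here are illustrative assumptions, not this server's documented interface.

```python
from pathlib import Path


def scan_repository(root):
    """Map model name -> sorted numeric versions, assuming the
    common <root>/<model>/<version>/ repository layout. This is an
    illustrative sketch, not the server's actual implementation."""
    models = {}
    for model_dir in Path(root).iterdir():
        if not model_dir.is_dir():
            continue  # skip stray files at the repository root
        versions = sorted(
            int(v.name)
            for v in model_dir.iterdir()
            if v.is_dir() and v.name.isdigit()
        )
        if versions:
            models[model_dir.name] = versions
    return models
```

A repository containing `mnist/1/` and `mnist/2/` would yield `{"mnist": [1, 2]}`, which the server could then hand to its loading path.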
¹ The previous modelLoad API is now renamed to workerLoad and has no other behavior changes. The new modelLoad API expects artifacts in the standard model format used by other inference servers.
² The robustness of this needs to be verified.
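To make the monitoring behavior concrete, here is a minimal polling sketch: the real implementation uses an event-based filesystem-watch library, so polling is only an illustrative stand-in, and `on_new_model` is a hypothetical hook standing in for the server's internal load path.

```python
import time
from pathlib import Path


def watch_repository(root, on_new_model, interval=1.0, max_polls=None):
    """Poll `root` and invoke `on_new_model(name)` for each model
    directory that appears. Polling is an illustrative stand-in for
    the event-based library the server actually uses; `max_polls`
    exists only so the sketch can terminate in examples."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for entry in Path(root).iterdir():
            if entry.is_dir() and entry.name not in seen:
                seen.add(entry.name)
                on_new_model(entry.name)  # hypothetical load hook
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Dropping a new model directory into the watched root would then trigger a load without any explicit API call, which is what KServe's single-model serving flow expects.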