Enable using the Inference Server with ZenDNN on KServe
Add directory monitoring for dynamic model hosting
Motivation
Using the Inference Server with KServe was complicated because KServe addresses requests to models rather than to backends. To support a KServe-like use case, that behavior is now supported directly in the inference server instead of relying on special naming tricks as before. In addition, directory monitoring for loading new models is needed to support KServe's single-model serving case.
Implementation
A definition of a model repository is added to the inference server. It knows how to traverse the assumed directory structure to load a model after a call to modelLoad¹. The server can also be configured to monitor a particular directory and will automatically make any models placed in that directory available. For directory monitoring, I'm using an open-source library².
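As a rough illustration of the traversal, the sketch below scans a repository assuming the common `<root>/<model>/<version>/` layout used by several inference servers; the exact layout and the function name here are illustrative assumptions, not this server's documented interface.

```python
from pathlib import Path


def scan_repository(root):
    """Map model name -> sorted numeric versions, assuming the
    common <root>/<model>/<version>/ repository layout. This is an
    illustrative sketch, not the server's actual implementation."""
    models = {}
    for model_dir in Path(root).iterdir():
        if not model_dir.is_dir():
            continue  # skip stray files at the repository root
        versions = sorted(
            int(v.name)
            for v in model_dir.iterdir()
            if v.is_dir() and v.name.isdigit()
        )
        if versions:
            models[model_dir.name] = versions
    return models
```

A repository containing `mnist/1/` and `mnist/2/` would yield `{"mnist": [1, 2]}`, which the server could then hand to its loading path.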
¹ The previous modelLoad API is now renamed to workerLoad and has no other behavior changes. The new modelLoad API expects artifacts in the standard model format used by other inference servers.
² The robustness of this needs to be verified.
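To make the monitoring behavior concrete, here is a minimal polling sketch: the real implementation uses an event-based filesystem-watch library, so polling is only an illustrative stand-in, and `on_new_model` is a hypothetical hook standing in for the server's internal load path.

```python
import time
from pathlib import Path


def watch_repository(root, on_new_model, interval=1.0, max_polls=None):
    """Poll `root` and invoke `on_new_model(name)` for each model
    directory that appears. Polling is an illustrative stand-in for
    the event-based library the server actually uses; `max_polls`
    exists only so the sketch can terminate in examples."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for entry in Path(root).iterdir():
            if entry.is_dir() and entry.name not in seen:
                seen.add(entry.name)
                on_new_model(entry.name)  # hypothetical load hook
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

Dropping a new model directory into the watched root would then trigger a load without any explicit API call, which is what KServe's single-model serving flow expects.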