Open saeid93 opened 1 year ago
Hey @saeid93 ,
That's a great point.
We're currently looking into ways to introduce optimisers within the Seldon stack. It's not 100% clear yet though whether this makes sense at the inference server-level or whether it's something that should happen upstream (e.g. within the orchestrator - like Seldon Core).
BTW regarding Optimum, this should be already part of the HF runtime :)
Hi @adriangonz ,
I'm glad to hear that's something on the agenda. Personally, I think it should be part of the model servers, with upstream frameworks responsible for high-level tasks like routing. However, I'm very interested to see how this decision will be made for Seldon/MLServer in the future. Just saw the Optimum commit 😁
Feature request - Since performing an optimisation step on a deep learning model before serving it is becoming very common in machine learning deployment, out-of-the-box support in MLServer could be beneficial; examples of such optimisers include TVM. This could be exposed as a config option in the MLServer model config file. An easy starting point would be to add support for Optimum to the HuggingFace runtime and, in case of positive feedback, gradually generalise it into a runtime-wide feature.
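To illustrate what such a config option might look like, here is a hedged sketch of a `model-settings.json` for the HuggingFace runtime. The `optimum_model` flag under `parameters.extra` is the kind of toggle being proposed; treat the exact key names as an assumption rather than a settled API:

```json
{
    "name": "sentiment-model",
    "implementation": "mlserver_huggingface.HuggingFaceRuntime",
    "parameters": {
        "extra": {
            "task": "text-classification",
            "pretrained_model": "distilbert-base-uncased-finetuned-sst-2-english",
            "optimum_model": true
        }
    }
}
```

With a flag like this, the runtime could transparently load the model through Optimum's optimised backends (e.g. ONNX Runtime) instead of vanilla Transformers, keeping the optimisation step inside the inference server rather than in the orchestrator.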