SeldonIO / MLServer

An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more
https://mlserver.readthedocs.io/en/latest/
Apache License 2.0

Out of the box support of graph optimiser #1024

Open saeid93 opened 1 year ago

saeid93 commented 1 year ago

Feature request: since running an optimization step on a deep learning model before deployment is becoming very common in machine learning deployments (one example is TVM), out-of-the-box support in MLServer could be beneficial. This could be added as an option in the MLServer model config file. An easy starting point would be adding support for Optimum to the HuggingFace runtime and, given positive feedback, gradually generalising it into a full feature.
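To make the request concrete, a config-driven optimiser might look like the sketch below. This is purely hypothetical: the `optimizer` block and its `backend`/`target` fields do not exist in MLServer today and are only an illustration of how such an option could sit alongside the existing `model-settings.json` fields.

```json
{
  "name": "my-model",
  "implementation": "mlserver_mlflow.MLflowRuntime",
  "parameters": {
    "uri": "./model"
  },
  "optimizer": {
    "backend": "tvm",
    "target": "llvm -mcpu=native"
  }
}
```

The runtime would then apply the named backend as a compilation step at load time, before serving any inference requests.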

adriangonz commented 1 year ago

Hey @saeid93 ,

That's a great point.

We're currently looking into ways to introduce optimisers within the Seldon stack. It's not 100% clear yet though whether this makes sense at the inference server-level or whether it's something that should happen upstream (e.g. within the orchestrator - like Seldon Core).

BTW regarding Optimum, this should be already part of the HF runtime :)
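For readers looking for that integration, the HuggingFace runtime's settings expose an Optimum toggle. A minimal sketch, assuming the `optimum_model` flag under `parameters.extra` as described in the MLServer HuggingFace runtime docs (check your installed version, as the exact field names may differ):

```json
{
  "name": "transformer",
  "implementation": "mlserver_huggingface.HuggingFaceRuntime",
  "parameters": {
    "extra": {
      "task": "question-answering",
      "optimum_model": true
    }
  }
}
```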

saeid93 commented 1 year ago

Hi @adriangonz ,

I'm glad to hear that's on the agenda. Personally, I think optimisation should live in the model servers, with upstream frameworks responsible for high-level tasks like routing. However, I'm very interested to see how this decision plays out for Seldon/MLServer in the future. Just saw the Optimum commit 😁