🚀 Feature
Supporting model reloads (when a new version is available) and multiple models.
Motivation
Other serving solutions support this, so adding it would make this server more attractive.
Pitch
Right now it is straightforward to serve a single model, but it is unclear how to serve multiple models, where each request (via the binary payload or HTTP arguments) indicates which model should be used.
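To make the pitch concrete, here is a minimal sketch of what such a mechanism could look like: a registry that maps a model name (taken from the request) to a loaded model, and transparently reloads when a newer version is published. All names here (`ModelRegistry`, `loader`, `fake_loader`) are illustrative assumptions, not part of any existing API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class _Entry:
    version: int
    model: Any


class ModelRegistry:
    """Hypothetical multi-model registry with reload-on-new-version."""

    def __init__(self, loader: Callable[[str, int], Any]):
        # `loader(name, version)` returns a loaded model object.
        self._loader = loader
        self._entries: Dict[str, _Entry] = {}

    def get(self, name: str, latest_version: int) -> Any:
        """Return the model for `name`, reloading it if a newer version exists."""
        entry = self._entries.get(name)
        if entry is None or entry.version < latest_version:
            model = self._loader(name, latest_version)
            self._entries[name] = _Entry(latest_version, model)
        return self._entries[name].model


# Usage: the request (e.g. an HTTP query argument) supplies the model name.
loads = []

def fake_loader(name: str, version: int) -> str:
    loads.append((name, version))       # record each (re)load for illustration
    return f"{name}-v{version}"

registry = ModelRegistry(fake_loader)
print(registry.get("resnet", 1))        # first request: loads resnet v1
print(registry.get("resnet", 1))        # same version: served from cache
print(registry.get("resnet", 2))        # newer version available: reloads
```

In a real server the `latest_version` signal would come from a model store or file watcher rather than the request itself; the key point is that routing and reloading live in one place instead of requiring one server process per model.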
Alternatives
Run N instances for the N models present at a given time; however, that breaks as soon as a new model appears.
Additional context
We have an internal C++ server that supports this; torch.serve supports it too, via what I believe they call an orchestrator.