Open KineticCookie opened 4 years ago
I'll add a bit more context so this can serve as a good-first-issue.
Hydro-serving is able to shadow data between multiple model variants in a serving application.
For example, a 5% canary test can look like this:
Application ‘A’
|
| - Variant 1: model ‘a’ version 1. weight=95
| - Variant 2: model ‘a’ version 2. weight=5
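To make the weights concrete, here is a minimal sketch of weighted variant selection. It is not Hydro-serving code: the `VARIANTS` list, `pick_variant`, and `simulate` are hypothetical names used only for illustration, and it assumes weights are plain relative integers as in the layout above.

```python
import random
from collections import Counter

# Hypothetical in-memory form of application 'A': (variant name, weight).
VARIANTS = [("model a v1", 95), ("model a v2", 5)]


def pick_variant(variants, rng):
    """Pick one variant name with probability proportional to its weight."""
    names = [name for name, _ in variants]
    weights = [weight for _, weight in variants]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(n, seed=0):
    """Count how often each variant is picked over n simulated requests."""
    rng = random.Random(seed)
    return Counter(pick_variant(VARIANTS, rng) for _ in range(n))
```

Over many requests, roughly 5% of the picks should land on version 2, which is what makes this a 5% canary.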
How shadowing is done:
Thus, we shadow incoming data to all model variants but return output only from a single one.
Because we wait for all model variants to finish computing their outputs, the latency we measure is wrong: it is the maximum latency across all model variants rather than the latency of each one.
To improve throughput and measure latency correctly for each model variant, we need to stop waiting for all variants to produce their outputs, and instead choose the variant whose output will be returned before any outputs are calculated.
Improve A/B execution by defining return value BEFORE execution happens.
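A rough sketch of the proposed flow, in Python with `asyncio` rather than the project's own stack. Everything here is an assumption made for illustration: `run_variant` stands in for a real model call, `serve` is a hypothetical request handler, and the delays simulate inference time. The point is only the ordering: pick the returned variant first, shadow the request to every variant, then await just the chosen one while each variant records its own latency.

```python
import asyncio
import random
import time


async def run_variant(name, delay, latencies):
    """Stand-in for a model call; records this variant's own latency."""
    start = time.perf_counter()
    await asyncio.sleep(delay)  # simulated inference time
    latencies[name] = time.perf_counter() - start
    return f"output of {name}"


async def serve(delays, weights, rng, latencies):
    """Handle one request across all variants (hypothetical handler)."""
    names = list(delays)
    # 1. Decide whose output is returned BEFORE anything runs.
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    # 2. Shadow the request to every variant.
    tasks = {
        n: asyncio.create_task(run_variant(n, delays[n], latencies))
        for n in names
    }
    # 3. Await only the chosen variant; the others keep running.
    result = await tasks[chosen]
    shadows = [task for n, task in tasks.items() if n != chosen]
    return chosen, result, shadows


async def demo():
    latencies = {}
    delays = {"v1": 0.01, "v2": 0.2}
    weights = {"v1": 95, "v2": 5}
    chosen, result, shadows = await serve(
        delays, weights, random.Random(0), latencies
    )
    await asyncio.gather(*shadows)  # only so the demo exits cleanly
    return chosen, result, latencies


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this ordering the response time depends only on the chosen variant, and `latencies` holds a separate measurement per variant. A real implementation would also need to decide how to handle errors and timeouts in the shadow calls, which this sketch ignores.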