related to https://github.com/fluxcapacitor/pipeline/issues/61 for alerts/notifications of bad canaries
Notes:
online evaluation pits one model against another (or uses a multi-armed bandit), in contrast to offline evaluation
online is best; offline is a preliminary smoke test. online evaluation requires an online serving system, which is not common. we hope to make it common.
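A minimal sketch of what bandit-style online evaluation could look like, assuming a hypothetical router that holds two models exposing a `predict(features)` method and receives some reward signal (click-through, conversion, etc.); the class and method names here are illustrative, not the pipeline's actual implementation:

```python
import random

class EpsilonGreedyRouter:
    """Route prediction requests between models, shifting traffic toward
    whichever model has the better observed average reward (sketch only)."""

    def __init__(self, models, epsilon=0.1):
        self.models = models                            # e.g. {"current": m1, "new": m2}
        self.epsilon = epsilon                          # exploration rate
        self.rewards = {name: 0.0 for name in models}
        self.counts = {name: 0 for name in models}

    def choose(self):
        # explore with probability epsilon, otherwise exploit the best arm so far
        if random.random() < self.epsilon:
            return random.choice(list(self.models))
        return max(self.rewards, key=lambda n: self.rewards[n] / max(self.counts[n], 1))

    def predict(self, features):
        name = self.choose()
        return name, self.models[name].predict(features)

    def record_reward(self, name, reward):
        # reward is whatever business signal the online evaluation optimizes for
        self.counts[name] += 1
        self.rewards[name] += reward
```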
this is a form of canary analysis of the new model against the existing cluster.
some traffic goes to the new model canary.
make sure the new model is performing within acceptable tolerance of the existing model; otherwise remove it and try again.
simple monitoring: golden set of data scored live against newly-deployed model
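A sketch of the golden-set check, assuming the newly-deployed model is reachable over HTTP and returns a JSON body with a `prediction` field; the endpoint, file format, and tolerance are all illustrative assumptions:

```python
import json
import requests

def check_golden_set(endpoint, golden_path, tolerance=0.05):
    """Score a stored golden dataset against a newly-deployed model and
    flag any prediction that drifts beyond the tolerance (sketch only)."""
    failures = []
    with open(golden_path) as f:
        for line in f:
            example = json.loads(line)  # e.g. {"features": [...], "expected": 0.87}
            resp = requests.post(endpoint, json={"features": example["features"]})
            predicted = resp.json()["prediction"]
            if abs(predicted - example["expected"]) > tolerance:
                failures.append((example, predicted))
    return failures

# e.g. check_golden_set("http://canary:8080/predict", "golden_set.jsonl")
```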
complex: deploy a canary of the new model alongside the cluster running the old model.
compare the two as part of canary analysis. keep an eye out for Netflix ACA (Automated Canary Analysis).
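In the same spirit (but not Netflix's actual ACA/Kayenta), a rough sketch of comparing metrics collected from the canary against the baseline cluster; the metric names and tolerances below are made up for illustration:

```python
def canary_score(baseline_metrics, canary_metrics, tolerances):
    """Compare canary metrics against the baseline cluster, metric by metric.
    Returns the fraction of metrics within their allowed relative deviation."""
    passed = 0
    for name, allowed_pct in tolerances.items():
        baseline = baseline_metrics[name]
        canary = canary_metrics[name]
        deviation = abs(canary - baseline) / abs(baseline) if baseline else abs(canary)
        if deviation <= allowed_pct:
            passed += 1
    return passed / len(tolerances)

baseline = {"p99_latency_ms": 120.0, "error_rate": 0.002, "mean_prediction": 0.41}
canary   = {"p99_latency_ms": 135.0, "error_rate": 0.003, "mean_prediction": 0.44}
score = canary_score(baseline, canary,
                     {"p99_latency_ms": 0.20, "error_rate": 0.50, "mean_prediction": 0.10})
promote = score == 1.0  # e.g. require every metric within tolerance before promoting the canary
```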
Zuul screenshots:
Moving to Advanced Edition http://pipeline.ai/products
provide metrics on both system and prediction performance to allow multiple levels of canary analysis
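One way to emit both kinds of metrics from the scoring path; prometheus_client is used here only as an example backend, and the metric names and `model` wrapper are assumptions, not the pipeline's actual instrumentation:

```python
import time
from prometheus_client import Histogram, Counter

# system-level metric: how long the model takes to respond
PREDICTION_LATENCY = Histogram("prediction_latency_seconds", "Model scoring latency")
# prediction-level metric: distribution of scores, so canary analysis can compare model behavior
PREDICTION_VALUE = Histogram("prediction_value", "Distribution of predicted values")
PREDICTION_ERRORS = Counter("prediction_errors_total", "Failed scoring requests")

def score_with_metrics(model, features):
    start = time.time()
    try:
        prediction = model.predict(features)  # `model` is a stand-in for the serving wrapper
    except Exception:
        PREDICTION_ERRORS.inc()
        raise
    PREDICTION_LATENCY.observe(time.time() - start)
    PREDICTION_VALUE.observe(prediction)
    return prediction
```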
for ML model canary analysis, we'll want to compare predictions: the difference between currentPrediction and newPrediction should be close to 0, or at least within some tolerance
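A small sketch of that comparison, shadow-scoring the same requests with both models; the helper names and the 99% agreement threshold are hypothetical:

```python
def within_tolerance(current_prediction, new_prediction, tolerance=0.05):
    """Check that the new model's prediction stays within tolerance of the current model's."""
    return abs(current_prediction - new_prediction) <= tolerance

def compare_models(current_model, new_model, sampled_requests, tolerance=0.05):
    # fraction of requests where the canary agrees with the current model
    agreements = sum(
        within_tolerance(current_model.predict(r), new_model.predict(r), tolerance)
        for r in sampled_requests
    )
    return agreements / len(sampled_requests)

# e.g. roll back the canary if agreement drops below 99%:
# if compare_models(current_model, new_model, sampled_requests) < 0.99: remove_canary()
```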