amplab / velox-modelserver

http://amplab.github.io/velox-modelserver/
Apache License 2.0

Scheduling for offline training #37

Closed by dcrankshaw 9 years ago

dcrankshaw commented 9 years ago

Offline model training in Spark is currently triggered by a call to the retrain REST API endpoint, which relies on an external system to decide when retraining needs to happen. This functionality should be moved into Velox, so that when a Velox model is deployed, the programmer can tell Velox to retrain the model automatically.

For now, scheduling will be periodic. This should be as simple as adding a thread that sleeps for the appropriate amount of time and then calls model.retrain(). One open question is where this thread lives. A simple solution is to give each model a "retrain master" that is responsible for tracking when the feature models need to be retrained. When we start to do quality-based model retraining, this can be turned into an election process.
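
As a rough illustration of the periodic case, here is a minimal sketch in plain Java. It is not Velox code: `RetrainScheduler` and the `Retrainable` interface are hypothetical names made up for this example, and only `retrain()` comes from the description above. It swaps the hand-written sleep loop for a single-threaded `ScheduledExecutorService`, which gives the same "wait, then retrain" behavior, and it leaves out the question of which node hosts the retrain master.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical "retrain master" for one model: owns a single background
// thread that periodically calls retrain() on that model.
final class RetrainScheduler implements AutoCloseable {

    // Hypothetical stand-in for a Velox model; only retrain() is assumed.
    public interface Retrainable {
        void retrain();
    }

    // One daemon thread per model, so a pending retrain never blocks shutdown.
    private final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "retrain-master");
            t.setDaemon(true);
            return t;
        });

    public RetrainScheduler(Retrainable model, long period, TimeUnit unit) {
        // Fixed delay: the wait is measured from the end of one retrain to the
        // start of the next, like a thread that sleeps and then retrains.
        executor.scheduleWithFixedDelay(() -> {
            try {
                model.retrain();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so log it and keep the schedule alive.
                System.err.println("retrain failed: " + e);
            }
        }, period, period, unit);
    }

    // Stops the schedule and interrupts a retrain that is in progress.
    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Under these assumptions, deploying a model would amount to something like `new RetrainScheduler(model, 6, TimeUnit.HOURS)`, with `close()` called when the model is undeployed. A quality-based trigger could later replace the fixed period without changing who owns the thread.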

dcrankshaw commented 9 years ago

#58 fixes this.