Closed by tomerk 9 years ago
Question @dcrankshaw: as new observations come in while the bulk retrain is running, would you rather stop doing coordinated online updates entirely, or keep updating the active version of the model (as long as we remember to apply those updates to the new version after the bulk retrain)?
To start with, I think we should stop doing online updates altogether and just queue the new observations until the bulk retrain is done.
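A rough sketch of what "queue until the bulk retrain is done" could look like, in Python for brevity. Everything here is hypothetical and not taken from the codebase: the `QueueingUpdater` class, the `Observation` tuple shape, and the `apply_update` callback are illustrative names only. It is also single-threaded; a real version would need locking around the flag and the queue.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# Hypothetical observation shape: (item id, label).
Observation = Tuple[str, float]
UpdateFn = Callable[[Observation], None]


class QueueingUpdater:
    """Applies online updates right away, except while a bulk retrain runs.

    During a retrain, observations are buffered in arrival order and
    replayed against the new model once the retrain finishes.
    """

    def __init__(self, apply_update: UpdateFn) -> None:
        self._apply_update = apply_update
        self._pending: Deque[Observation] = deque()
        self._retraining = False

    def observe(self, obs: Observation) -> None:
        if self._retraining:
            # Do not touch the active model; just remember the observation.
            self._pending.append(obs)
        else:
            self._apply_update(obs)

    def start_bulk_retrain(self) -> None:
        self._retraining = True

    def finish_bulk_retrain(self, apply_update: UpdateFn) -> int:
        """Switch to the new model's update function and replay the queue.

        Returns the number of observations that were replayed.
        """
        self._apply_update = apply_update
        self._retraining = False
        replayed = 0
        while self._pending:
            self._apply_update(self._pending.popleft())
            replayed += 1
        return replayed
```

With this shape, nothing is dropped during the retrain, and the new model sees the queued observations in the order they arrived.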
I think the right answer actually depends on how long we expect the bulk retrain to take. E.g., if the bulk retrain is going to take an hour, we should probably keep applying online updates, but if it will take ~2 minutes, we are probably better off waiting.
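To make the duration trade-off concrete, here is a small Python sketch of picking the behavior from an estimate of the retrain time. The `RetrainPolicy` enum, the `choose_policy` function, and especially the 10-minute default threshold are assumptions made up for illustration; the right cutoff would have to be measured.

```python
from enum import Enum


class RetrainPolicy(Enum):
    # Pause online updates and only queue observations until the retrain ends.
    QUEUE_ONLY = "queue_only"
    # Keep updating the active model, and also replay onto the new model later.
    UPDATE_AND_REPLAY = "update_and_replay"


def choose_policy(
    expected_retrain_seconds: float,
    threshold_seconds: float = 600.0,  # hypothetical cutoff, not a measured value
) -> RetrainPolicy:
    """Pick how to handle new observations during a bulk retrain."""
    if expected_retrain_seconds < 0:
        raise ValueError("expected_retrain_seconds must be non-negative")
    if expected_retrain_seconds >= threshold_seconds:
        # Long retrain: the active model would go stale if updates were paused.
        return RetrainPolicy.UPDATE_AND_REPLAY
    # Short retrain: waiting is cheap and keeps the logic simpler.
    return RetrainPolicy.QUEUE_ONLY
```

Under these assumptions, a ~2 minute retrain maps to `QUEUE_ONLY` and an hour-long one to `UPDATE_AND_REPLAY`. Either way the observations still need to be queued so they can be applied to the new version.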
Fixed by #58
I think it's clear that we need some sort of queue for new observations. Besides making sure we don't drop observations during a bulk retrain, we may want to do some coordinated scheduling of online updates to improve performance (see #38).