yl565 closed this issue 8 years ago
I didn't know sklearn had a standardized `partial_fit` method for online learning. Really cool.
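As a quick illustration of the `partial_fit` convention, one way to see which estimators support it is to check for the method with `hasattr`. This is a minimal sketch; the particular estimators listed are just examples and not necessarily the ones this project uses.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Perceptron, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Example estimators only; the project's actual classifiers may differ.
candidates = {
    "SGDClassifier": SGDClassifier(),
    "Perceptron": Perceptron(),
    "MultinomialNB": MultinomialNB(),
    "RandomForestClassifier": RandomForestClassifier(),
    "SVC": SVC(),
}

# Online-capable estimators expose partial_fit; batch-only ones do not.
online_capable = [name for name, est in candidates.items() if hasattr(est, "partial_fit")]
print(online_capable)  # ['SGDClassifier', 'Perceptron', 'MultinomialNB']
```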
However, I don't think we will run out of memory, nor do I think we should restrict ourselves to algorithms that support `partial_fit`. The task queue can always limit how many jobs run concurrently to avoid exhausting memory, and I believe we can easily spin up new virtual instances in the cloud to meet our needs.
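The idea of a task queue capping concurrent jobs can be sketched with a bounded worker pool. This assumes nothing about the project's actual queue: `ThreadPoolExecutor` stands in for it, and `MAX_CONCURRENT_JOBS` and `train_job` are hypothetical names. The pool never runs more than `max_workers` jobs at once, which bounds peak memory use.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 2  # hypothetical cap on simultaneous training jobs

_lock = threading.Lock()
_running = 0        # jobs currently executing
peak_running = 0    # highest number of jobs observed running at once


def train_job(job_id):
    """Hypothetical stand-in for one classifier-training task."""
    global _running, peak_running
    with _lock:
        _running += 1
        peak_running = max(peak_running, _running)
    time.sleep(0.05)  # simulate training work
    with _lock:
        _running -= 1
    return job_id


# Six jobs are submitted, but at most MAX_CONCURRENT_JOBS run at the same time.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
    results = list(pool.map(train_job, range(6)))

print(results, peak_running)
```

A real deployment would more likely use a dedicated task queue with its own concurrency setting, but the principle is the same: excess jobs wait instead of all loading data into memory at once.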
We're not dealing with the quantity of observations that would necessitate online learning. Therefore, I'm closing this issue, especially since we have many unclaimed tasks that are more pressing.
Since many people might use the website simultaneously, it would be good to adopt incremental (online) learning algorithms wherever possible to train the classifiers. In incremental (online) learning, a classifier is not trained on the whole dataset at once; instead, samples are fed to the learning algorithm one at a time (or in small batches). Here is the reference: http://scikit-learn.org/stable/modules/scaling_strategies.html
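For example, training on small batches with `partial_fit` might look roughly like this. It is a minimal sketch that uses `SGDClassifier` and synthetic data from `make_classification` as placeholders for the website's real classifier and observations. Note that the full set of class labels must be passed to the first `partial_fit` call, because a single batch may not contain every class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic placeholder data; a real app would stream batches from its own store.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
classes = np.unique(y)  # all labels, required on the first partial_fit call

clf = SGDClassifier(random_state=0)

batch_size = 100
for start in range(0, X.shape[0], batch_size):
    X_batch = X[start:start + batch_size]
    y_batch = y[start:start + batch_size]
    # Update the model using only this batch instead of refitting on everything.
    clf.partial_fit(X_batch, y_batch, classes=classes)

accuracy = clf.score(X, y)
print(f"training accuracy: {accuracy:.3f}")
```

Only one batch needs to be in memory per update, which is the property the scaling-strategies page relies on for datasets too large to load at once.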