cognoma / machine-learning

Machine learning for Project Cognoma

Using incremental (online) learning to reduce memory cost #24

Closed yl565 closed 8 years ago

yl565 commented 8 years ago

Since many people might use the website simultaneously, it would be good to adopt incremental (online) learning algorithms wherever possible to train classifiers. In incremental learning, classifiers are not trained on the whole dataset at once; instead, samples are fed into the learning algorithm one by one (or in small batches). Here is the reference: http://scikit-learn.org/stable/modules/scaling_strategies.html
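For illustration, a minimal sketch of scikit-learn's `partial_fit` interface, using synthetic data as a stand-in for Cognoma's actual features and labels:

```python
# Sketch of mini-batch training via scikit-learn's partial_fit API.
# The data here is synthetic; Cognoma's real expression matrix and
# mutation labels would take its place.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)                     # 1000 samples, 20 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # a simple linear target

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # all class labels must be given on the first call

batch_size = 100
for start in range(0, len(X), batch_size):
    batch = slice(start, start + batch_size)
    # Each call updates the model from one mini-batch only, so the
    # full dataset never has to sit in memory at once.
    clf.partial_fit(X[batch], y[batch], classes=classes)

accuracy = clf.score(X, y)
```

The key constraint is that only estimators implementing `partial_fit` (e.g. `SGDClassifier`, `MultinomialNB`, `MiniBatchKMeans`) support this pattern.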

dhimmel commented 8 years ago

I didn't know sklearn had a standardized partial_fit method for online learning. Really cool.

However, I don't think we will run out of memory, and I don't think we should restrict ourselves to only algorithms that support partial_fit. The task queue can always cap how many jobs run concurrently to avoid memory overflow, and I believe we can easily spin up new virtual instances in the cloud to meet demand.
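The capping idea can be sketched with a bounded worker pool; `train_job` is a hypothetical placeholder for a full-dataset model fit, not Cognoma's actual task queue:

```python
# Sketch: bound peak memory by limiting concurrent training jobs.
# train_job is a hypothetical stand-in for a full-dataset classifier fit.
from concurrent.futures import ThreadPoolExecutor

def train_job(job_id):
    # Placeholder for loading data and fitting a model.
    return f"job {job_id} done"

# max_workers caps how many fits run at once; queued jobs simply wait,
# so total memory use stays bounded regardless of how many users submit.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(train_job, range(6)))
```

With `max_workers=2`, at most two fits are resident in memory at any time, while the other four jobs wait in the queue.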

We're not dealing with the quantity of observations that would necessitate online learning. Therefore, I'm closing this issue, especially since we have many unclaimed tasks that are more pressing.