dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Is it possible to do continuous/incremental learning in ML.net? [question] #5504

Closed: frankhaugen closed this issue 3 years ago

frankhaugen commented 3 years ago

I was unsure if I should ask here or on Stack Overflow.
(SO has fewer than 400 questions with the ml.net tag, so I doubt there is a critical mass of people who bother to follow the tag.)

TL;DR:

Is it possible to do small incremental changes to a trained model?

Scenario
Where I work, we have an AI/ML/DL product called Semine, which classifies invoices for accounting purposes, e.g. detecting which "accounting code" a specific invoice line belongs to. We had a brilliant Ph.D. in statistics consult with us and write an optimized algorithm for our needs. I'd love to describe it in detail, but I'm not contractually allowed to divulge trade secrets. In general: when an invoice is "posted", the relevant values are added to the pile of data used by the algorithm. Then there is an "incremental learning" step triggered by that action, not a complete retrain of the entire model; having to retrain an entire model a few thousand times per day would not be financially responsible.

Question
Is there a way to do this type of learning in ML.net? Just a tweak, based on a small change to the underlying data? #AskingForAFriend 😆

My efforts
Having googled (even with Bing 😆), it's evident that there are a lot of questions about this, but no clear answers or examples. So I'm hoping for a definitive "yes/no" on whether it is possible now, and whether it will be possible in the future.

Thank you for your time!

justinormont commented 3 years ago

Yes. Some trainers allow continuous training. That means you can train, stop, save the model, then resume training from the trained weights, repeating this loop until the desired result is achieved.

While we lack documentation listing which trainers support continuous training, there are samples in the form of unit tests for continued training covering: Averaged Perceptron, Field Aware Factorization Machine, Linear SVM, Logistic Regression, Multiclass Logistic Regression, Online Gradient Descent, Poisson Regression, and SymSGD.

Other trainers likely support continued/continuous training as well; look for trainers whose .Fit() has an overload that accepts model parameters, for example Averaged Perceptron.

The Averaged Perceptron unit test: https://github.com/dotnet/machinelearning/blob/36fab9b6806260e64e50992450a219e869c7f74a/test/Microsoft.ML.Functional.Tests/Training.cs#L80-L118
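For readers who want the shape of that train, save, resume loop without setting up ML.NET, here is a minimal sketch in plain Python. It is not ML.NET code: the function `train_perceptron` and its `init_weights`/`init_bias` parameters are hypothetical stand-ins for a .Fit() overload that accepts previously trained model parameters, and JSON serialization stands in for saving the model to disk.

```python
import json


def train_perceptron(examples, epochs=5, init_weights=None, init_bias=0.0):
    """Train a binary perceptron on (features, label) pairs, labels in {-1, +1}.

    If init_weights is given, training resumes from those parameters
    ("warm start") instead of starting from zeros.
    """
    n_features = len(examples[0][0])
    weights = list(init_weights) if init_weights is not None else [0.0] * n_features
    bias = init_bias
    for _ in range(epochs):
        for x, y in examples:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Initial training on the historical data.
history = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]
weights, bias = train_perceptron(history)

# "Save" the learned parameters (a JSON string here; a file works the same way).
saved = json.dumps({"weights": weights, "bias": bias})

# Later: load the parameters and continue training on the new data only.
params = json.loads(saved)
new_data = [([0.5, -2.0], -1), ([1.0, 3.0], 1)]
weights, bias = train_perceptron(
    new_data, init_weights=params["weights"], init_bias=params["bias"]
)
```

The second call only iterates over the new examples, which is what makes resuming cheap. The trade-off is that nothing forces the resumed model to keep fitting the older data, so it may be worth periodically checking accuracy on older records as well.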

Depending on how training is resumed, this topic is also called incremental learning or online learning.
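The online-learning flavor, where each newly posted record nudges the model instead of triggering a full retrain, can be sketched as logistic regression trained by stochastic gradient descent, one example at a time. This is again plain Python rather than ML.NET; the class name `OnlineLogisticRegression`, the learning rate, and the binary {0, 1} labels are illustrative assumptions (an invoice-coding model like the one in the question would be multiclass).

```python
import math


class OnlineLogisticRegression:
    """Binary logistic regression updated one example at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        # Probability that the label is 1, via the logistic (sigmoid) function.
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # One gradient step on the log-loss of a single example, y in {0, 1}.
        # For log-loss, the gradient with respect to z is (p - y).
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


# Simulate records arriving one at a time; each one triggers a single cheap update.
model = OnlineLogisticRegression(n_features=2, learning_rate=0.5)
incoming = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
for x, y in incoming:
    model.update(x, y)
```

Each update costs time proportional to the number of features, regardless of how much data has already been seen; the learning rate is the main knob controlling how strongly a single new record moves the model.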

justinormont commented 3 years ago

Similar questions:

antoniovs1029 commented 3 years ago

We do have this doc describing what @justinormont has explained:

https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/retrain-model-ml-net