frankhaugen closed this issue 3 years ago.
Yes. Some trainers support continuous training: you can train, stop, save the model, then resume training from the saved weights, repeating this loop until you reach the desired result.
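The train, save, resume loop is not specific to ML.NET, so here is a minimal language-neutral sketch of it in Python. It is not ML.NET code. It assumes a toy logistic-regression model trained with plain SGD, with weights persisted to a JSON file between sessions. The names `train`, `save_weights`, `load_weights`, `predict`, and the data are hypothetical. The key point is that `train` starts from whatever weights it is given instead of from zeros.

```python
import json
import math
import os
import tempfile
from typing import List, Sequence, Tuple

Example = Tuple[List[float], int]  # (features, label in {0, 1}); hypothetical format

def train(weights: List[float], bias: float, batch: Sequence[Example],
          epochs: int = 20, lr: float = 0.1) -> Tuple[List[float], float]:
    """Logistic-regression SGD that starts from the given weights (a warm start)."""
    w, b = list(weights), bias
    for _ in range(epochs):
        for x, y in batch:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = 1.0 / (1.0 + math.exp(-z)) - y  # d(log loss)/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def save_weights(path: str, w: List[float], b: float) -> None:
    with open(path, "w") as f:
        json.dump({"w": w, "b": b}, f)

def load_weights(path: str) -> Tuple[List[float], float]:
    with open(path) as f:
        state = json.load(f)
    return state["w"], state["b"]

def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Each batch stands in for one training session (e.g. one day of new data).
batches: List[List[Example]] = [
    [([0.0, 1.0], 0), ([1.0, 0.0], 1)],
    [([0.2, 0.9], 0), ([0.9, 0.1], 1)],
    [([0.1, 0.8], 0), ([0.8, 0.3], 1)],
]
path = os.path.join(tempfile.mkdtemp(), "model.json")
for batch in batches:
    if os.path.exists(path):
        w, b = load_weights(path)  # resume from the last saved weights
    else:
        w, b = [0.0, 0.0], 0.0     # first session: start from zeros
    w, b = train(w, b, batch)      # continue training on the new batch only
    save_weights(path, w, b)       # persist so the next session can pick up here
```

Each session only sees its own batch; what it learned earlier survives through the saved weights.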
While we lack documentation stating which trainers support continuous training, there are samples in the form of unit tests for continued training covering: Averaged Perceptron, Field Aware Factorization Machine, Linear SVM, Logistic Regression, Multiclass Logistic Regression, Online Gradient Descent, Poisson Regression, and SymSGD.
Additional trainers likely also support continued/continuous training; look for trainers whose .Fit() has an overload that takes in model parameters, for example Averaged Perceptron.
The Averaged Perceptron unit test: https://github.com/dotnet/machinelearning/blob/36fab9b6806260e64e50992450a219e869c7f74a/test/Microsoft.ML.Functional.Tests/Training.cs#L80-L118
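In ML.NET this shows up as a .Fit() overload that accepts the previously trained model parameters along with the new training data. The following is a hypothetical Python sketch, not ML.NET's API: it mimics that shape with an optional `model_parameters` argument on a toy averaged perceptron. `LinearParams` and `AveragedPerceptron` are illustrative names, and how ML.NET handles the averaging state on resume is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Example = Tuple[List[float], int]  # (features, label in {+1, -1}); hypothetical format

@dataclass
class LinearParams:
    weights: List[float]
    bias: float

class AveragedPerceptron:
    """Toy averaged perceptron whose fit() can warm-start from earlier parameters."""

    def __init__(self, epochs: int = 5, lr: float = 1.0) -> None:
        self.epochs = epochs  # must be >= 1, and data must be non-empty
        self.lr = lr

    def fit(self, data: Sequence[Example],
            model_parameters: Optional[LinearParams] = None) -> LinearParams:
        dim = len(data[0][0])
        if model_parameters is None:
            w, b = [0.0] * dim, 0.0  # cold start: from zeros
        else:
            # warm start: continue from the previously trained parameters
            w, b = list(model_parameters.weights), model_parameters.bias
        sum_w, sum_b, n = [0.0] * dim, 0.0, 0
        for _ in range(self.epochs):
            for x, y in data:
                if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:  # mistake
                    w = [wi + self.lr * y * xi for wi, xi in zip(w, x)]
                    b += self.lr * y
                # accumulate after every example; the average is what gets returned
                sum_w = [s + wi for s, wi in zip(sum_w, w)]
                sum_b += b
                n += 1
        return LinearParams([s / n for s in sum_w], sum_b / n)

trainer = AveragedPerceptron(epochs=5)
first = trainer.fit([([2.0, 1.0], 1), ([-1.0, -2.0], -1),
                     ([1.5, 0.5], 1), ([-2.0, -0.5], -1)])
new_data = [([3.0, 2.0], 1), ([-3.0, -1.0], -1)]
updated = trainer.fit(new_data, model_parameters=first)  # continue from `first`
cold = trainer.fit(new_data)                             # new data only, from scratch
```

Here `updated` keeps the weights of `first` (about [2, 1], bias 1) because the new examples are already classified correctly, whereas `cold`, trained on the two new examples alone, ends up with different weights (about [3, 2], bias 1). Warm starting preserves what was learned before instead of discarding it.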
Depending on how the training is resumed, this topic is also called incremental learning or online learning.
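Online learning in the per-example sense can be sketched as a model that takes one small update step for each newly arriving labeled example instead of retraining on the whole history. This is hypothetical Python, not ML.NET: a multiclass perceptron over sparse text features, where the class name, feature names, and labels are made up for illustration.

```python
from typing import Dict, Optional

Features = Dict[str, float]  # sparse features, e.g. tokens from a description

class OnlineMulticlassPerceptron:
    """Toy online multiclass perceptron (hypothetical helper, not ML.NET)."""

    def __init__(self) -> None:
        # weights[label][feature] -> weight; labels are registered as they first appear
        self.weights: Dict[str, Dict[str, float]] = {}

    def score(self, features: Features, label: str) -> float:
        w = self.weights.get(label, {})
        return sum(w.get(f, 0.0) * v for f, v in features.items())

    def predict(self, features: Features) -> Optional[str]:
        if not self.weights:
            return None
        # ties go to the label registered first (dict insertion order)
        return max(self.weights, key=lambda lab: self.score(features, lab))

    def update(self, features: Features, label: str) -> bool:
        """One online step for one new example; returns True if the weights changed."""
        self.weights.setdefault(label, {})
        rivals = [lab for lab in self.weights if lab != label]
        if not rivals:
            return False  # only one known label so far: nothing to separate yet
        best_rival = max(rivals, key=lambda lab: self.score(features, lab))
        if self.score(features, label) > self.score(features, best_rival):
            return False  # already correct with a positive margin: leave the model alone
        # Perceptron step: pull the true label toward x, push the best rival away.
        for f, v in features.items():
            self.weights[label][f] = self.weights[label].get(f, 0.0) + v
            self.weights[best_rival][f] = self.weights[best_rival].get(f, 0.0) - v
        return True

# Examples arrive one at a time; each triggers a single update call.
stream = [
    ({"paper": 1.0, "printer": 1.0}, "office"),
    ({"fuel": 1.0, "car": 1.0}, "travel"),
    ({"toner": 1.0, "printer": 1.0}, "office"),
    ({"taxi": 1.0, "car": 1.0}, "travel"),
]
model = OnlineMulticlassPerceptron()
changed = [model.update(x, y) for x, y in stream]
```

The cost of each `update` depends on the number of features in that one example and the number of known labels, not on how much data has been seen so far.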
We do have this doc describing what @justinormont has explained:
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/retrain-model-ml-net
I was unsure if I should ask here or on Stack Overflow.
(SO has fewer than 400 questions with the ml.net tag, so I doubt there is a critical mass of people who bother to follow the tag.)
TL;DR:
Scenario
Where I work, we have an AI/ML/DL product called Semine, which classifies invoices for accounting purposes, e.g. detecting which "accounting code" a specific invoice line belongs to. We had a brilliant Ph.D. in statistics consult with us and write an algorithm optimized for our needs. I'd love to describe it in detail, but I'm contractually not allowed to divulge trade secrets. In general: when an invoice is "posted", the relevant values are added to the pile of data the algorithm uses. That action then triggers "incremental learning" rather than a complete retrain of the entire model; retraining an entire model a few thousand times per day would not be financially responsible.
Question
Is there a way to do this type of learning in ML.net? Just a tweak based on a small change to the underlying data?
#AskingForAFriend 😆
My efforts
Having googled (even with Bing 😆), it's evident that there are a lot of questions about this, but no clear answers or examples. So I'm hoping for a definitive "yes/no" on whether it is possible, and whether it will be possible.
Thank you for your time!