botpress / nlu

This repo contains all the ML/NLU-related code written by Botpress for the NodeJS environment. This includes the Botpress Standalone NLU Server.

SVM train resource #93

Closed franklevasseur closed 2 years ago

franklevasseur commented 2 years ago

Description

Various attempts at improving training resource consumption in 4 commits:

fix(nlu-engine): launch svm trainings one after the other

Currently, all iterations of the grid search run during SVM training are executed concurrently. This results in multiple SVMs being loaded in memory at the same time, consuming lots of RAM. By using a Bluebird mapSeries, this PR makes sure only one SVM is loaded at any time during the grid search.

Unfortunately, because the node-svm binding uses a Napi::AsyncWorker, this fix also reduces training speed.
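
To illustrate the idea, here is a minimal sketch of the change (not the actual nlu-engine code: `SvmParams`, `grid` and `trainAndScore` are hypothetical stand-ins for the libsvm grid-search internals):

```ts
import Bluebird from 'bluebird'

// Hypothetical stand-ins for the real grid-search internals.
type SvmParams = { C: number; gamma: number }

const grid: SvmParams[] = [
  { C: 1, gamma: 0.1 },
  { C: 1, gamma: 0.01 },
  { C: 10, gamma: 0.1 }
]

const trainAndScore = async (params: SvmParams): Promise<number> => {
  // ... train an SVM with `params` and return its cross-validation score
  return 0
}

// Before: every iteration started at once, so all SVMs sat in memory together.
// const scores = await Bluebird.map(grid, trainAndScore)

// After: mapSeries awaits each training before starting the next,
// so at most one SVM is loaded at any time.
const runGridSearch = () => Bluebird.mapSeries(grid, trainAndScore)

runGridSearch().then((scores) => console.log(scores))
```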

fix(nlu-engine): launch intent trainings in parallel and log each ctx

We train one intent classifier per context. Currently, all those classifiers are trained sequentially. However, there is no harm in training them concurrently, as the number of concurrent trainings is bounded by the MLThreadPool class located in the ml-thread-pool/index.ts file:

```ts
import os from 'os' // needed for os.cpus()

const maxMLThreads = Math.max(os.cpus().length - 1, 1) // ncpus - webworker
const userMlThread = process.env.BP_NUM_ML_THREADS ? Number(process.env.BP_NUM_ML_THREADS) : 4
const numMLThreads = Math.min(maxMLThreads, userMlThread)
```
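
For illustration, a rough sketch of how per-context trainings can run concurrently under that same bound, using Bluebird's `concurrency` option; `contexts` and `trainIntentClassifier` are hypothetical stand-ins, not the engine's actual dispatch code:

```ts
import os from 'os'
import Bluebird from 'bluebird'

// Same bound as MLThreadPool computes above.
const maxMLThreads = Math.max(os.cpus().length - 1, 1)
const userMlThread = process.env.BP_NUM_ML_THREADS ? Number(process.env.BP_NUM_ML_THREADS) : 4
const numMLThreads = Math.min(maxMLThreads, userMlThread)

// Hypothetical contexts and training call.
const contexts = ['global', 'booking', 'smalltalk']
const trainIntentClassifier = async (ctx: string): Promise<void> => {
  // ... enqueue the actual training on the ML thread pool
}

const trainAllContexts = async () => {
  // Launch one training per context; Bluebird caps how many run at once,
  // and each context is logged as its training starts.
  await Bluebird.map(
    contexts,
    async (ctx) => {
      console.log(`training intent classifier for context "${ctx}"`)
      await trainIntentClassifier(ctx)
    },
    { concurrency: numMLThreads }
  )
}

trainAllContexts().catch(console.error)
```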

fix(nlu-engine): use a stratified kfold to limit the amount of grid search iterations

We currently use a random k-fold algorithm to run a grid search over the SVM's hyperparameters for better performance.

The random algorithm is fine, but it can result in critical errors when a train set is made of only one class. This scenario fully breaks libsvm, and no clean error is thrown. Since I knew this could happen, I had added a minimum value for k that ensures no train set is ever made of only one class. This minimum value can however be quite large when a dataset is really imbalanced, which results in a really long and painful grid search.

In this PR I added a stratified version of the k-fold algorithm, which ensures class proportions are preserved in each fold as much as possible.

Check out <root>/packages/nlu-engine/src/ml/svm/libsvm/kfold/readme.md to learn more about this.
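
For intuition, here is a minimal stratified k-fold sketch (not the nlu-engine implementation; see the readme above for the real one): group samples by class, then deal each class round-robin across the folds so every fold keeps roughly the original class proportions.

```ts
// Hypothetical sample shape for the sketch.
type Sample = { x: number[]; label: string }

function stratifiedKFold(samples: Sample[], k: number): Sample[][] {
  const folds: Sample[][] = Array.from({ length: k }, () => [])

  // Group samples by class.
  const byClass = new Map<string, Sample[]>()
  for (const s of samples) {
    const group = byClass.get(s.label) ?? []
    group.push(s)
    byClass.set(s.label, group)
  }

  // Deal each class round-robin across folds; the shared counter also
  // keeps fold sizes balanced across class boundaries.
  let i = 0
  for (const group of byClass.values()) {
    for (const s of group) {
      folds[i % k].push(s)
      i++
    }
  }
  return folds
}
```

Because each class is dealt out independently, even a rare class ends up spread over as many folds as it has samples, instead of possibly landing entirely in one fold as with a purely random split.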

I also took the opportunity to clean up the kfold folder/library to make sure my future self will understand this part of the code. It took me a whole day just to get back into the problem and understand what I had previously understood one year prior. While I was at it, I decided to make it super clear with proper documentation. Even though it's simple, it can add many seconds or gigabytes of RAM to a training.

chore(nlu-engine): go back to running svm grid search in parallel

There's no real need to run the grid search serially, as the number of folds is greatly reduced by the previous commit.
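
Sketched with the same hypothetical names as the first example, the revert is just switching `mapSeries` back to a concurrent `map`:

```ts
import Bluebird from 'bluebird'

// Hypothetical stand-ins, as in the first sketch.
type SvmParams = { C: number; gamma: number }
declare const grid: SvmParams[]
declare function trainAndScore(params: SvmParams): Promise<number>

// With the stratified k-fold keeping k small, concurrent iterations no
// longer pile up an unreasonable number of SVMs in memory.
const scores: Promise<number[]> = Bluebird.map(grid, trainAndScore)
```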

Performance

On clinc150, using a local lang server with dimension 100:

| branch  | memory used (MB) | time to train (s) |
| ------- | ---------------- | ----------------- |
| master  | ~800             | 101               |
| this PR | ~700             | 82                |

On John Doe*, using the remote lang server https://lang-01.botpress.io:

| branch  | memory used (GB) | time to train (min) |
| ------- | ---------------- | ------------------- |
| master  | ~40              | 20                  |
| this PR | ~1.8             | 5                   |