botpress / nlu

This repo contains all ML/NLU-related code written by Botpress for the NodeJS environment. This includes the Botpress Standalone NLU Server.

svm training improvement 1 #88

Closed franklevasseur closed 3 years ago

franklevasseur commented 3 years ago

Solves #70

About

This PR is part of a sequence of PRs named svm training improvement $n. Each one presents an improvement, or a combination of improvements, as an attempt to make training faster and consume less memory.

⚠️⚠️ Do not merge this PR yet: we first need to compare it with the other attempts. ⚠️⚠️

Description

Currently, all iterations of the grid search run during SVM training are executed concurrently. This results in multiple SVMs being loaded in memory at the same time, which consumes a lot of RAM. Using Bluebird's mapSeries, this PR makes sure that only one SVM is loaded at any time during the grid search.

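Unfortunately, because the node-svm binding uses a Napi::AsyncWorker, which runs each training off the main thread, the concurrent iterations were effectively training in parallel. Running them one at a time gives up that parallelism, so this fix also reduces training speed.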
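The trade-off described above can be illustrated with a minimal sketch. It does not use the actual node-svm binding or the code changed in this PR: `makeTrainer`, `trainOne`, `Params` and the scoring formula are hypothetical stand-ins, and a plain `for ... await` loop is used in place of Bluebird's mapSeries so the snippet has no dependencies. The counter only tracks how many simulated SVMs are "loaded" at once.

```typescript
// Hypothetical types: the real grid-search parameters in botpress/nlu may differ.
type Params = { C: number; gamma: number };
type Result = { params: Params; score: number };

// Builds a fake trainer that records how many trainings are in flight at once.
function makeTrainer() {
  let loaded = 0; // simulated SVMs currently held in memory
  let peak = 0;   // highest number of simulated SVMs held at the same time

  async function trainOne(params: Params): Promise<Result> {
    loaded++;
    peak = Math.max(peak, loaded);
    // Simulates asynchronous native work (stand-in for the node-svm call).
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    loaded--;
    // Arbitrary placeholder score, not a real SVM metric.
    return { params, score: 1 / (1 + params.C + params.gamma) };
  }

  return { trainOne, getPeak: () => peak };
}

// Concurrent grid search: every iteration starts immediately.
async function gridSearchConcurrent(
  grid: Params[],
  trainOne: (p: Params) => Promise<Result>
): Promise<Result[]> {
  return Promise.all(grid.map((p) => trainOne(p)));
}

// Sequential grid search: the next iteration starts only after the previous
// one has resolved, which is the behavior mapSeries provides.
async function gridSearchSeries(
  grid: Params[],
  trainOne: (p: Params) => Promise<Result>
): Promise<Result[]> {
  const results: Result[] = [];
  for (const p of grid) {
    results.push(await trainOne(p));
  }
  return results;
}

// Example grid (values chosen arbitrarily for illustration).
const exampleGrid: Params[] = [
  { C: 0.1, gamma: 0.01 },
  { C: 1, gamma: 0.01 },
  { C: 0.1, gamma: 0.1 },
  { C: 1, gamma: 0.1 },
];
```

With `exampleGrid`, the concurrent variant holds all four simulated SVMs at once, while the sequential variant never holds more than one. The sequential run takes roughly the sum of the individual training times instead of overlapping them.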

Performance

On clinc150 using local lang server with dimension 100:

| branch | memory used (MB) | time to train (s) |
|--------|------------------|-------------------|
| master | ~800             | 101               |
| this   | ~700             | 250               |

On John Doe* using remote lang server https://lang-01.botpress.io:

| branch | memory used (GB) | time to train (min) |
|--------|------------------|---------------------|
| master | ~40              | 20                  |
| this   | < 2              | 38                  |