botpress / nlu

This repo contains all ML/NLU-related code written by Botpress for the NodeJS environment, including the Botpress Standalone NLU Server.

svm training improvement 2 #89

Closed franklevasseur closed 2 years ago

franklevasseur commented 2 years ago

About

This PR is part of a sequence of PRs named *svm training improvement $n*, each of which presents an improvement, or a combination of improvements, as an attempt to make training faster and consume less memory.

⚠️⚠️ Do not merge this PR, as we first need to compare it with the other attempts. ⚠️⚠️

Description

This PR mixes #88 with the following improvement:

We train one intent classifier per context. Currently, all of those classifiers are trained sequentially. However, there is no harm in training them concurrently, as the number of concurrent trainings is limited by the MLThreadPool class located in ml-thread-pool/index.ts:

```ts
const maxMLThreads = Math.max(os.cpus().length - 1, 1) // ncpus - webworker
const userMlThread = process.env.BP_NUM_ML_THREADS ? Number(process.env.BP_NUM_ML_THREADS) : 4
const numMLThreads = Math.min(maxMLThreads, userMlThread)
```
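To illustrate the idea, here is a minimal sketch of running one training task per context with a fixed concurrency cap, the way a bounded pool would. The `runWithLimit` helper and the fake per-context tasks are hypothetical, not the actual MLThreadPool implementation:

```typescript
// Sketch: run at most `limit` async tasks at a time (assumption: this mirrors
// the cap enforced by MLThreadPool, not its real implementation).
async function runWithLimit<T>(tasks: (() => Promise<T>)[], limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length)
  let next = 0
  // Each worker pulls the next unstarted task until none remain.
  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const i = next++
      results[i] = await tasks[i]()
    }
  }
  // Spawn `limit` workers; Promise.all resolves once every task is done.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker())
  await Promise.all(workers)
  return results
}

// Usage: one placeholder training task per context, capped at numMLThreads.
const contexts = ['global', 'booking', 'smalltalk']
const numMLThreads = 2
const tasks = contexts.map((ctx) => async () => `model for ${ctx}`) // stand-in for real SVM training
runWithLimit(tasks, numMLThreads).then((models) => console.log(models))
```

With this pattern, per-context trainings overlap up to the thread-pool cap instead of running strictly one after another.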

Performance

On clinc150, using a local lang server with dimension 100:

| branch | memory used (MB) | time to train (s) |
| --- | --- | --- |
| master | ~800 | 101 |
| this | ~700 | 240 |

On John Doe*, using the remote lang server https://lang-01.botpress.io:

| branch | memory used (GB) | time to train (min) |
| --- | --- | --- |
| master | ~40 | 20 |
| this | <2 | 38 |

Overall, it's fair to say that there is no major speed improvement from #88.