Rewrite the NLU Training State Machine in Studio

Description

This PR should fix:

DEV-2258
DEV-2259
DEV-2260

Bugs DEV-2258 and DEV-2259 were hard to reproduce and I couldn't figure out why they were occurring, so I decided to wipe out the code with a bulldozer. You can say this PR is an attempt to:

fix some bugs
make the code simpler
take some ownership in the studio by writing code I actually want to work with

Reviewing

There is little to no value in reviewing the code on GitHub. I strongly suggest reading the code in VS Code instead.

The following files contain the core logic of this PR (state machine):

packages/studio-be/src/studio/nlu/bot/index.ts: core logic of the training state-machine
packages/studio-be/src/studio/nlu/bot/bot-state.ts

The following files are worth a quick look (entry points):

packages/studio-be/src/studio/nlu/index.ts
packages/studio-be/src/studio/nlu/nlu-router.ts: HTTP API of the NLU in Studio Backend
packages/studio-be/src/studio/nlu/nlu-service.ts: entry point of the business logic

How it works

start training

When training starts, studio-be stores a training entry in its local DB. A training entry maps a botId and language to a modelId and definition hash.
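As a rough illustration of that mapping, a training entry could be modelled as below. The field names, the TrainingEntryStore class and the in-memory Map standing in for the local DB are all assumptions made for this sketch, not the actual schema used in studio-be.

```typescript
// Hypothetical shape of a training entry (field names are assumptions).
interface TrainingEntry {
  botId: string
  language: string
  modelId: string
  definitionHash: string
}

// In-memory stand-in for the local DB table, keyed by (botId, language).
class TrainingEntryStore {
  private entries = new Map<string, TrainingEntry>()

  // JSON-encode the pair so that ids containing separators cannot collide.
  private key(botId: string, language: string): string {
    return JSON.stringify([botId, language])
  }

  set(entry: TrainingEntry): void {
    this.entries.set(this.key(entry.botId, entry.language), entry)
  }

  get(botId: string, language: string): TrainingEntry | undefined {
    return this.entries.get(this.key(botId, language))
  }

  delete(botId: string, language: string): boolean {
    return this.entries.delete(this.key(botId, language))
  }
}
```

With such a store, looking up the modelId of an ongoing training only requires the botId and the language, which is exactly what studio-be knows when the UI asks for the state.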

Once training starts, studio-be polls the training state and sends it through the web socket. The polled function is syncAndGetState(). This function is the exact same one called when studio-ui gets the training/model state. Studio-be stops polling when training stops.
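The polling described above can be sketched roughly as follows. Only syncAndGetState() is named in this PR; the pollTrainingState function, the emit callback (standing in for the web socket), the TrainingState shape and the "training" and "errored" statuses are assumptions for illustration.

```typescript
// Assumed status values; only "done" and "needs-training" appear in this description.
type TrainingStatus = 'training' | 'done' | 'needs-training' | 'errored'

interface TrainingState {
  status: TrainingStatus
  progress: number
}

// Polls the state, forwards every result, and stops once training is no longer running.
async function pollTrainingState(
  syncAndGetState: () => Promise<TrainingState>,
  emit: (state: TrainingState) => void, // hypothetical web socket sender
  intervalMs: number
): Promise<void> {
  while (true) {
    const state = await syncAndGetState()
    emit(state)
    if (state.status !== 'training') {
      return // training stopped, so polling stops too
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs))
  }
}
```

Because the loop calls the same function the UI uses, the state pushed through the socket and the state returned by the HTTP API cannot drift apart.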

get training/model state

When studio-ui requests the training/model state (syncAndGetState()), studio-be first checks whether it has a training entry in its local DB.
If there is a local training entry, studio-be queries NLU Server for the actual state of the training. The following rule is then applied before returning the status:

If NLU Server responds that training is "done", the training entry is deleted, a model entry is upserted and the bot config is updated with the model. This is why the function is called "syncAndGetState()" instead of just "getState()".
If there is no local training entry, there is no way to query NLU Server for the training state because the modelId is unknown. In this case, studio-be falls back on the model:
Studio-be returns "done" if there is a local model entry, the model exists on NLU Server and the model is not dirty.
Otherwise it returns "needs-training".
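The decision flow above could look roughly like this. It is a sketch under stated assumptions, not the code in this PR: the SyncDeps interface and all of its method names are hypothetical, statuses other than "done" are assumed to be returned unchanged, and "dirty" is assumed to mean that the stored definition hash differs from the hash of the current definitions.

```typescript
type NluStatus = 'training' | 'done' | 'needs-training'

interface LocalTrainingEntry { modelId: string; definitionHash: string }
interface LocalModelEntry { modelId: string; definitionHash: string }

// Hypothetical dependencies: local DB, bot config and NLU Server access.
interface SyncDeps {
  getTrainingEntry(): Promise<LocalTrainingEntry | undefined>
  deleteTrainingEntry(): Promise<void>
  getModelEntry(): Promise<LocalModelEntry | undefined>
  upsertModelEntry(entry: LocalModelEntry): Promise<void>
  updateBotConfig(modelId: string): Promise<void>
  fetchTrainingStatus(modelId: string): Promise<NluStatus> // asks NLU Server
  modelExists(modelId: string): Promise<boolean> // asks NLU Server
  currentDefinitionHash(): Promise<string>
}

async function syncAndGetState(deps: SyncDeps): Promise<NluStatus> {
  const training = await deps.getTrainingEntry()
  if (training) {
    const remote = await deps.fetchTrainingStatus(training.modelId)
    if (remote === 'done') {
      // The "sync" part: promote the finished training to a model entry.
      await deps.deleteTrainingEntry()
      await deps.upsertModelEntry({ ...training })
      await deps.updateBotConfig(training.modelId)
    }
    return remote
  }

  // No training entry: the modelId is unknown, so fall back on the model entry.
  const model = await deps.getModelEntry()
  if (!model) {
    return 'needs-training'
  }
  const exists = await deps.modelExists(model.modelId)
  const dirty = model.definitionHash !== (await deps.currentDefinitionHash())
  return exists && !dirty ? 'done' : 'needs-training'
}
```

Passing the dependencies in as an object keeps the sketch testable with in-memory fakes; the real implementation may wire these differently.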

Worth mentioning

Model entries could be kept in bot.config.json instead of in the database, but this would mean Studio writes the dataset hash to the config (which might look weird).
If Studio ever becomes a desktop app that can't be used in a cluster configuration, training entries will be kept in memory instead of in the database. The only drawback is that if Studio dies during a training, the training is lost (which is all right).

botpress / studio

chore(nlu): full rewrite of nlu training state machine #255