botpress / studio

The studio is the main interface you'll use to build and edit your chatbot.
https://botpress.com/docs/quickstart#conversation-studio

chore(nlu): full rewrite of nlu training state machine #255

Closed: franklevasseur closed this 2 years ago

franklevasseur commented 2 years ago

Rewrite the NLU Training State Machine in Studio

Description

Should fix:

Bugs DEV-2258 and DEV-2259 were hard to reproduce, and I couldn't figure out why they were occurring, so I decided to wipe out the code with a bulldozer. You could say this PR is an attempt to:

Reviewing

There is little to no value in reviewing the code on GitHub. I strongly suggest reading it in VS Code instead.

The following files contain the core logic of this PR (the state machine):

The following files are worth a quick look (entry points):

How it works

Start training

  1. When training starts, studio-be stores a training entry in its local DB. A training entry maps a botId and language to a modelId and a definition hash.

  2. Once training starts, studio-be polls the training state and sends it over the web socket. The polled function is syncAndGetState(), the exact same function that is called when studio-ui gets the training/model state. Studio-be stops polling when training stops.
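
The two steps above can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the PR's code: the TrainingEntry shape, the TrainingStatus values, and the pollTraining, emit, and sleep names are all hypothetical, and emit merely stands in for the web socket.

```typescript
// Hypothetical status values; the PR's real status names may differ.
type TrainingStatus = "training" | "done" | "needs-training" | "errored";

// Hypothetical shape of a training entry in studio-be's local DB:
// it maps (botId, language) to the modelId being trained and its definition hash.
interface TrainingEntry {
  botId: string;
  language: string;
  modelId: string;
  definitionHash: string;
}

// Hypothetical polling loop: call syncAndGetState() for the entry's bot and language,
// forward each state through `emit` (standing in for the web socket),
// and stop as soon as training is no longer running.
async function pollTraining(
  entry: TrainingEntry,
  syncAndGetState: (botId: string, language: string) => Promise<TrainingStatus>,
  emit: (botId: string, language: string, status: TrainingStatus) => void,
  sleep: () => Promise<void>
): Promise<TrainingStatus> {
  for (;;) {
    const status = await syncAndGetState(entry.botId, entry.language);
    emit(entry.botId, entry.language, status);
    if (status !== "training") {
      return status; // training stopped, so polling stops too
    }
    await sleep();
  }
}
```

In a real server, sleep would be an actual delay such as () => new Promise<void>((resolve) => setTimeout(resolve, 1000)); the one-second interval is an arbitrary choice. Injecting it keeps the loop testable without timers.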

Get training/model state

  1. When studio-ui gets the training/model state (syncAndGetState()), studio-be first checks whether it has a training entry in its local DB.

  2. If there is a local training entry, studio-be queries NLU Server for the actual state of the training, then maps the NLU Server status to a studio status before returning it.

    If NLU Server responds that training is "done", the training entry is deleted, a model entry is set (upserted), and the bot config is updated with the new model. This is why the function is called "syncAndGetState()" rather than just "getState()".

  3. If there is no local training entry, there is no way to query NLU Server for the training state, because the modelId is unknown. In this case, studio-be falls back on the local model entry.

  4. Studio-be returns "done" if there is a local model entry, the model exists on NLU Server, and the model is not dirty (that is, the bot's NLU definitions have not changed since the model was trained).

  5. Otherwise, it returns "needs-training".
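
The lookup order above can be sketched as follows, assuming hypothetical NluClient and Deps types and hypothetical status names. The actual NLU Server to studio status mapping was given as an image in the PR and is not reproduced here, so STATUS_MAP is a placeholder; treating "dirty" as a definition-hash mismatch is also an assumption. For brevity, entries are keyed by language for a single bot.

```typescript
// Hypothetical studio-facing statuses.
type StudioStatus = "training" | "done" | "needs-training" | "errored";
// Hypothetical NLU Server training statuses.
type NluStatus = "training" | "done" | "errored" | "canceled";

// A training or model entry: the modelId plus the definition hash it was trained on.
interface Entry {
  modelId: string;
  definitionHash: string;
}

// Hypothetical NLU Server client.
interface NluClient {
  getTrainingStatus(modelId: string): Promise<NluStatus>;
  hasModel(modelId: string): Promise<boolean>;
}

interface Deps {
  trainings: Map<string, Entry>; // local training entries, keyed by language
  models: Map<string, Entry>; // local model entries, keyed by language
  nlu: NluClient;
  setBotConfigModel: (language: string, modelId: string) => void;
  currentDefinitionHash: (language: string) => string;
}

// Placeholder mapping for every non-"done" NLU Server status (not the PR's rule).
const STATUS_MAP: Record<Exclude<NluStatus, "done">, StudioStatus> = {
  training: "training",
  errored: "errored",
  canceled: "needs-training",
};

async function syncAndGetState(deps: Deps, language: string): Promise<StudioStatus> {
  const training = deps.trainings.get(language);
  if (training !== undefined) {
    // A training entry exists, so its modelId can be used to ask NLU Server.
    const nluStatus = await deps.nlu.getTrainingStatus(training.modelId);
    if (nluStatus !== "done") {
      return STATUS_MAP[nluStatus];
    }
    // Training finished: delete the training entry, upsert the model entry,
    // and point the bot config at the new model (the "sync" part).
    deps.trainings.delete(language);
    deps.models.set(language, training);
    deps.setBotConfigModel(language, training.modelId);
    return "done";
  }

  // No training entry, so the modelId of any ongoing training is unknown:
  // fall back on the local model entry.
  const model = deps.models.get(language);
  if (
    model !== undefined &&
    (await deps.nlu.hasModel(model.modelId)) &&
    model.definitionHash === deps.currentDefinitionHash(language) // not dirty (assumed meaning)
  ) {
    return "done";
  }
  return "needs-training";
}
```

Because the same function both reads and repairs local state, a client that only ever calls it (directly or through the poller) will eventually observe the model entry and bot config being updated, without a separate "training finished" callback.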

Worth mentioning

  1. Model entries could be kept in bot.config.json instead of in the database, but this would mean studio writes the dataset hash into the config (which might look weird).
  2. If Studio ever becomes a desktop app that can't be used in a cluster configuration, training entries will be kept in memory instead of in the database. The only drawback is that if studio dies during a training, the training is lost (which is all right).
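
One way to keep that option open is to hide the storage of training entries behind an interface, so that a database-backed store and an in-memory one are interchangeable. The sketch below is hypothetical (TrainingEntryRepository and InMemoryTrainingEntryRepository are not names from the PR) and shows only the in-memory variant.

```typescript
// Hypothetical training entry shape: (botId, language) -> modelId + definition hash.
interface TrainingEntry {
  botId: string;
  language: string;
  modelId: string;
  definitionHash: string;
}

// Hypothetical storage abstraction; a DB-backed class would implement the same methods.
interface TrainingEntryRepository {
  get(botId: string, language: string): Promise<TrainingEntry | undefined>;
  set(entry: TrainingEntry): Promise<void>;
  delete(botId: string, language: string): Promise<void>;
}

// In-memory variant for a single-process (e.g. desktop) studio.
// Entries live only as long as the process does.
class InMemoryTrainingEntryRepository implements TrainingEntryRepository {
  private readonly entries = new Map<string, TrainingEntry>();

  // JSON-encode the pair so ids containing separator characters cannot collide.
  private key(botId: string, language: string): string {
    return JSON.stringify([botId, language]);
  }

  async get(botId: string, language: string): Promise<TrainingEntry | undefined> {
    return this.entries.get(this.key(botId, language));
  }

  async set(entry: TrainingEntry): Promise<void> {
    this.entries.set(this.key(entry.botId, entry.language), entry);
  }

  async delete(botId: string, language: string): Promise<void> {
    this.entries.delete(this.key(botId, language));
  }
}
```

Because the map lives in the process, a restart yields an empty repository: syncAndGetState() then finds no training entry and falls back on the model, which is exactly the "training is lost" drawback described above.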

franklevasseur commented 2 years ago

I approve my PR