botpress / nlu

This repo contains all the ML/NLU-related code written by Botpress for the NodeJS environment, including the Botpress Standalone NLU Server.

WARNING: reaching max number of iterations #217

Open: liefra opened this issue 1 year ago

liefra commented 1 year ago

Make sure the issue is NLU related

Operating system

macOS

Product used

NLU Server

Deploy Option

Binary

Version

nlu-v1_0_1-darwin-x64

Configuration File

not used

CLI Arguments

./nlu-v1_0_1-darwin-x64 --verbose=4

Environment variables

not used

Description of the bug

Hi,

We are trying out the NLU server in standalone mode with a training set of around 10 custom entities and 60 intents. One of these entities has many entries (~900).

When we run the training we see the following warning printed around 60 times: WARNING: reaching max number of iterations

We are not sure whether this is expected behavior. Does it affect training time? Training time seems to rise significantly as more intents are included in the training set: with a training set of around 200 intents, training took around 2 hours. Is there a way to improve the training time, ideally keeping it below 30 min?

If this is normal, please disregard this report, and apologies. I couldn't find anything about it in the existing issues.

BTW, I tested on my local Mac (8 CPUs / 32 GB / Intel) and on a CentOS 7 server with 12 CPUs / 32 GB, and got similar results.

Thanks a lot

Here is an excerpt of the log file:

07/11/2022 11:23:06.039 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:23:06.040 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:23:06.040 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:23:16.041 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:23:16.041 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:23:16.041 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations
07/11/2022 11:23:26.041 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:23:26.042 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:23:26.042 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:23:33.056 [NLU] trx-queue Task "_runTask" waiting.
07/11/2022 11:23:33.057 [NLU] trx-queue Task "_runTask" started.
07/11/2022 11:23:33.057 [NLU] trx-queue Task "_runTask" done.
07/11/2022 11:23:36.042 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:23:36.043 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:23:36.043 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:23:46.043 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:23:46.043 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:23:46.044 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:23:56.043 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:23:56.044 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:23:56.044 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:24:06.044 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:24:06.044 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:24:06.044 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:24:16.044 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:24:16.044 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:24:16.045 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:24:26.044 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:24:26.045 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:24:26.045 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations
07/11/2022 11:24:33.058 [NLU] trx-queue Task "_runTask" waiting.
07/11/2022 11:24:33.059 [NLU] trx-queue Task "_runTask" started.
07/11/2022 11:24:33.059 [NLU] trx-queue Task "_runTask" done.
07/11/2022 11:24:36.045 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:24:36.045 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:24:36.046 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:24:46.045 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:24:46.046 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:24:46.046 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations
07/11/2022 11:24:56.046 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:24:56.047 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:24:56.047 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations
07/11/2022 11:25:06.047 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:25:06.047 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:25:06.047 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations
07/11/2022 11:25:16.047 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:25:16.048 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:25:16.048 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:25:26.048 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:25:26.049 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:25:26.049 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:25:33.061 [NLU] trx-queue Task "_runTask" waiting.
07/11/2022 11:25:33.061 [NLU] trx-queue Task "_runTask" started.
07/11/2022 11:25:33.062 [NLU] trx-queue Task "_runTask" done.
07/11/2022 11:25:36.049 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:25:36.050 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:25:36.050 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations
07/11/2022 11:25:46.050 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:25:46.050 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:25:46.050 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:25:56.051 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:25:56.051 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:25:56.052 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:26:06.052 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:26:06.052 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:26:06.052 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:26:16.052 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:26:16.053 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:26:16.053 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:26:26.053 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:26:26.053 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:26:26.053 [NLU] trx-queue Task "progressCallback" done.

WARNING: reaching max number of iterations
07/11/2022 11:26:30.456 [NLU] Engine:training [cds/d8b1bffae81508b0.b7f4a95061d75566.3040.en] Done TrainSlotTaggers
07/11/2022 11:26:31.054 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:26:31.054 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:26:31.054 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:26:33.063 [NLU] trx-queue Task "_runTask" waiting.
07/11/2022 11:26:33.063 [NLU] trx-queue Task "_runTask" started.
07/11/2022 11:26:33.064 [NLU] trx-queue Task "_runTask" done.

WARNING: reaching max number of iterations
07/11/2022 11:26:40.753 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:26:40.754 [NLU] trx-queue Task "progressCallback" started.
07/11/2022 11:26:40.754 [NLU] trx-queue Task "progressCallback" done.
07/11/2022 11:26:44.931 [NLU] Engine:training About to start task on worker 5
07/11/2022 11:26:44.999 [NLU] Engine:training About to start task on worker 8
07/11/2022 11:26:46.679 [NLU] Engine:training About to start task on worker 6

WARNING: reaching max number of iterations

WARNING: reaching max number of iterations
07/11/2022 11:26:54.842 [NLU] trx-queue Task "progressCallback" waiting.
07/11/2022 11:26:54.842 [NLU] trx-queue Task "progressCallback" started.
ierezell commented 1 year ago

Hello @liefra, thanks for taking the time to raise the issue, and sorry for the long response delay.

The NLU server should be able to handle your substantial training data, but note that a training set of this size is considered "huge".

In general, it's best if your training set is "uniform", which means:

Regarding the warning "WARNING: reaching max number of iterations": it comes from LibSVM, the library we use to train the bot's classifiers. It means that, with this much data, the classification algorithm did not converge for some intents. This does not break the NLU server, but it may significantly impact the accuracy of the bot's responses.
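As a rough illustration of what "reaching max number of iterations" means, here is a toy iterative SVM trainer in TypeScript. It is a simplified dual coordinate descent for a linear SVM in the style of LIBLINEAR, not LibSVM's actual SMO solver, and the names trainLinearSvm, eps and maxIter are hypothetical. The mechanism is the one described above: the solver keeps updating until an optimality tolerance is met, and if it hits the iteration cap first, it returns an unconverged model and prints a warning.

```typescript
// Toy dual coordinate descent for a linear SVM (hinge loss, no bias term),
// in the style of LIBLINEAR. This is NOT LibSVM's SMO solver; it only
// illustrates how an iteration cap can stop training before convergence.

interface SolveResult {
  w: number[]; // learned weight vector
  iterations: number; // number of full passes (epochs) performed
  converged: boolean; // true if the stopping tolerance was met
}

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let k = 0; k < a.length; k++) s += a[k] * b[k];
  return s;
}

function trainLinearSvm(
  xs: number[][],
  ys: number[], // labels, each +1 or -1
  C = 1, // upper bound on each dual variable
  eps = 1e-3, // hypothetical stopping tolerance
  maxIter = 1000 // hypothetical cap on passes over the data
): SolveResult {
  const n = xs.length;
  const alpha = xs.map(() => 0); // one dual variable per training example
  const w = xs[0].map(() => 0); // kept equal to sum(alpha_i * y_i * x_i)
  const qii = xs.map((x) => dot(x, x)); // diagonal entries of the dual Hessian

  for (let iter = 1; iter <= maxIter; iter++) {
    let maxPG = -Infinity;
    let minPG = Infinity;
    for (let i = 0; i < n; i++) {
      const g = ys[i] * dot(w, xs[i]) - 1; // dual gradient w.r.t. alpha_i
      // Projected gradient: ignore directions blocked by the bounds [0, C].
      let pg = g;
      if (alpha[i] === 0) pg = Math.min(g, 0);
      else if (alpha[i] === C) pg = Math.max(g, 0);
      maxPG = Math.max(maxPG, pg);
      minPG = Math.min(minPG, pg);
      if (pg !== 0 && qii[i] > 0) {
        // Exact minimization along alpha_i, clipped to [0, C].
        const old = alpha[i];
        alpha[i] = Math.min(Math.max(old - g / qii[i], 0), C);
        const delta = (alpha[i] - old) * ys[i];
        for (let k = 0; k < w.length; k++) w[k] += delta * xs[i][k];
      }
    }
    // LIBLINEAR-style criterion: stop when the projected gradients' spread is below eps.
    if (maxPG - minPG < eps) return { w, iterations: iter, converged: true };
  }
  // Mimics the wording of the LibSVM warning discussed above.
  console.warn("WARNING: reaching max number of iterations");
  return { w, iterations: maxIter, converged: false };
}
```

On a tiny separable data set such as the four points [2, 0], [3, 1], [-2, 0], [-3, -1] with labels +1, +1, -1, -1, a cap of 1 pass triggers the warning, while a cap of 100 converges after 2 passes. Larger or harder problems tend to need more passes, which is why big, overlapping intents are more likely to hit the cap.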

As for the increase in training time, it is expected, since you seem to have a lot of utterances per intent. Some of our users have ~100 intents and train in ~20 min, but they have only a few dozen utterances per intent.
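Because training time grows with the number of utterances per intent, a useful first check is how uneven the training set is. The TypeScript sketch below counts utterances per intent and lists the intents that are much larger than average. The IntentDef shape, the function name findOversizedIntents, and the default factor of 3 are all hypothetical and are not part of the NLU server API.

```typescript
// Hypothetical, simplified intent shape used only for this sketch; the NLU
// server's actual training payload contains more fields than this.
interface IntentDef {
  name: string;
  utterances: string[];
}

interface IntentSize {
  name: string;
  count: number; // number of utterances in the intent
  ratioToMean: number; // count divided by the average count over all intents
}

// Returns intents whose utterance count exceeds `factor` times the average,
// largest first. `factor` is an arbitrary illustrative threshold.
function findOversizedIntents(intents: IntentDef[], factor = 3): IntentSize[] {
  if (intents.length === 0) return [];
  const total = intents.reduce((sum, it) => sum + it.utterances.length, 0);
  const mean = total / intents.length;
  return intents
    .map((it) => ({
      name: it.name,
      count: it.utterances.length,
      ratioToMean: mean > 0 ? it.utterances.length / mean : 0,
    }))
    .filter((s) => s.ratioToMean > factor)
    .sort((a, b) => b.count - a.count);
}
```

The flagged intents are natural candidates for splitting into smaller, more focused intents.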

I would advise splitting your big intents into smaller ones, e.g. play_sport => play_tennis + play_volleyball.
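One way to bootstrap such a split is to route each utterance of the broad intent to a narrower intent by keyword, then review the result by hand. The helper splitIntent, the IntentDef shape, and the keyword routes are hypothetical illustrations, not an NLU server feature, and keyword matching is only a rough first pass.

```typescript
// Hypothetical, simplified intent shape used only for this sketch.
interface IntentDef {
  name: string;
  utterances: string[];
}

// Route each utterance of a broad intent to a narrower intent: the first route
// (in key order) whose keywords appear in the utterance wins. Utterances that
// match no keyword stay in an intent with the original name.
function splitIntent(intent: IntentDef, routes: Record<string, string[]>): IntentDef[] {
  const buckets: Record<string, string[]> = {};
  const order: string[] = []; // output intents in first-seen order
  for (const utt of intent.utterances) {
    const lower = utt.toLowerCase();
    let target = intent.name;
    for (const newName of Object.keys(routes)) {
      const hit = routes[newName].some((k) => lower.indexOf(k.toLowerCase()) !== -1);
      if (hit) {
        target = newName;
        break;
      }
    }
    if (!Object.prototype.hasOwnProperty.call(buckets, target)) {
      buckets[target] = [];
      order.push(target);
    }
    buckets[target].push(utt);
  }
  return order.map((name) => ({ name, utterances: buckets[name] }));
}
```

For example, splitting play_sport with the routes { play_tennis: ["tennis"], play_volleyball: ["volleyball"] } sends "Book a tennis court" to play_tennis and leaves "I want to do some sport" in play_sport. An utterance mentioning both sports goes to whichever route is listed first, so the output is a starting point for manual curation rather than a finished training set.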

It could also help to reduce the logic (200 intents seems like a lot) by regrouping some intents and branching on slots, entities, or buttons instead. Example: register + unregister => registration, plus a button in the chatbot with 2 options (register or unregister).
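A sketch of the register/unregister example in TypeScript: the two intents are merged into one registration intent for training, the original names become the button options the bot offers, and a small lookup handles the clicked option. mergeIntents, RegistrationChoice, handleRegistration, and the reply texts are hypothetical; the actual button and flow handling depends on how your bot is built.

```typescript
// Hypothetical, simplified intent shape used only for this sketch.
interface IntentDef {
  name: string;
  utterances: string[];
}

// Merge several intents into one broader intent for NLU training. The names of
// the merged intents are returned as the options the bot can show as buttons.
function mergeIntents(
  newName: string,
  intents: IntentDef[]
): { intent: IntentDef; options: string[] } {
  const utterances: string[] = [];
  for (const it of intents) {
    for (const u of it.utterances) utterances.push(u);
  }
  return { intent: { name: newName, utterances }, options: intents.map((it) => it.name) };
}

// Bot-side handling once the user has clicked one of the buttons.
type RegistrationChoice = "register" | "unregister";

const replies: Record<RegistrationChoice, string> = {
  register: "You are now registered.",
  unregister: "You have been unregistered.",
};

function handleRegistration(choice: RegistrationChoice): string {
  return replies[choice];
}
```

If the intent classifier relies on LibSVM's built-in multi-class support, which uses a one-against-one scheme with k(k-1)/2 binary sub-problems for k classes, merging intents also reduces the number of sub-problems to train.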

I hope that helps. Have a great day!