Open liefra opened 1 year ago
Hello @liefra, Thanks for taking the time to raise the issue and sorry for the long response delay.
The NLU server should be able to handle your training data, but note that at this size it's considered "huge".
In general, it's best to have a "uniform" set, i.e. a comparable number of utterances across intents.
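A quick way to check whether a training set is "uniform" in this sense is to compare utterance counts per intent. The in-memory data shape below is purely illustrative, not the NLU server's actual training file format:

```python
from statistics import median

# Hypothetical view of a training set: intent name -> list of utterances.
dataset = {
    "greet": ["hi", "hello", "hey there"],
    "book_flight": ["book a flight"] * 40,  # one oversized intent
    "goodbye": ["bye", "see you"],
}

def imbalance_report(dataset, ratio=3.0):
    """Flag intents whose utterance count deviates from the median by `ratio`x."""
    counts = {intent: len(utts) for intent, utts in dataset.items()}
    mid = median(counts.values())
    flagged = {i: c for i, c in counts.items()
               if c > mid * ratio or c < mid / ratio}
    return counts, flagged

counts, flagged = imbalance_report(dataset)
print(flagged)  # → {'book_flight': 40}
```

Intents that show up in `flagged` are the first candidates for splitting or trimming.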
Regarding the warning WARNING: reaching max number of iterations:
it comes from LibSVM, the library we use to train the chatbot.
The warning means that, on this much data, the classification algorithm was unable to converge for some intents. This does not break the NLU server, but it may significantly reduce the accuracy of the bot's responses.
As for the rising training time, that's expected since you seem to have a lot of utterances per intent. We do have users with ~100 intents who train in ~20 min, but they only have dozens of utterances per intent.
I would advise you to split your big intents into smaller intents, e.g.:
play_sport
=> play_tennis
+ play_volleyball
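The split above can be sketched as a simple data transformation. The routing keywords and dataset shape here are my own illustration, not anything the NLU server provides:

```python
# Hypothetical: route a broad intent's utterances into narrower intents
# by keyword matching (the keywords are illustrative only).
SUB_INTENT_KEYWORDS = {
    "play_tennis": ["tennis", "racket"],
    "play_volleyball": ["volleyball", "spike"],
}

def split_intent(utterances, keyword_map, fallback="play_sport"):
    """Assign each utterance to the first sub-intent whose keyword matches."""
    buckets = {name: [] for name in keyword_map}
    buckets[fallback] = []
    for utt in utterances:
        lowered = utt.lower()
        for sub_intent, keywords in keyword_map.items():
            if any(kw in lowered for kw in keywords):
                buckets[sub_intent].append(utt)
                break
        else:
            # No keyword matched: keep it in the original broad intent.
            buckets[fallback].append(utt)
    return buckets

utterances = [
    "I want to play tennis tomorrow",
    "book a volleyball court",
    "let's do some sport",
]
buckets = split_intent(utterances, SUB_INTENT_KEYWORDS)
```

Utterances left in the fallback bucket need a manual decision, but the bulk of the split can be automated this way.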
Also, 200 intents seems like a lot; if you can reduce the logic by regrouping some intents and using switches with slots/entities/buttons, that could help.
Example: register
+ unregister
=> registration
+ a button in the chatbot with 2 options (register or unregister).
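One way to picture this regrouping is to merge the two intents into one and carry the direction in a slot filled by the button. The structures below are illustrative only, not the NLU server's actual schema:

```python
# Hypothetical before/after view of merging two intents into one intent
# plus an "action" slot resolved from a button click or entity value.
before = {
    "register": ["sign me up", "I want to register"],
    "unregister": ["remove my account", "I want to unregister"],
}

# After: a single intent; the direction lives in a slot instead.
after = {
    "registration": {
        "utterances": sum(before.values(), []),
        "slots": {"action": ["register", "unregister"]},
    }
}

def resolve(intent, slots):
    """Dispatch on the slot value (e.g. filled by a two-option button)."""
    if intent == "registration":
        return f"handling {slots['action']} request"
    return "unknown intent"

print(resolve("registration", {"action": "unregister"}))
# → handling unregister request
```

The classifier now has one intent with twice the utterances instead of two similar-looking intents it must tell apart, which is exactly the kind of regrouping that reduces training load.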
I hope that helps. Have a great day!
Make sure the issue is NLU related
Operating system
macOS
Product used
NLU Server
Deploy Option
Binary
Version
nlu-v1_0_1-darwin-x64
Configuration File
CLI Arguments
Environment variables
Description of the bug
Hi,
We are using the NLU server in standalone mode with a training set that consists of around 10 custom entities and 60 intents. One of these entities has many entries (~900).
When we run the training, we see the following warning printed around 60 times:
WARNING: reaching max number of iterations
Not sure if this is the expected behavior? Does it affect training time? The training time seems to rise significantly as more intents are included in the training set. We tried a training set containing around 200 intents and the training took around 2 hours. Is there a way to improve the training time, ideally keeping it below 30 min?
If this is normal, please discard this report, with apologies. I couldn't find anything about it in the existing issues.
BTW, I tested on my local Mac (8 CPU/32 GB/Intel) and also on a CentOS 7 server (12 CPU/32 GB) and got similar results.
Thanks a lot
Here is an excerpt of the log file: