RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0
18.94k stars 4.64k forks

What factors need to be considered when creating a training dataset? #2001

Closed artiberde27 closed 5 years ago

artiberde27 commented 6 years ago

Rasa Core version: 0.8.3

Python version: 3.5

Issue: I am working on a small banking chatbot demo. I have no background in machine learning. As mentioned in Rasa Core, I created my own training dataset for the demo. In the first run I specified 3 examples for each intent and the application ran properly. After that, I tried to enlarge the training dataset by adding examples at random, and this time the bot did not respond properly. With 3 examples for one intent and 7 examples for a second intent, it could no longer identify the intent correctly.

Suppose I have 2 intents: request_loan (3 examples in the training dataset) and person_category_info (7 examples in the training dataset).

When I typed "I want to apply for a loan", it was identified as "person_category_info" (the intent with 7 examples in the training dataset), which is wrong. The correct intent is "request_loan" (the intent with 3 examples in the training dataset).

Should the number of examples be equal for each intent? On what basis is a score assigned to each intent?

Also, when I changed the --epochs count from 100 to 300, the bot did not respond properly; with 100 it responded properly. On what basis should the parameters below be chosen when creating the training dataset and stories? max_history=3, epochs=100, batch_size=50, augmentation_factor=50, validation_split=0.2

akelad commented 6 years ago

So for NLU examples, I'd recommend you have around 100 examples for each intent to produce consistent results. 3 and 7 examples are too few to distinguish between intents.
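
A rough way to check this before training is to count the examples per intent and flag the ones that fall short. The snippet below is only a sketch in plain Python, not part of the Rasa API: the `training_examples` dictionary, the placeholder example texts, and the `intents_below_threshold` helper are all made up for illustration, and the default of 100 simply mirrors the rule of thumb above.

```python
# Hypothetical NLU data: intent name -> list of example utterances.
# The intent names come from the question; most texts are placeholders.
training_examples = {
    "request_loan": [
        "I want to apply for a loan",
        "placeholder loan example 2",
        "placeholder loan example 3",
    ],
    "person_category_info": [
        "placeholder category example {}".format(i) for i in range(1, 8)
    ],
}


def intents_below_threshold(examples_by_intent, minimum=100):
    """Return {intent: example_count} for intents with fewer than `minimum` examples."""
    return {
        intent: len(examples)
        for intent, examples in examples_by_intent.items()
        if len(examples) < minimum
    }


# Report every intent that has too few examples under the rule of thumb.
for intent, count in sorted(intents_below_threshold(training_examples).items()):
    print("{}: only {} examples".format(intent, count))
```

With the counts from the question, both intents are flagged, so the fix is to add many more varied examples to each of them rather than to make the two counts equal.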

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.