RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Multiple entities of the same kind #1034

Closed rithwikjc closed 6 years ago

rithwikjc commented 6 years ago

**Rasa NLU version**: 0.11.3

**Operating system**: Windows 10

**Content of model configuration file**: I have used the following pipeline. `ner_crf` and `ner_duckling` are used for entity extraction.

```yml
"pipeline": ["nlp_spacy", "tokenizer_spacy", "intent_entity_featurizer_regex", "intent_featurizer_spacy", "ner_crf", "ner_synonyms", "ner_duckling", "intent_classifier_sklearn"]
```

**Issue**: Hello everyone. I have trained a model to detect **activities** as entities, and I have included the duckling extractor for extracting structured **time** values. When I parse a sentence like `I want to go running in the morning`, `ner_crf` does give me *'go running'* as an **activity** and duckling does detect *'in the morning'* as a **time** entity.

- But `ner_crf` is doing something I didn't train it for and is detecting *'the morning'* as an **activity** entity. (What I expected the NLU to give me is *'go running'* as the only **activity** entity.)

Based on @tmbo's answer in #427 I can see that this is due to how `ner_crf` works. Trying some more examples, I found that it is misclassifying words with the suffix *'ing'* as activities (maybe caused by the training data).

**So my questions are:**

1. How do I prevent this misclassification? (It's blatantly wrong to understand *morning* as an activity.)
2. How would you suggest one could resolve issues where the NLU returns more than one entity of the same kind, if only one is expected and is to be stored in a slot? (Right now I am not using Rasa Core due to lack of data. But I imagine more people also face this issue and could give some pointers.)

**Here is a sample of my training data:**

```
## intent:inform_activity
- i want to [learn drama](activity)
- i want to [go running](activity)
- i would like to [go bodybuilding](activity) everyday
- i like to [practice juggling](activity)
```

I have generated and used ~900 such sentences for the same intent, all with activity entities.

akelad commented 6 years ago

Actually, can you try updating to `rasa_nlu==0.11.5` first? We recently noticed some bugs with `ner_crf`, so this might also be causing the problem.

rithwikjc commented 6 years ago

@akelad Thanks for the suggestion. I will try that.

Also, I have managed to work around the problem for now by adding some more training data like:

- i want to [play basketball](activity) in the morning
- i want to [play basketball](activity) in the evening

But I could still use help with general ways to handle situations where the NLU extracts multiple entities of the same kind.
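One possible way to handle this outside the NLU is to post-process the parsed entities before filling a slot. The sketch below is only an illustration, not part of Rasa: `overlaps` and `resolve_entities` are hypothetical helpers, it assumes each entity is a dict with `start`, `end`, `entity`, `value` and `extractor` keys, and the example list is hand-written to mirror the sentence in this issue rather than copied from real parser output. It drops any entity whose span overlaps one found by a trusted extractor (here `ner_duckling`), then keeps the first remaining entity of each kind.

```python
def overlaps(a, b):
    """True when the character spans of two entity dicts intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def resolve_entities(entities, trusted_extractor="ner_duckling"):
    """Drop entities overlapping a trusted extractor's span, then keep
    the first remaining entity of each kind (a deliberately simple policy)."""
    trusted = [e for e in entities if e.get("extractor") == trusted_extractor]
    kept = {}
    for ent in sorted(entities, key=lambda e: e["start"]):
        is_trusted = ent.get("extractor") == trusted_extractor
        if not is_trusted and any(overlaps(ent, t) for t in trusted):
            continue  # e.g. 'the morning' sits inside 'in the morning'
        kept.setdefault(ent["entity"], ent)  # first occurrence wins
    return kept


# Hand-written entities for "I want to go running in the morning" (assumed shape).
parsed_entities = [
    {"start": 10, "end": 20, "value": "go running",
     "entity": "activity", "extractor": "ner_crf"},
    {"start": 24, "end": 35, "value": "the morning",
     "entity": "activity", "extractor": "ner_crf"},
    {"start": 21, "end": 35, "value": "in the morning",
     "entity": "time", "extractor": "ner_duckling"},
]

slots = resolve_entities(parsed_entities)
print(slots["activity"]["value"])  # go running
```

Keeping the first entity per kind is an arbitrary choice; depending on the use case, asking the user to disambiguate may be the better option.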

rithwikjc commented 6 years ago

I guess I'll have to figure something out on my own. I will close the issue.

Will `pip install rasa_nlu --upgrade` be enough to upgrade Rasa?

akelad commented 6 years ago

`pip install rasa_nlu --upgrade` will probably pull version 0.12.2, which has big configuration changes, so it depends on whether you want to spend time rewriting your configuration. If you want version 0.11.5, run `pip install rasa_nlu==0.11.5`.

The approach you're going with should work fine, and I think with the upgrade it shouldn't recognize "in the morning" anymore. Try upgrading and then let us know if you're still having those issues.

rithwikjc commented 6 years ago

I am planning to upgrade to the latest version and try the new tensorflow embedding pipeline. Thanks for the help. Closing the issue now.

AbdurRub commented 6 years ago

What type of bug in `ner_crf` did you notice, @akelad?

akelad commented 6 years ago

It was only training on examples with labeled entities and therefore had no negative examples, i.e. sentences with no entities. So it was wrongly recognising entities in pretty much all sentences.
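To make the effect concrete, here is a rough sketch of why dropping entity-free sentences hurts a sequence tagger. It is not the actual Rasa code: `bio_labels` is a hypothetical helper that uses whitespace tokenization and a simple BIO scheme, and the two training examples are made up for illustration. A sentence without entities contributes only `O` labels, which is exactly the negative signal that goes missing when such sentences are filtered out.

```python
def bio_labels(text, entities):
    """Label whitespace tokens as B-/I-<entity> or O from character spans."""
    labels = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        label = "O"
        for ent in entities:
            if start >= ent["start"] and end <= ent["end"]:
                prefix = "B-" if start == ent["start"] else "I-"
                label = prefix + ent["entity"]
                break
        labels.append((token, label))
    return labels


# Made-up examples: one with an entity, one without.
examples = [
    {"text": "i want to go running",
     "entities": [{"start": 10, "end": 20, "entity": "activity"}]},
    {"text": "good morning", "entities": []},
]

# Behaviour as described above: only examples with labeled entities are used.
buggy_training_set = [ex for ex in examples if ex["entities"]]

# Intended behaviour: every example is used, including entity-free ones.
full_training_set = list(examples)

for ex in full_training_set:
    print(bio_labels(ex["text"], ex["entities"]))
```

With the filtered set, the tagger never sees a sentence like "good morning" in which every token is labeled `O`, so it has little evidence that a word such as "morning" can be a non-entity.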