RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0
18.94k stars 4.64k forks

Entity recognition for location #1272

Closed Sramperu closed 6 years ago

Sramperu commented 6 years ago

**Rasa NLU version**: Name: rasa-nlu, Version: 0.12.3

**RASA CORE version**: Name: rasa-core, Version: 0.9.7

**Operating system** (windows, osx, ...): RHEL 7.4

**Content of model configuration file**:

```yaml
language: "en"

pipeline:
- name: "nlp_spacy"
- name: "ner_spacy"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_classifier_sklearn"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  intent_tokenization_flag: true
  intent_split_symbol: "+"
```

First of all, I searched for this issue beforehand, including on Gitter, and am raising it here since I couldn't figure it out anywhere. I have taken the weather bot demo and am trying to get the weather for a specified location. It works absolutely fine when I give the location upfront: "Get me the weather in Sydney". However, when I say "Get me the weather" and the bot asks "Enter the location", the location I enter is not recognized without a preposition. The entity is recognized only when I give the location as "in Sydney". Please let me know whether this is an issue with the RASA Core config.

twhughes commented 6 years ago

Hi

This is something we are working on improving for future releases. Right now the model has no way of knowing that 'Sydney' alone is a city name unless it is defined in the training set as such, or it's been trained on several similar examples.

Although the current model looks at features like whether the word was capitalized, for instance, it's likely that it has learned that locations often come after prepositions like 'in ' or 'going to ' and may be overfitting to this.
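To make the point about context concrete, here is a minimal sketch of the kind of per-token features a CRF-style tagger might look at. The `token_features` function and its feature names are hypothetical and are not the actual `ner_crf` feature set; they only illustrate that a bare "Sydney" carries no preposition context for the model to lean on.

```python
def token_features(tokens, i):
    """Hypothetical per-token features, loosely in the style of a CRF tagger."""
    word = tokens[i]
    return {
        "lower": word.lower(),
        "is_title": word.istitle(),  # capitalization cue
        # Neighbouring words; sentinel values mark the sentence boundaries.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# "Sydney" inside a full sentence: the previous word is the preposition "in".
in_sentence = token_features("Get me the weather in Sydney".split(), 5)

# "Sydney" as a one-word reply: no preposition, only the capitalization cue.
alone = token_features(["Sydney"], 0)

print(in_sentence["prev_word"], alone["prev_word"])
```

If most training examples look like the first case, a model can end up relying heavily on `prev_word`, which is absent in the second case.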

In future releases we will include internal or user-specified lookup tables so that the model will be much better at picking out city names, like 'Sydney', and label them as location entities. We'll also include some tools for creating character ngram features that may help extract custom entities. For now I'd assume it's nothing you did wrong. You could try adding a single Sydney training example labelled as a location entity and see if it works.

Hope it helps

Sramperu commented 6 years ago

@twhughes Thanks for the reply. Sydney is in fact present in the training data, but in the training data I give it as "get me weather in sydney", right? So even though I trained on it, when training the story or conversing with the bot through DMM, the location is not always mentioned in the conversation and has to be captured separately.

Having said that, I went through this Weatherbot example: https://github.com/JustinaPetr/Weatherbot_Tutorial.git, which also has a video tutorial. In the video, the scenario of finding the weather in a specific place is carried out, and the bot accepts the location without any preposition.

This left me unsure whether there is anything in the pipeline I need to alter.

Please let me know.

twhughes commented 6 years ago

Just to be clear, did you add a few examples of one-word inputs with city as an entity? Like:

{
    "text": "Canada",
    "intent": "inform",
    "entities": [
      {
        "start": 0,
        "end": 6,
        "value": "Canada",
        "entity": "location"
      }
    ]
}

for example? @JustinaPetr said that she included some like this when she did the demo. Since this is a demo, one can expect less-than-ideal performance. In general usage you should supply your own training data to cover the use cases you expect, and the more examples the merrier.

To answer your question, the pipeline looks fine to me.
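
One-word examples like the one above can be generated rather than written by hand. The following is a minimal sketch: the `one_word_example` helper and the city list are hypothetical, and the `rasa_nlu_data` / `common_examples` wrapper is assumed to match the JSON training-data layout of your Rasa NLU version, so check it against your own data file before using it.

```python
import json


def one_word_example(city, intent="inform", entity="location"):
    """Build a training example whose whole text is a single location entity."""
    return {
        "text": city,
        "intent": intent,
        "entities": [
            # The entity spans the entire text, so it runs from 0 to len(city).
            {"start": 0, "end": len(city), "value": city, "entity": entity}
        ],
    }


# Hypothetical list of locations; replace with the places your bot should know.
cities = ["Canada", "Sydney", "Berlin"]
examples = [one_word_example(city) for city in cities]

# Assumed wrapper for the JSON training data format.
training_data = {"rasa_nlu_data": {"common_examples": examples}}
print(json.dumps(training_data, indent=2))
```

The generated entries would be merged into the existing training data alongside the full-sentence examples, not used in place of them.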

JustinaPetr commented 6 years ago

Hey @Sramperu.

Regarding the entities, I agree with @twhughes: it's actually quite complicated to train the model so that it knows that 'Sydney' alone is a city name. The only thing that might differ between the weatherbot tutorial and your data is that the weatherbot tutorial dataset contains a few examples of those one-word city inputs, and that's why it sometimes (but unfortunately not all the time) extracts the city without a preposition. So, just as @twhughes suggested, try adding some examples like this as well.

vivekanon commented 5 years ago

Hi @twhughes , any update on this please?

twhughes commented 5 years ago

@vivekanon No update, sorry.