RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0
18.95k stars 4.64k forks

entity_synonyms not recognised #784

Closed jonasblumer closed 6 years ago

jonasblumer commented 6 years ago

Working with the latest version of rasa_nlu, I'm having a problem where synonyms defined by "entity_synonyms" don't return a match. My training data looks as follows:

{
  "rasa_nlu_data": {
    "entity_synonyms": [
      {
        "value": "coffee",
        "synonyms": ["covfefe"]
      }
    ],
    "common_examples": [
      {
        "text": "would like coffee",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 17,
            "value": "coffee",
            "entity": "item"
          }
        ]
      },
      {
        "text": "could have coffee",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 17,
            "value": "coffee",
            "entity": "item"
          }
        ]
      },
      {
        "text": "please have coffee",
        "intent": "order",
        "entities": [
          {
            "start": 12,
            "end": 18,
            "value": "coffee",
            "entity": "item"
          }
        ]
      }
    ]
  }
}

When I send please have coffee, then an item of the value coffee is identified. But when I enter please have covfefe, I don't get a match, even though covfefe is set to be a synonym.

BUT if I add training data for "covfefe" like so:

{
        "text": "please have covfefe",
        "intent": "order",
        "entities": [
          {
            "start": 12,
            "end": 19,
            "value": "coffee",
            "entity": "item"
          }
        ]
}

I DO get a match - with processor ["ner_synonyms"].

So synonyms do seem to be working, but setting them via an entity_synonyms object doesn't work.

amn41 commented 6 years ago

I understand how this is confusing, but it's actually expected behaviour. The synonyms only map to a particular value once they have been recognised as entities. You will still have to add some examples with e.g. covfefe marked as an entity.

If you're up for creating a PR to make the docs clearer on this that would be 💯

jonasblumer commented 6 years ago

Thank you for the quick reply. May I ask, then, what the point is of defining synonyms by entity_synonyms? Is it only to get the processor ["ner_synonyms"] prop in the reply, or are there any other benefits? As far as I can tell, additionally defining entity_synonyms doesn't change the result of the output when I add the synonyms to common_examples array anyway to get a match.

I'll gladly update the docs and contribute as soon as I'm clear on the benefits. Thank you!

wrathagom commented 6 years ago

I am the one that added the note to the docs under the entity synonyms section here.

But I still struggle to explain how this works. In the common_examples section of the training data, any span of text you label as an entity is fed into training the entity recognition model. Only the examples in common_examples are used for this training. Since you only provided examples with an entity value of coffee, the model has not generalized that the item entity can take values other than coffee. When you add the covfefe example to common_examples, covfefe is successfully parsed as an entity by the model.

Once coffee or covfefe is recognized as an entity value, THEN entity synonyms come into play. In this case they say covfefe is a synonym of coffee, so I am going to replace the synonym covfefe with its defined value coffee.

Said another way, the expected output for the request Please have covfefe:

With entity_synonyms:

{
    "entities": [
        {
            "extractor": "ner_crf",
            "end": 19,
            "processors": [
                "ner_synonyms"
            ],
            "value": "coffee",
            "entity": "item",
            "start": 12
        }
    ],
    "intent": null,
    "text": "Please have covfefe",
    "intent_ranking": []
}

Notice how the user asked for covfefe, but the entity value returned was coffee. This is because it was processed by ner_synonyms.

Without entity_synonyms:

{
    "entities": [
        {
            "extractor": "ner_crf",
            "end": 19,
            "value": "covfefe",
            "entity": "item",
            "start": 12
        }
    ],
    "intent": null,
    "text": "Please have covfefe",
    "intent_ranking": []
}

Notice that without synonyms the actual parsed entity value of covfefe is returned.
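The two outputs above can be imitated with a small post-processing step. This is only a sketch of the idea, not the rasa_nlu implementation: `apply_synonyms` and the `extracted` list are hypothetical names, and the lowercase lookup is an assumption. It shows that the mapping only touches entities the extractor has already found, so an empty extraction result stays empty no matter which synonyms are defined.

```python
from copy import deepcopy


def apply_synonyms(entities, synonyms):
    """Return a copy of `entities` with synonym values replaced by canonical ones.

    `synonyms` maps a lowercased surface form to its canonical value
    (assumption: lookup is case-insensitive).
    """
    result = deepcopy(entities)
    for entity in result:
        key = str(entity["value"]).lower()
        if key in synonyms:
            entity["value"] = synonyms[key]
            # Record that a post-processor changed the value.
            entity.setdefault("processors", []).append("ner_synonyms")
    return result


# Hypothetical extractor output for "Please have covfefe".
extracted = [
    {"extractor": "ner_crf", "start": 12, "end": 19,
     "value": "covfefe", "entity": "item"}
]

with_synonyms = apply_synonyms(extracted, {"covfefe": "coffee"})
without_synonyms = apply_synonyms(extracted, {})

# If the extractor found nothing, there is nothing to map.
nothing_extracted = apply_synonyms([], {"covfefe": "coffee"})
```

The last call mirrors the original problem in this issue: with no covfefe examples in common_examples, the extractor returns no entity, and the synonym table never gets a chance to run.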

amn41 commented 6 years ago

Also @jonasblumer check out https://github.com/RasaHQ/rasa_nlu/issues/773

jonasblumer commented 6 years ago

Thank you for the detailed answers! It does seem to me that the docs could be more specific. So the following two examples will return the same result:

{
  "rasa_nlu_data": {
    "entity_synonyms": [
      {
        "value": "coffee",
        "synonyms": ["covfefe"]
      }
    ],
    "common_examples": [
      {
        "text": "would like covfefe",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 18,
            "value": "covfefe",
            "entity": "item"
          }
        ]
      }
    ]
  }
}

This will return a match with the value coffee because of the entity_synonyms mapping. Notice that in common_examples, the value is covfefe.

AND

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "would like covfefe",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 18,
            "value": "coffee",
            "entity": "item"
          }
        ]
      },
      {
        "text": "would like coffee",
        "intent": "order",
        "entities": [
          {
            "start": 11,
            "end": 17,
            "value": "coffee",
            "entity": "item"
          }
        ]
      }
    ]
  }
}

This will return the same thing, as the value of both entities is coffee. There is no need for entity_synonyms here.

In my current understanding, these two examples are equivalent.

Is that correct? If yes, I will gladly try to make this clearer in a PR to update the docs.
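One way to see why the two variants could be equivalent is to build the synonym table from both sources. The sketch below is an assumption about the behaviour, not the library's code: `collect_synonyms` is a hypothetical helper that reads the explicit entity_synonyms list and also treats any labelled entity whose value differs from the annotated text span as an implicit synonym. The two inline datasets are reduced versions of the examples above, with the covfefe span ending at offset 18.

```python
def collect_synonyms(training_data):
    """Build a {surface form -> canonical value} table from training data."""
    data = training_data["rasa_nlu_data"]
    synonyms = {}

    # Explicit definitions from the entity_synonyms list.
    for entry in data.get("entity_synonyms", []):
        for synonym in entry["synonyms"]:
            synonyms[synonym.lower()] = entry["value"]

    # Implicit definitions: the labelled value differs from the text span.
    for example in data.get("common_examples", []):
        for entity in example.get("entities", []):
            surface = example["text"][entity["start"]:entity["end"]]
            if surface != entity["value"]:
                synonyms[surface.lower()] = entity["value"]

    return synonyms


explicit = {"rasa_nlu_data": {
    "entity_synonyms": [{"value": "coffee", "synonyms": ["covfefe"]}],
    "common_examples": [
        {"text": "would like covfefe", "intent": "order",
         "entities": [{"start": 11, "end": 18, "value": "covfefe", "entity": "item"}]},
    ],
}}

implicit = {"rasa_nlu_data": {
    "common_examples": [
        {"text": "would like covfefe", "intent": "order",
         "entities": [{"start": 11, "end": 18, "value": "coffee", "entity": "item"}]},
        {"text": "would like coffee", "intent": "order",
         "entities": [{"start": 11, "end": 17, "value": "coffee", "entity": "item"}]},
    ],
}}

explicit_table = collect_synonyms(explicit)
implicit_table = collect_synonyms(implicit)
```

Under these assumptions both datasets yield the same table, so the parsed value for covfefe would be coffee either way.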

wrathagom commented 6 years ago

Yes, entity_synonyms just provides a place where more synonyms can be defined in a smaller space. Granted, there still have to be enough examples in the common_examples section for the model to generalize and recognize them.

wrathagom commented 6 years ago

@jonasblumer I am going to close this one, but please do submit a PR. Also, let me know if your issue isn't resolved.

ctrado18 commented 6 years ago

The ultimate power of entity synonyms comes together with the phrase matcher! I just played with the phrase matcher and ran it before NER in the pipeline, so that untrained entities like item are recognized first; afterwards covfefe is replaced with coffee by entity_synonyms! And you don't need to train covfefe!
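The idea in this comment can be sketched in plain Python without any NLU library. Everything here is hypothetical: `match_phrases` stands in for a phrase matcher component, `map_synonyms` for the synonym step, and the "phrase_matcher" extractor label is made up for illustration. The point is only the ordering: a list lookup finds the entity without any training examples, and the synonym mapping then normalizes its value.

```python
import re


def match_phrases(text, phrases, entity):
    """Return an entity dict for each whole-word occurrence of a known phrase."""
    if not phrases:
        return []
    # Longest phrases first so multi-word entries win over their prefixes.
    alternatives = "|".join(
        re.escape(p) for p in sorted(phrases, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [
        {"start": m.start(), "end": m.end(), "value": m.group(0),
         "entity": entity, "extractor": "phrase_matcher"}
        for m in pattern.finditer(text)
    ]


def map_synonyms(entities, synonyms):
    """Replace matched values with their canonical form where one is defined."""
    for found in entities:
        found["value"] = synonyms.get(found["value"].lower(), found["value"])
    return entities


matched = match_phrases("please have covfefe", ["coffee", "covfefe"], "item")
entities = map_synonyms(matched, {"covfefe": "coffee"})
```

The trade-off is that a phrase list only recognizes the exact forms it contains, whereas a trained extractor can generalize to unseen values.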

codepie3 commented 4 years ago

Facing an issue with the pipeline while training bots

Rasa version: Rasa 1.6.0

Rasa SDK version (if used & relevant):

Rasa X version (if used & relevant):

Python version: Python 3.6.9

Operating system (windows, osx, ...): Ubuntu 18.04 LTS

Issue: Failed to load the NLU model when starting rasa shell to test my bot:

The NLU data and stories are correct and were tested with the supervised embeddings pipeline.

Error (including full traceback):

2020-02-06 21:39:29 INFO     root  - Connecting to channel 'cmdline' which was specified by the '--connector' argument. Any other channels will be ignored. To connect to all given channels, omit the '--connector' argument.
2020-02-06 21:39:29 INFO     root  - Starting Rasa server on http://localhost:5005
2020-02-06 21:39:32 INFO     absl  - Entry Point [tensor2tensor.envs.tic_tac_toe_env:TicTacToeEnv] registered with id [T2TEnv-TicTacToeEnv-v0]
/home/ai/ai/rasa/o/lib/python3.6/site-packages/rasa/nlu/classifiers/embedding_intent_classifier.py:962: UserWarning: Failed to load nlu model. Maybe path '/tmp/tmpwistue_9/nlu' doesn't exist.
  f"Failed to load nlu model. "
2020-02-06 21:39:33 INFO     rasa.nlu.selectors.embedding_response_selector  - Retrieval intent parameter was left to its default value. This response selector will be trainedon training examples combining all retrieval intents.
Bot loaded. Type a message and press enter (use '/stop' to exit): 
Your input ->  tell me location                                                 
2020-02-06 21:39:57 ERROR    rasa.nlu.classifiers.embedding_intent_classifier  - **There is no trained tf.session: component is either not trained or didn't receive enough training data.**
Your input ->  /stop                                                            
2020-02-06 21:41:47 INFO     root  - Killing Sanic server now.

Command or request that led to error:

$ rasa shell 

Content of configuration file (config.yml) (if relevant):


# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "EmbeddingIntentClassifier"
  - name: "ResponseSelector"

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy

Content of domain file (domain.yml) (if relevant):

intents:
  - greet
  - goodbye
  - query_knowledge_base
  - bot_challenge
  - location_ask
  - time_t
  - who_ask

entities:
  - location  
  - address 
  - berlin 
  - date
  - time
  - services

actions:
- utter_iamabot
- utter_greet
- utter_goodbye
- utter_ask_rephrase
- action_location
- action_time

templates:
  utter_greet:
  - text: "Hey!"
  - text: "Hello! How can I help you?"

  utter_goodbye:
  - text: "Bye"
  - text: "Goodbye. See you soon."

  utter_ask_rephrase:
  - text: "Sorry, I'm not sure I understand. Can you rephrase?"
  - text: "Can you please rephrase? I did not got that."

  utter_iamabot:
  - text: "I am a bot, powered by Rasa."