RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0
18.88k stars 4.63k forks

rasa interactive does not work with entities without start/end #5263

Closed koaning closed 3 years ago

koaning commented 4 years ago

Rasa version:

> rasa --version
Rasa 1.7.2

Python version:

> python --version
Python 3.7.6

Operating system (windows, osx, ...):

osx

Issue:

I have a custom component that detects "bad words". The word shit is one of them.

import os

from rasa.nlu.components import Component
from rasa.nlu import utils
from rasa.nlu.model import Metadata
from typing import Any, Optional, Text, Dict

SENTIMENT_MODEL_FILE_NAME = "sentiment_classifier.pkl"

class BadWordUsage(Component):
    name = "badword"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["en"]
    print("initialised")

    def __init__(self, component_config=None):
        super(BadWordUsage, self).__init__(component_config)
        self.curses = ["fuck", "shit", "piss", "twat"]

    def train(self, training_data, cfg, **kwargs):
        print("this is the training data")
        print(training_data)
        print("und der cfg")
        print(cfg)
        print("haben sie kwarg?")
        print(kwargs)

    def preprocessing(self, tokens):
        return {word: True for word in tokens}

    def process(self, message, **kwargs):
        """
        Retrieve the tokens of the new message, pass them to the
        classifier, and append the prediction results to the message.
        """

        if not self.curses:
            # component is either not trained or didn't
            # receive enough training data
            entity = None
        else:
            tokens = [t.text for t in message.get("tokens")]
            if any(t in self.curses for t in tokens):
                entity = {
                    "value": "curses", 
                    "confidence": 1,
                    "entity": "badword",
                    "extractor": "bad_word_usage",
                    "text": "whateverdude",
                }

                print(entity)

                message.set("entities", [entity], add_to_output=True)

    def persist(self, file_name, model_dir):
        """Persist this model into the passed directory."""
        classifier_file = os.path.join(model_dir, SENTIMENT_MODEL_FILE_NAME)
        utils.json_pickle(classifier_file, self)
        return {"classifier_file": SENTIMENT_MODEL_FILE_NAME}

    @classmethod
    def load(cls,
             meta: Dict[Text, Any],
             model_dir=None,
             model_metadata=None,
             cached_component=None,
             **kwargs):
        file_name = meta.get("classifier_file")
        classifier_file = os.path.join(model_dir, file_name)
        return utils.json_unpickle(classifier_file)

This causes an error inside Rasa when the model is used via rasa interactive:

> rasa interactive
? Your input -> hello you shit
{'value': 'curses', 'confidence': 1, 'entity': 'badword', 'extractor': 'bad_word_usage', 'text': 'whateverdude'}
2020-02-19 10:08:27 ERROR    rasa.core.training.interactive  - An exception occurred while recording messages.
Traceback (most recent call last):
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1386, in record_messages
    await _validate_nlu(intents, endpoint, sender_id)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1156, in _validate_nlu
    valid = await _validate_user_text(latest_message, endpoint, sender_id)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1116, in _validate_user_text
    text = _as_md_message(parse_data)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1090, in _as_md_message
    return MarkdownWriter.generate_message_md(parse_data)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 325, in generate_message_md
    entities = sorted(message.get("entities", []), key=lambda k: k["start"])
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 325, in <lambda>
    entities = sorted(message.get("entities", []), key=lambda k: k["start"])
KeyError: 'start'
2020-02-19 10:08:27 ERROR    asyncio  - Task exception was never retrieved
future: <Task finished coro=<_serve_application.<locals>.run_interactive_io() done, defined at /Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py:1469> exception=KeyError('start')>
Traceback (most recent call last):
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1476, in run_interactive_io
    sender_id=uuid.uuid4().hex,
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1386, in record_messages
    await _validate_nlu(intents, endpoint, sender_id)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1156, in _validate_nlu
    valid = await _validate_user_text(latest_message, endpoint, sender_id)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1116, in _validate_user_text
    text = _as_md_message(parse_data)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/core/training/interactive.py", line 1090, in _as_md_message
    return MarkdownWriter.generate_message_md(parse_data)
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 325, in generate_message_md
    entities = sorted(message.get("entities", []), key=lambda k: k["start"])
  File "/Users/Vincent/Development/pokebot/venv/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 325, in <lambda>
    entities = sorted(message.get("entities", []), key=lambda k: k["start"])
KeyError: 'start'

The error goes away if the entity set in the component's process method includes start and end keys:

    def process(self, message, **kwargs):
        """
        Retrieve the tokens of the new message, pass them to the
        classifier, and append the prediction results to the message.
        """

        if not self.curses:
            # component is either not trained or didn't
            # receive enough training data
            entity = None
        else:
            tokens = [t.text for t in message.get("tokens")]
            if any(t in self.curses for t in tokens):
                entity = {
                    "value": "curses", 
                    "confidence": 1,
                    "entity": "badword",
                    "extractor": "bad_word_usage",
                    "text": "whateverdude",
                    "start": 0,
                    "end": 10
                }
                message.set("entities", [entity], add_to_output=True)
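
The hard-coded start of 0 and end of 10 above only silence the error; they do not point at the offending word. A minimal sketch of deriving real offsets from the tokens is below. It assumes each token exposes its text and its character offset in the message; the `Tok` class and the `badword_entities` helper are hypothetical stand-ins for illustration, not part of Rasa.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Set


@dataclass
class Tok:
    """Hypothetical stand-in for a tokenizer token: text plus character offset."""
    text: str
    start: int


def badword_entities(tokens: List[Tok], curses: Set[str]) -> List[Dict[str, Any]]:
    """Build one entity per matching token, with offsets taken from the token."""
    entities = []
    for tok in tokens:
        if tok.text in curses:
            entities.append({
                "entity": "badword",
                "value": "curses",
                "start": tok.start,                   # offset of the first character
                "end": tok.start + len(tok.text),     # exclusive end offset
                "confidence": 1,
                "extractor": "bad_word_usage",
            })
    return entities


text = "hello you shit"
tokens = [Tok("hello", 0), Tok("you", 6), Tok("shit", 10)]
found = badword_entities(tokens, {"shit", "piss"})
```

With offsets like these, slicing the message text with `text[start:end]` gives back the matched word, which is what code that annotates entities inline needs.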

wochinge commented 4 years ago

We should only annotate entities in Markdown if they have both a start and an end key. So basically we should add a filter here: https://github.com/RasaHQ/rasa/blob/bad415fa8374230393840fcc5e7132d4ec0e63bb/rasa/nlu/training_data/formats/markdown.py#L325

It would be best to also check the JSON dumper.
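
The kind of filter described here could look roughly like the sketch below. It is a standalone illustration, not the actual Rasa patch: `sortable_entities` is a hypothetical helper name, and the sample entities are made up. It first reproduces the `KeyError` from the traceback with an unfiltered sort, then shows the filtered sort succeeding.

```python
from typing import Any, Dict, List


def sortable_entities(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only entities that carry both offsets, sorted by start."""
    with_span = [e for e in entities if "start" in e and "end" in e]
    return sorted(with_span, key=lambda e: e["start"])


entities = [
    {"entity": "badword", "value": "curses"},  # no offsets, as in the report
    {"entity": "city", "value": "Berlin", "start": 10, "end": 16},
    {"entity": "name", "value": "Vincent", "start": 0, "end": 7},
]

# Sorting without the filter fails on the entity that has no "start" key.
try:
    sorted(entities, key=lambda e: e["start"])
    unfiltered_error = None
except KeyError as err:
    unfiltered_error = err.args[0]

# With the filter, the entity without offsets is skipped instead of crashing.
filtered = sortable_entities(entities)
```

Entities without offsets are dropped from the annotated text in this sketch; whether they should instead be reported some other way is a design decision the thread leaves open.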

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.

goxiaoy commented 3 years ago

This issue is not fixed. Source: https://github.com/RasaHQ/rasa/blob/b555a5d49bb4d5787fa9e18aa8bc1783cb62a1a2/rasa/shared/nlu/training_data/formats/readerwriter.py#L89

wochinge commented 3 years ago

Closing as a duplicate of https://github.com/RasaHQ/rasa/issues/8114