RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Does `suffix..` or any other features depend on capitalization in `ner_crf`? #1351

Closed: Ghostvv closed this issue 5 years ago

Ghostvv commented 6 years ago

The following config for ner_crf:

- name: ner_crf
  features:
  - - low
    - digit
  - - bias
    - suffix3
    - suffix2
    - low
    - digit
    - pattern
  - - low
    - digit

picks different entities in B-OY 2018 and b-oy 2018, two inputs that differ only in capitalization.
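To see which of the configured features could react to capitalization, here is a minimal sketch that computes a per-token feature dict over a previous/current/next window for both inputs and reports the features whose values differ. The feature functions are a hypothetical reimplementation written to resemble the names in the config above (`low`, `bias`, `suffix3`, `suffix2`, `digit`); they are not Rasa's source. Whitespace tokenization, the `-1:`/`0:`/`1:` key prefixes, and the omission of `pattern` (which depends on regex features) are also assumptions.

```python
from typing import Callable, Dict, List, Union

Feature = Union[str, bool]

# Hypothetical feature functions, named after the ner_crf config options
# (assumed semantics, not Rasa's actual implementation).
FEATURES: Dict[str, Callable[[str], Feature]] = {
    "low": lambda tok: tok.lower(),
    "bias": lambda tok: "bias",
    "suffix3": lambda tok: tok[-3:],
    "suffix2": lambda tok: tok[-2:],
    "digit": lambda tok: tok.isdigit(),
}

# Window config from the issue: previous, current, next token
# ("pattern" is left out because it needs regex features).
CONFIG: List[List[str]] = [
    ["low", "digit"],
    ["bias", "suffix3", "suffix2", "low", "digit"],
    ["low", "digit"],
]


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Build the feature dict for token i from its -1/0/+1 neighbours."""
    feats: Dict[str, Feature] = {}
    half = len(CONFIG) // 2
    for offset, names in enumerate(CONFIG):
        j = i + offset - half
        if 0 <= j < len(tokens):  # skip positions outside the sentence
            for name in names:
                feats[f"{offset - half}:{name}"] = FEATURES[name](tokens[j])
    return feats


# Assumption: plain whitespace tokenization.
upper = token_features("B-OY 2018".split(), 0)
lower = token_features("b-oy 2018".split(), 0)
changed = {k: (upper[k], lower[k]) for k in upper if upper[k] != lower[k]}
print(changed)
# {'0:suffix3': ('-OY', '-oy'), '0:suffix2': ('OY', 'oy')}
```

Under these assumptions only the suffix features differ; `low` and `digit` produce identical values for both inputs.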

amn41 commented 6 years ago

@Ghostvv can you please provide steps to reproduce & the exact output?

Ghostvv commented 6 years ago

For the demo-rasa.md dataset, with the above config for ner_crf, I get:

{'intent': {'name': 'restaurant_search', 'confidence': 0.9175125956535339}, 'entities': [{'start': 8, 'end': 15, 'value': 'CHINESE', 'entity': 'cuisine', 'confidence': 0.6072946511019204, 'extractor': 'ner_crf'}], 'intent_ranking': [{'name': 'restaurant_search', 'confidence': 0.9175125956535339}, {'name': 'goodbye', 'confidence': 0.06257633119821548}, {'name': 'address', 'confidence': 0.0}, {'name': 'greet', 'confidence': 0.0}, {'name': 'affirm', 'confidence': 0.0}], 'text': 'show me CHINESE restaurants'}

{'intent': {'name': 'restaurant_search', 'confidence': 0.9175125956535339}, 'entities': [{'start': 8, 'end': 15, 'value': 'chinese', 'entity': 'cuisine', 'confidence': 0.8070571545315702, 'extractor': 'ner_crf'}], 'intent_ranking': [{'name': 'restaurant_search', 'confidence': 0.9175125956535339}, {'name': 'goodbye', 'confidence': 0.06257633119821548}, {'name': 'address', 'confidence': 0.0}, {'name': 'greet', 'confidence': 0.0}, {'name': 'affirm', 'confidence': 0.0}], 'text': 'show me chinese restaurants'}
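A short sketch that compares the two parse results above makes the difference explicit. The entity values are copied from the output; the `intent_ranking` and other keys are omitted. The helper names are hypothetical. In this reproduction both inputs yield the same span and label, and only the entity confidence changes.

```python
from typing import Dict, List, Tuple

# Entities copied from the two ner_crf results above (other keys omitted).
upper_entities: List[Dict] = [
    {"start": 8, "end": 15, "value": "CHINESE", "entity": "cuisine",
     "confidence": 0.6072946511019204},
]
lower_entities: List[Dict] = [
    {"start": 8, "end": 15, "value": "chinese", "entity": "cuisine",
     "confidence": 0.8070571545315702},
]


def span_key(e: Dict) -> Tuple[int, int, str]:
    """Identify an entity by its character span and label."""
    return (e["start"], e["end"], e["entity"])


same_spans = [span_key(e) for e in upper_entities] == [span_key(e) for e in lower_entities]
conf_diff = lower_entities[0]["confidence"] - upper_entities[0]["confidence"]
print(same_spans, round(conf_diff, 2))
```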

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Ghostvv commented 5 years ago

@amn41 did you try to reproduce it?

amn41 commented 5 years ago

no - please add to backlog

Ghostvv commented 5 years ago

Took a look at this: suffix and prefix features are substrings of the token text, so if tokenization is case-sensitive, they depend on capitalization.
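One way to make affix features case-insensitive, assuming that is the desired behaviour, is to compute them on a lowercased copy of each token. The sketch below uses hypothetical `affix_features` and `features_for` helpers, not a Rasa option, and assumes whitespace tokenization. It shows that the affix features differ for B-OY 2018 and b-oy 2018 unless the token is lowercased first.

```python
from typing import Dict, List


def affix_features(token: str, lowercase: bool) -> Dict[str, str]:
    """Hypothetical prefix/suffix features, optionally on a lowercased token."""
    text = token.lower() if lowercase else token
    return {
        "prefix2": text[:2],
        "suffix3": text[-3:],
        "suffix2": text[-2:],
    }


def features_for(sentence: str, lowercase: bool) -> List[Dict[str, str]]:
    # Assumption: plain whitespace tokenization, for illustration only.
    return [affix_features(tok, lowercase) for tok in sentence.split()]


a, b = "B-OY 2018", "b-oy 2018"
print(features_for(a, lowercase=False) == features_for(b, lowercase=False))  # False
print(features_for(a, lowercase=True) == features_for(b, lowercase=True))    # True
```

Lowercasing only inside the affix features, rather than the whole text before tokenization, keeps the original case available to any feature that is meant to see it, such as a title-case or upper-case check.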