RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

How to scale correctly #1249

Closed: atotalnoob closed this issue 6 years ago

atotalnoob commented 6 years ago

Rasa NLU version: 0.12.2

Operating system (windows, osx, ...): Windows Server 2012 R2

Content of model configuration file:

Server YML:

    language: "en"

    pipeline:
    - name: "nlp_spacy"
    - name: "tokenizer_spacy"
    - name: "intent_featurizer_spacy"
    - name: "ner_crf"
      features: [["low", "title"], ["bias", "word3"], ["upper", "pos", "pos2"]]
    - name: "ner_synonyms"
    - name: "intent_classifier_sklearn"
    - name: "intent_entity_featurizer_regex"
    - name: "ner_duckling"
      dimensions: ["time", "number", "duration"]

Training YML:

    language: "en"

    pipeline:
    - name: "nlp_spacy"
    - name: "tokenizer_spacy"
    - name: "intent_featurizer_spacy"
    - name: "ner_crf"
      features: [["low", "title"], ["bias", "word3"], ["upper", "pos", "pos2"]]
    - name: "ner_synonyms"
    - name: "intent_classifier_sklearn"
    - name: "intent_entity_featurizer_regex"
    - name: "ner_duckling"
      dimensions: ["time", "number", "duration"]

    data: {Data Goes Here}

Issue: I've been working on a chatbot platform for my company (internal use), and we are using Rasa NLU. We have gained some traction internally, and one of our next use cases to tackle is a Frequently Asked Questions (FAQ) bot.

The issue is that they have over 1000 question-and-answer pairs. How do I scale Rasa up correctly to handle such a large, diverse dataset?

akelad commented 6 years ago

Scaling up in terms of what? You'd definitely have to go through those question/answer pairs and label them accordingly. I'd guess a lot of them overlap, so you'd end up with fewer than 1000 intents.
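
As a rough illustration of the labeling step, here is a minimal Python sketch that turns hand-labeled FAQ pairs into the JSON training data layout used by Rasa NLU (`rasa_nlu_data` / `common_examples`). The sample questions, answers, and intent names are made up, and `build_training_data` is a hypothetical helper, not part of Rasa. Questions that overlap share one intent label, so the number of intents ends up smaller than the number of questions.

```python
import json

# Hypothetical FAQ entries: (question, answer, intent label assigned by hand).
# Overlapping questions are given the same intent label.
faq_pairs = [
    ("How do I reset my password?", "Use the self-service portal.", "reset_password"),
    ("I forgot my password, what now?", "Use the self-service portal.", "reset_password"),
    ("Where can I find the holiday calendar?", "It is on the HR intranet page.", "holiday_calendar"),
]


def build_training_data(pairs):
    """Return (Rasa NLU JSON training data, intent -> answer lookup)."""
    examples = []
    answers = {}
    for question, answer, intent in pairs:
        # Each question becomes one training example for its intent.
        examples.append({"text": question, "intent": intent, "entities": []})
        # Keep the first answer seen for each intent.
        answers.setdefault(intent, answer)
    return {"rasa_nlu_data": {"common_examples": examples}}, answers


training_data, answers = build_training_data(faq_pairs)
print(json.dumps(training_data, indent=2))
print(len(answers), "intents from", len(faq_pairs), "questions")
```

The bot would then look up the reply in `answers` using the intent the model predicts, so the answers themselves never need to be part of the training data.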