RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Can't running model on custom dataset #3164

alvipranandha closed this issue 5 years ago

alvipranandha commented 5 years ago
**Rasa NLU version**: 0.14.0

**Python version**: 3.6.3 :: Anaconda, Inc

**Operating system** (windows, osx, ...): Windows 10 Enterprise Build 1809

**Issue**:

Hello, I'm trying to build entity extraction with the Rasa library on a custom dataset. When I try to train the model, I get the error below. Could you help me?

    Traceback (most recent call last):
      File "nlu_model.py", line 30, in <module>
        train('./data/training_data.json', './config/config.yml', './models/nlu')
      File "nlu_model.py", line 15, in train
        training_data = load_data(data)
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\loading.py", line 55, in load_data
        data_sets = [_load(f, language) for f in files]
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\loading.py", line 55, in <listcomp>
        data_sets = [_load(f, language) for f in files]
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\loading.py", line 115, in _load
        return reader.read(filename, language=language, fformat=fformat)
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\formats\readerwriter.py", line 13, in read
        return self.reads(utils.read_file(filename), **kwargs)
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\formats\readerwriter.py", line 35, in reads
        return self.read_from_json(js, **kwargs)
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\formats\rasa.py", line 22, in read_from_json
        validate_rasa_nlu_data(js)
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\formats\rasa.py", line 91, in validate_rasa_nlu_data
        raise e
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\rasa_nlu\training_data\formats\rasa.py", line 86, in validate_rasa_nlu_data
        validate(data, _rasa_nlu_data_schema())
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\jsonschema\validators.py", line 541, in validate
        cls(schema, *args, **kwargs).validate(instance)
      File "C:\Users\Alvi\Anaconda3\lib\site-packages\jsonschema\validators.py", line 130, in validate
        raise error
    jsonschema.exceptions.ValidationError: '' is too short. Failed to validate training data, make sure your data is valid. For more information about the format visit https://github.com/RasaHQ/rasa_nlu/blob/master/docs/dataformat.rst

    Failed validating 'minLength' in schema['properties']['rasa_nlu_data']['properties']['common_examples']['items']['properties']['text']:
        {'minLength': 1, 'type': 'string'}

    On instance['rasa_nlu_data']['common_examples'][1427]['text']:

**Content of configuration file (config.yml)**:

```yml
language: "en"

pipeline:
- name: "nlp_spacy"
  model: "en"
- name: "tokenizer_spacy"
- name: "ner_crf"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_sklearn"
```

**Content of domain file (domain.yml)** (if used & relevant):

```yaml
I'm not using domain.yml because I'm just focused on entity extraction
```

akelad commented 5 years ago

Looks like you may have some empty training examples. The line `On instance['rasa_nlu_data']['common_examples'][1427]['text']` in the error should help you find it: the example at index 1427 of `common_examples` has an empty `text`.
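
To find every offending example at once rather than one per validation run, a small script along these lines may help. This is a minimal sketch, not part of Rasa: `load_training_data` and `find_empty_examples` are hypothetical helper names, and the sample dictionary is made-up data that only mimics the `rasa_nlu_data` / `common_examples` layout named in the error message.

```python
import json


def load_training_data(path):
    """Load a Rasa NLU JSON training file (hypothetical helper, not a Rasa API)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_empty_examples(data):
    """Return indices of common_examples whose 'text' is missing, not a string, or empty."""
    examples = data["rasa_nlu_data"]["common_examples"]
    return [
        i
        for i, ex in enumerate(examples)
        if not isinstance(ex.get("text"), str) or len(ex["text"]) == 0
    ]


# Tiny in-memory stand-in for training_data.json (made-up content).
sample = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "book a flight", "intent": "book", "entities": []},
            {"text": "", "intent": "book", "entities": []},   # empty text
            {"intent": "greet", "entities": []},              # text missing
        ]
    }
}

empty_indices = find_empty_examples(sample)
print(empty_indices)  # [1, 2]
```

For the real file, `find_empty_examples(load_training_data('./data/training_data.json'))` should list index 1427 along with any other empty examples.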

alvipranandha commented 5 years ago

Thank you @akelad for the information. I'm now cleaning the data that had null values. I also opened rasa.py in the rasa_nlu library and found this:

    def _rasa_nlu_data_schema():
        training_example_schema = {
            "type": "object",
            "properties": {
                "text": {"type": "string", "minLength": 1},
                "intent": {"type": "string"},
                "entities": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "start": {"type": "number"},
                            "end": {"type": "number"},
                            "value": {"type": "string"},
                            "entity": {"type": "string"}
                        },
                        "required": ["start", "end", "entity"]
                    }
                }
            },
            "required": ["text"]
        }

From the schema above, I can see that the minimum length of `text` is one, so any example with an empty `text` fails validation. That's why I couldn't train the model.
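
One possible way to clean the dataset before training is to drop every example whose `text` would fail that `minLength` check. The sketch below is only an illustration under assumptions: `drop_empty_examples` and `MIN_TEXT_LENGTH` are hypothetical names, the sample data is made up, and the filter mirrors only the `text` rule from the schema, not the full Rasa validation.

```python
import copy

# Mirrors "minLength": 1 from the schema excerpt (assumed to be the only rule that matters here).
MIN_TEXT_LENGTH = 1


def drop_empty_examples(data):
    """Return (cleaned_copy, removed_count); the input dict is left unmodified."""
    cleaned = copy.deepcopy(data)
    examples = cleaned["rasa_nlu_data"]["common_examples"]
    kept = [
        ex
        for ex in examples
        if isinstance(ex.get("text"), str) and len(ex["text"]) >= MIN_TEXT_LENGTH
    ]
    cleaned["rasa_nlu_data"]["common_examples"] = kept
    return cleaned, len(examples) - len(kept)


# Made-up data in the same layout as the training file.
raw = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "book a flight", "intent": "book", "entities": []},
            {"text": "", "intent": "book", "entities": []},
            {"text": None, "intent": "greet", "entities": []},
        ]
    }
}

cleaned, removed = drop_empty_examples(raw)
kept_texts = [ex["text"] for ex in cleaned["rasa_nlu_data"]["common_examples"]]
print(removed, kept_texts)  # 2 ['book a flight']
```

The cleaned dictionary can then be written back with `json.dump` and passed to training as before. Whitespace-only strings still pass this filter, as they would pass `minLength`, so a stricter check with `str.strip()` may be worth adding.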