BeWe11 / rasa_composite_entities

A Rasa NLU component for composite entities.
MIT License

[BUG] Training doesn't work with rasa X #4

Closed BeWe11 closed 5 years ago

BeWe11 commented 5 years ago

Rasa X doesn't seem to accept the modified training file and raises an error:

Training NLU model…
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/Cellar/python/3.7.2_2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/__main__.py", line 81, in <module>
    main()
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/__main__.py", line 70, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/cli/train.py", line 132, in train_nlu
    fixed_model_name=args.fixed_model_name,
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/train.py", line 372, in train_nlu
    fixed_model_name=fixed_model_name,
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/train.py", line 391, in _train_nlu_with_validated_data
    config, nlu_data_directory, _train_path, fixed_model_name="nlu"
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/nlu/train.py", line 89, in train
    training_data = load_data(data, nlu_config.language)
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 56, in load_data
    data_sets = [_load(f, language) for f in files]
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 56, in <listcomp>
    data_sets = [_load(f, language) for f in files]
  File "/Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 115, in _load
    raise ValueError("Unknown data format for file {}".format(filename))
ValueError: Unknown data format for file /var/folders/yd/0wytdp2n6bx4j7nh_km3_xr00000gn/T/tmp2vgbkpkg/e3c2371162544153998fe9dac300943a_nlu.json

The composite entity extractor throws a warning that the train file couldn't be loaded.

2019-06-05 10:17:23 WARNING py.warnings - /Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa_composite_entities/composite_entity_extractor.py:129: UserWarning: Failed to load composite entitiesfile from "/var/folders/yd/0wytdp2n6bx4j7nh_km3_xr00000gn/T/tmph9fkb9ve/nlu/composite_entities.json"
  'file from "{}"'.format(composite_entities_file)

2019-06-05 10:30:09 WARNING py.warnings - /Users/brian/rasax/rasax/lib/python3.7/site-packages/rasa_composite_entities/composite_entity_extractor.py:70: UserWarning: The CompositeEntityExtractor could not load the train file. "The CompositeEntityExtractor could not load "

Both errors are due to rasa's "_guess_format" method not being able to infer the format of the modified train file.

BrianYing commented 5 years ago

This issue only occurs when I run rasa train to train both NLU and Core. It works with rasa train nlu, which trains only the NLU model. The problem I found is in this part of composite_entity_extractor.py:

@staticmethod
def _get_train_files_cmd():
    """Get the raw train data by fetching the train file given in the
    command line arguments to the train script.
    """
    cmdline_args = create_argument_parser().parse_args()
    files = utils.list_files(cmdline_args.nlu)
    return [file for file in files if _guess_format(file) == RASA_NLU]

When we run rasa train nlu, "cmdline_args" contains an "nlu" argument with the correct path to the NLU training file. With rasa train, however, "cmdline_args" has no "nlu" argument, and the "data" argument is the parent directory of both "nlu.json" and "stories.md", which is "data/" in my case.

My fix is simply to hardcode the path to my NLU training file as "nlu" when that argument does not exist:

if "nlu" not in cmdline_args:
    cmdline_args.nlu = 'data/nlu.json'

and then use cmdline_args.nlu as before.

This is not an elegant way to resolve this, but I think it can give you some idea of my issue.
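A less hardcoded variant of the workaround above might look like the following sketch. It assumes, as described above, that the parsed arguments carry an "nlu" attribute under rasa train nlu and only a "data" attribute under rasa train. The helper name resolve_nlu_path and the "data" default are hypothetical and are not part of rasa or rasa_composite_entities.

```python
import argparse


def resolve_nlu_path(cmdline_args, default="data"):
    """Return the path to search for NLU training data.

    Hypothetical helper: prefers the "nlu" argument, falls back to
    "data", and finally to the given default.
    """
    for name in ("nlu", "data"):
        value = getattr(cmdline_args, name, None)
        if value:
            return value
    return default


# Simulated namespaces for the two commands (values are illustrative).
train_nlu_args = argparse.Namespace(nlu="data/nlu.json")
train_args = argparse.Namespace(data="data/")

print(resolve_nlu_path(train_nlu_args))
print(resolve_nlu_path(train_args))
```

Because the fallback returns a directory, the files found there would still need to be filtered by format, as the list comprehension in _get_train_files_cmd already does.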

Thanks!

BeWe11 commented 5 years ago

@BrianYing would you mind sharing your ‘nlu.json’ and your ‘stories.md’ files? That would save me some time debugging.

BrianYing commented 5 years ago

@BeWe11 Sent to your email!

BeWe11 commented 5 years ago

Ok, there were two issues:

  1. The training data was not fetched correctly when training a full rasa model (via rasa train) instead of just an NLU model (via rasa train nlu). This has been fixed by c86250080e5f332d94ba93c983eaef26b4a2a4b1.
  2. Your training file seems to be broken in some way I don't understand. If I try training with the file you sent me, I get the same "Unknown data format" error. If I copy the whole content of that file and paste it into a new file, I can train with that new file just fine.
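The thread does not establish what was wrong with the file. One guess, and only a guess, is hidden bytes such as a UTF-8 byte-order mark, which copy-pasting into a new file would drop. A minimal sketch for checking a file for this kind of problem, using only the Python standard library (the function name inspect_training_file is hypothetical):

```python
import codecs
import json


def inspect_training_file(path):
    """Report hidden-byte issues that can make a JSON file unparseable."""
    with open(path, "rb") as handle:
        raw = handle.read()
    report = {
        "has_utf8_bom": raw.startswith(codecs.BOM_UTF8),
        "has_nul_bytes": b"\x00" in raw,
        "parses_as_json": True,
    }
    try:
        # json.loads rejects text that still starts with a BOM.
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        report["parses_as_json"] = False
    return report
```

Running this on both the original file and the re-created copy would show whether they differ in any of these respects.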

@BrianYing can you please confirm this on your machine? First, run pip install -U rasa-composite-entities to get the fix for point 1 (version 0.4.2), then try creating a new training file with identical content and see if you can train successfully.

By the way, I don't think it's possible right now to use this component with rasa X. It seems rasa X only reads markdown files directly. When I import my JSON file, the composite patterns get stripped and a markdown file is saved instead. So right now, you have to use the training scripts from the command line I guess :/

BrianYing commented 5 years ago

@BeWe11 It works! Yeah, the rasa X UI does not support reading JSON files, but using the command line works fine. Thank you so much!

BeWe11 commented 5 years ago

Glad it’s working now! I’m gonna close this issue and add another one for the rasa X thing.