RasaHQ / rasa-nlu-examples

This repository contains examples of custom components for educational purposes.
https://RasaHQ.github.io/rasa-nlu-examples/
Apache License 2.0

SparseBytePairFeaturizer for hindi language is still asking for en model #139

Closed sids07 closed 3 years ago

sids07 commented 3 years ago

I was trying to use SparseBytePairFeaturizer for Hindi and set only cache_dir. The Hindi model is downloaded, but after the download the component looks for the English model in cache_dir. That file is not present there, so it throws a file-not-found error.

my config.yml:

language: hi

pipeline:
  - name: WhitespaceTokenizer
  - name: rasa_nlu_examples.featurizers.dense.GensimFeaturizer
    cache_dir: /home/sid/Desktop/treeleaf/chatbot/embed
    file: hi_gen.kv
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: rasa_nlu_examples.featurizers.sparse.SparseBytePairFeaturizer
    lang: hi
    vs: 1000
    cache_dir: /home/sid/Desktop/treeleaf/chatbot/cache_dir
    model_file: /home/sid/Desktop/treeleaf/chatbot/cache_dir/hi/hi.wiki.bpe.vs1000.model
  - name: DIETClassifier
    random_seed: 42
    intent_classification: True
    entity_recognition: False
    use_masked_language_model: False
    epochs: 300
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: RulePolicy

My project directory contains a cache_dir folder with the following subfolders and files:

...
cache_dir/
    hi/
        hi.wiki.bpe.vs1000.model
        hi.wiki.bpe.vs1000.d25.w2v.bin
...
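A quick way to confirm that the cache really holds the files for the configured language is a small check like the one below. This is only a sketch: `missing_bpemb_files` is a hypothetical helper, not part of rasa-nlu-examples, and the file-name pattern is inferred from the two file names listed above.

```python
from pathlib import Path
from typing import List


def missing_bpemb_files(cache_dir: str, lang: str, vs: int, dim: int) -> List[str]:
    """Hypothetical helper: return the expected cache files that are absent."""
    lang_dir = Path(cache_dir) / lang
    # Assumed naming pattern, taken from the file names in this issue.
    expected = [
        f"{lang}.wiki.bpe.vs{vs}.model",
        f"{lang}.wiki.bpe.vs{vs}.d{dim}.w2v.bin",
    ]
    return [name for name in expected if not (lang_dir / name).is_file()]
```

Calling `missing_bpemb_files("/home/sid/Desktop/treeleaf/chatbot/cache_dir", "hi", 1000, 25)` should return an empty list for the layout above, which suggests the cache itself is fine and the problem lies in the path the component builds.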

Error Message:

Traceback (most recent call last):
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/__main__.py", line 116, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/cli/train.py", line 58, in <lambda>
    train_parser.set_defaults(func=lambda args: train(args, can_exit=True))
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/cli/train.py", line 102, in train
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 109, in train
    loop,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/utils/common.py", line 308, in run_in_loop
    result = loop.run_until_complete(f)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 174, in train_async
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 305, in _train_async_internal
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/train.py", line 818, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/train.py", line 98, in train
    nlu_config, component_builder, model_to_finetune=model_to_finetune
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/model.py", line 163, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/model.py", line 174, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/components.py", line 852, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/registry.py", line 193, in create_component_by_config
    return component_class.create(component_config, config)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa/nlu/components.py", line 525, in create
    return cls(component_config)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/rasa_nlu_examples/featurizers/sparse/sparse_bpemb_featurizer.py", line 384, in __init__
    self.spm = spm.SentencePieceProcessor(model_file=str(model_fp))
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 218, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/home/sid/Desktop/treeleaf/chatbot/bot_ras/lib/python3.6/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "/home/sid/Desktop/treeleaf/chatbot/cache_dir/hi/en.wiki.bpe.vs1000.model": No such file or directory Error #2

sara-tagger commented 3 years ago

Thanks for the issue, @m-vdb will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

koaning commented 3 years ago

Just to confirm, could you try:

- name: rasa_nlu_examples.featurizers.SparseBytePairFeaturizer
  lang: hi
  vs: 1000

The cached use-case is more for folks who want to pre-build docker containers. If you don't pass a folder it should automatically fetch the file if it doesn't exist.

koaning commented 3 years ago

Also! I don't know what dataset you're running this on, but if you have a representative dataset I'd love to hear if these tools increase the performance of your assistant.

sids07 commented 3 years ago

@koaning I tried exactly that on my first attempt. It automatically downloaded the Hindi files into the hi directory within cache_dir, but it still asked for the English model file.

koaning commented 3 years ago

Gotcha. I think I've found the bug here: https://github.com/RasaHQ/rasa-nlu-examples/blob/main/rasa_nlu_examples/featurizers/sparse/sparse_bpemb_featurizer.py#L367

koaning commented 3 years ago

Made a PR here: https://github.com/RasaHQ/rasa-nlu-examples/pull/140.

koaning commented 3 years ago

The PR should contain the fix. If it's still broken, feel free to re-open the issue!

sids07 commented 3 years ago

It is still not working, @koaning. Even with the PR in #140, the path on line 379 still has to be changed.

In the current version:

model_fp = (
    Path(cache_dir)
    / self.component_config["lang"]
    / f"en.wiki.bpe.vs{self.component_config['vs']}.model"
)

The change needed to make it work with languages other than English:

model_fp = (
    Path(cache_dir)
    / self.component_config["lang"]
    / f"{self.component_config['lang']}.wiki.bpe.vs{self.component_config['vs']}.model"
)
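The proposed change can be tried in isolation with a small standalone sketch. `bpemb_model_path` is a hypothetical helper written for illustration, not the function used in sparse_bpemb_featurizer.py. It only shows that taking both the sub-folder and the file-name prefix from `lang` produces the path that exists in the cache.

```python
from pathlib import PurePosixPath


def bpemb_model_path(cache_dir: str, lang: str, vs: int) -> PurePosixPath:
    """Hypothetical helper: build the cached SentencePiece model path."""
    # Both the sub-folder and the file-name prefix come from `lang`,
    # so no language code is hard-coded.
    return PurePosixPath(cache_dir) / lang / f"{lang}.wiki.bpe.vs{vs}.model"


print(bpemb_model_path("cache_dir", "hi", 1000))
# prints: cache_dir/hi/hi.wiki.bpe.vs1000.model
```

With the prefix hard-coded to "en", the same call would point at cache_dir/hi/en.wiki.bpe.vs1000.model, which matches the path in the OSError above.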