RasaHQ / rasa-nlu-examples

This repository contains examples of custom components for educational purposes.
https://RasaHQ.github.io/rasa-nlu-examples/
Apache License 2.0

Error initializing graph component #163

Open BelenSantamaria opened 2 years ago

BelenSantamaria commented 2 years ago

Hi, I am developing a bot with Rasa and I want to include the component rasa_nlu_examples.extractors.FlashTextEntityExtractor.

I have added it to my configuration file which is as follows:

recipe: default.v1

language: es

pipeline:
   - name: WhitespaceTokenizer
   - name: RegexFeaturizer
   - name: LexicalSyntacticFeaturizer
   - name: CountVectorsFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: char_wb
     min_ngram: 1
     max_ngram: 4
   - name: rasa_nlu_examples.extractors.FlashTextEntityExtractor
     case_sensitive: True
   - name: DIETClassifier
     epochs: 100
     constrain_similarities: true
   - name: EntitySynonymMapper
   - name: ResponseSelector
     epochs: 100
     constrain_similarities: true
   - name: FallbackClassifier
     threshold: 0.2
     ambiguity_threshold: 0.01

policies:
   - name: MemoizationPolicy
   - name: RulePolicy
   - name: TEDPolicy
     max_history: 5
     epochs: 100
     constrain_similarities: true

I have also added a lookup table with some countries to my NLU file. The beginning of it is shown below as an example:

version: '3.0'
nlu:
  - lookup: pais
    examples: |
      - Afganistán
      - Åland
      - Albania
      - Alemania
      - Andorra

When I run rasa train, it trains a model and saves it in the models folder, but when I run rasa shell or rasa interactive, I get the following error:

ERROR rasa.core.agent - Could not load model due to Error initializing graph component for node 'run_rasa_nlu_examples.extractors.FlashTextEntityExtractor5'..

Versions:

Windows 11
rasa==3.0.4
rasa-nlu-examples @ git+https://github.com/RasaHQ/rasa-nlu-examples@c762a7ebcaef23220b20280db2546415a5b1622e

sara-tagger commented 2 years ago

Thanks for the issue, @mvielkind will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

koaning commented 2 years ago

This feels related to https://github.com/RasaHQ/rasa-3.x-component-examples/pull/5. Will dive in.

koaning commented 2 years ago

This was an issue with documentation, super sorry! This was my bad.

The docs still listed the old version from Rasa 2.0. The new version no longer uses lookups; it uses a path variable to point to a file instead. I just pushed a new version of the docs. Could you confirm whether the issue persists?
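For readers moving from a lookup table to a file, here is a minimal sketch of writing the entries to a text file, one per line. The one-entry-per-line layout is inferred from the `read_text().split("\n")` call visible in the traceback later in this thread; the helper name `write_lookup_file` is hypothetical and not part of rasa-nlu-examples.

```python
import tempfile
from pathlib import Path


def write_lookup_file(words, path, encoding="utf-8"):
    """Write one lookup entry per line using an explicit encoding.

    Hypothetical helper for illustration; not part of rasa-nlu-examples.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(words), encoding=encoding)
    return target


# The first entries of the lookup table quoted in this issue.
countries = ["Afganistán", "Åland", "Albania", "Alemania", "Andorra"]

# A temporary directory keeps the sketch free of side effects; in a real
# project the target would be something like data/countries.txt.
with tempfile.TemporaryDirectory() as tmp:
    out = write_lookup_file(countries, Path(tmp) / "data" / "countries.txt")
    round_trip = out.read_text(encoding="utf-8").split("\n")
```

Passing the encoding explicitly on both the write and the read avoids depending on the platform default, which differs between operating systems.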

BelenSantamaria commented 2 years ago

Hi, I just tried it and it keeps giving me the same error. Here is the configuration file with the changes:

recipe: default.v1

language: es

pipeline:
   - name: WhitespaceTokenizer
   - name: RegexFeaturizer
   - name: LexicalSyntacticFeaturizer
   - name: CountVectorsFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: char_wb
     min_ngram: 1
     max_ngram: 4
   - name: rasa_nlu_examples.extractors.FlashTextEntityExtractor
     case_sensitive: False
     path: data/countries.txt
     entity_name: pais
   - name: DIETClassifier
     epochs: 100
     constrain_similarities: true
   - name: EntitySynonymMapper
   - name: ResponseSelector
     epochs: 100
     constrain_similarities: true
   - name: FallbackClassifier
     threshold: 0.2
     ambiguity_threshold: 0.01

policies:
   - name: MemoizationPolicy
   - name: RulePolicy
   - name: TEDPolicy
     max_history: 5
     epochs: 100
     constrain_similarities: true

I am also attaching my countries.txt file.

koaning commented 2 years ago

Could you give the full traceback including the commands that you ran before the error appeared?

BelenSantamaria commented 2 years ago

This is the output of rasa train and rasa shell:

Attachments: rasa_train_1, train_2, rasa_shell

indam23 commented 2 years ago

Could you give the traceback you get when starting rasa shell nlu --debug? This should provide more details about the error.

BelenSantamaria commented 2 years ago

I am attaching the output as a file because it is very long: output.txt

indam23 commented 2 years ago

Thanks, the relevant part is:

'run_rasa_nlu_examples.extractors.FlashTextEntityExtractor5' loading 'FlashTextEntityExtractor.load' and kwargs: '{}'.
Traceback (most recent call last):
  File "C:\Users\johndoe\miniconda3\envs\company\lib\site-packages\rasa\engine\graph.py", line 393, in _load_component
    self._component: GraphComponent = constructor(  # type: ignore[no-redef]
  File "C:\Users\johndoe\miniconda3\envs\company\lib\site-packages\rasa\engine\graph.py", line 220, in load
    return cls.create(config, model_storage, resource, execution_context)
  File "C:\Users\johndoe\miniconda3\envs\company\lib\site-packages\rasa_nlu_examples\extractors\flashtext_entity_extractor.py", line 85, in create
    return cls(config, execution_context.node_name, model_storage, resource)
  File "C:\Users\johndoe\miniconda3\envs\company\lib\site-packages\rasa_nlu_examples\extractors\flashtext_entity_extractor.py", line 66, in __init__
    words = pathlib.Path(self.path).read_text().split("\n")
  File "C:\Users\johndoe\miniconda3\envs\company\lib\pathlib.py", line 1237, in read_text
    return f.read()
  File "C:\Users\johndoe\miniconda3\envs\company\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 641: character maps to <undefined>
2022-01-27 16:33:42 DEBUG    urllib3.connectionpool  - Starting new HTTPS connection (1): o251570.ingest.sentry.io:443

indam23 commented 2 years ago

There seems to be an encoding error in the file you're trying to read into your custom component. It's assuming cp1252, is that correct?
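A minimal sketch of how this kind of error can arise, assuming the file was saved as UTF-8 and then decoded as cp1252. The sample line is hypothetical, since the contents of countries.txt are not shown in this thread. In UTF-8, "Á" is stored as the bytes 0xC3 0x81, and byte 0x81 has no mapping in Python's cp1252 codec.

```python
# Hypothetical sample line; the real file contents are not shown in the thread.
sample = "Emiratos Árabes Unidos\n"

# Encode as UTF-8: "Á" becomes the two bytes 0xC3 0x81.
raw = sample.encode("utf-8")

# Decoding those bytes as cp1252 fails because 0x81 is undefined there.
try:
    raw.decode("cp1252")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes were written in works as expected.
decoded = raw.decode("utf-8")

if error is not None:
    print(f"cp1252 failed at byte offset {error.start}: {error.reason}")
```

On Windows, `pathlib.Path.read_text()` called without an `encoding` argument typically falls back to the locale's preferred encoding, which is often cp1252, so a UTF-8 file with such characters can fail in the same way.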

BelenSantamaria commented 2 years ago

Based on the error you pointed out, I think the problem is that the country names in the file contain special characters and cannot be read with words = pathlib.Path(self.path).read_text().split("\n"). I ran that line on its own and the same error appeared.

If I execute words = pathlib.Path(r'..\data\countries.txt').read_text(encoding='utf-8').split("\n"), it reads the file correctly.

Is it possible to add the encoding as an argument to the extractor?

Thank you! :)

indam23 commented 2 years ago

That makes sense to me! Do you want to open a PR for it?
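The change proposed above might look roughly like the following sketch. It is not the actual rasa-nlu-examples implementation: the helper `read_words` and the `encoding` configuration key are assumptions made for illustration, and the real component may wire the option differently.

```python
import pathlib
import tempfile


def read_words(path, encoding="utf-8"):
    """Read one entry per line, decoding with an explicit encoding.

    Hypothetical helper; inside the extractor the value could come from
    the component config, e.g. config.get("encoding", "utf-8") (assumed key).
    """
    text = pathlib.Path(path).read_text(encoding=encoding)
    # splitlines() tolerates a trailing newline; blank lines are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demonstration with a temporary UTF-8 file containing non-ASCII names.
with tempfile.TemporaryDirectory() as tmp:
    lookup_path = pathlib.Path(tmp) / "countries.txt"
    lookup_path.write_bytes("Afganistán\nÅland\nAlbania\n".encode("utf-8"))
    words = read_words(lookup_path, encoding="utf-8")
```

Defaulting to UTF-8 while still allowing an override would keep existing files in other encodings usable by setting the option explicitly.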