RasaHQ / rasalit

Visualizations and helpers to improve and debug machine learning models for Rasa Open Source
Apache License 2.0
305 stars, 62 forks

[Spelling/NLU-playground] Support Custom NLU components #40

Closed hotzenklotz closed 3 years ago

hotzenklotz commented 3 years ago

My NLU pipeline contains a custom spellchecking component, which is not supported by the rasalit spelling command.

I launched RasaLit from the same project directory as I normally run my bot with rasa run. The customizations module/directory is located in the project root and is typically identified by rasa run without any issue. See project structure below.

cd <project_root>

tree <project_root>
bot ❯ tree
.
├── Dockerfile
├── Makefile
├── actions
│   ├── README.md
| ...
├── config
│   ├── config.yml
│   ├── credentials.yml
│   ├── credentials.yml.tpl
│   ├── domain.yml
│   └── endpoints.yml
├── customizations
│   ├── __init__.py
│   ├── botframework_utils.py
│   └── spellchecker.py
├── data
│   ├── _generated_
| ...
│   ├── core
| ...
│   └── nlu
│     ...
├── evaluation
│   ├── failed_test_stories.yml
| ...

├── models
│   └── bot-nlu-core-model.tar.gz

python -m rasalit spelling --folder models --port 8501

My config.yml for reference:

language: de

pipeline:
  - name: customizations.spellchecker.SpellChecker
  - name: WhitespaceTokenizer
    intent_tokenization_flag: True
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: dbmdz/bert-base-german-uncased
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  ...
[image]

koaning commented 3 years ago

In the future, it would be more meaningful to copy the full traceback. One cannot do text-search on an image from the browser.

That said, I think the issue isn't on my end here. It seems your component is using a library called customizations. Are you 100% sure this module is available in the virtual environment that is running the streamlit app? Rasa will pick up a python file as if it was a module in the config.yml system but my streamlit app is running base python code.

A simple fix might be to add a setup.py file that points to customizations and to install it in editable mode (pip install -e . from the project root).

hotzenklotz commented 3 years ago

Yes, as mentioned, customizations is available and rasa run picks it up fine. When I run rasa run and rasalit back to back, only the former works.

If I try importing it from a Python shell in the same virtual environment from the same project root I have no trouble:

hrbot ❯ python
Python 3.7.9 (default, Aug 31 2020, 07:22:35)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from customizations.spellchecker import SpellChecker
>>> a = SpellChecker
>>> a
<class 'customizations.spellchecker.SpellChecker'>
>>>

Here is the full stack trace:

ComponentNotFoundException: Failed to load the component 'customizations.spellchecker.SpellChecker'. Failed to find module 'customizations.spellchecker'. Either your pipeline configuration contains an error or the module you are trying to import is broken (e.g. the module is trying to import a package that is not installed). Traceback (most recent call last):
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/nlu/registry.py", line 121, in get_component_class
    return rasa.shared.utils.common.class_from_module_path(component_name)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/shared/utils/common.py", line 37, in class_from_module_path
    m = importlib.import_module(module_name)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'customizations'
Traceback:
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/streamlit/script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasalit/apps/spelling/app.py", line 50, in <module>
    clf = RasaClassifier(pathlib.Path(model_folder) / model_file)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasalit/apps/spelling/classifier.py", line 38, in __init__
    self.interpreter = load_interpreter(folder, file)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasalit/apps/spelling/classifier.py", line 15, in load_interpreter
    return RasaNLUInterpreter(nlu_model)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/core/interpreter.py", line 127, in __init__
    self._load_interpreter()
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/core/interpreter.py", line 164, in _load_interpreter
    self.interpreter = Interpreter.load(self.model_directory)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/nlu/model.py", line 334, in load
    should_finetune=new_config is not None,
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/nlu/model.py", line 397, in create
    components.validate_requirements(model_metadata.component_classes)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/nlu/components.py", line 69, in validate_requirements
    component_class = registry.get_component_class(component_name)
File "/usr/local/Caskroom/miniconda/base/envs/hrbot/lib/python3.7/site-packages/rasa/nlu/registry.py", line 149, in get_component_class
    f"Failed to load the component "

I have been able to narrow the issue down to this line: https://github.com/RasaHQ/rasa/blob/main/rasa/shared/utils/common.py#L37 So it calls importlib.import_module("customizations.spellchecker"). This works when running rasa run and when executing directly from the command line:

~/Programming/.../rasa-bot
hrbot ❯ python
Python 3.7.9 (default, Aug 31 2020, 07:22:35)
[Clang 10.0.0 ] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> importlib.import_module("customizations.spellchecker")
<module 'customizations.spellchecker' from '/Users/.../rasa-bot/customizations/spellchecker.py'>
>>>

I am not sure what is going on. Any chance that streamlit might mess with dynamic imports? It is almost as if the base search directory for the module resolution is wrong. I am a bit clueless.
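For reference, resolving a dotted component path such as customizations.spellchecker.SpellChecker comes down to an importlib.import_module call followed by an attribute lookup. The sketch below is a simplified illustration of that idea, not Rasa's actual implementation; load_class is a hypothetical helper name. When the import fails, it reports the working directory and the first sys.path entries, which is the information needed to check whether the module search path is the culprit.

```python
import importlib
import os
import sys


def load_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to the class object.

    Hypothetical helper for illustration only; it mirrors the general
    import_module + getattr pattern, not Rasa's real code.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        # Add the context that matters for search-path problems.
        raise ModuleNotFoundError(
            f"{err} (cwd={os.getcwd()!r}, sys.path[:3]={sys.path[:3]!r})"
        ) from err
    return getattr(module, class_name)


if __name__ == "__main__":
    # A standard-library class resolves regardless of the working directory.
    print(load_class("collections.OrderedDict"))
```

Calling load_class("customizations.spellchecker.SpellChecker") inside the failing process would show which directories Python actually searched.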

koaning commented 3 years ago

Yeah so internally we're running a streamlit app which is linking to a file inside of the package. That package won't have the same filepath as the folder where you're running the python -m rasalit command from. I imagine that your import needs relative paths too though.

This probably works

# ~/Programming/.../rasa-bot
> python
>>> import customizations

This probably does not

# ~/Programming/.../rasa-bot/customizations
> python
>>> import customizations

The fix?

A setup.py file should fix everything. That way everything is available from the virtualenv, no matter what folder the python process is started from.
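The behaviour described in this comment can be reproduced without Rasa or streamlit. The sketch below is a minimal demonstration under stated assumptions: it builds a throwaway project containing an empty customizations package, plus a script stored in a different folder (standing in for an app file that lives inside an installed package). When Python runs a script file, the first sys.path entry is the script's folder, not the working directory, so the import fails; with python -c the working directory is searched and the import succeeds. The names demo and app.py are made up for the example.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def demo() -> Tuple[int, int]:
    """Return (exit code when run as a script, exit code when run via -c)."""
    # Drop variables that would change how sys.path is initialised.
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("PYTHONPATH", "PYTHONSAFEPATH")
    }
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project"
        package = project / "customizations"
        package.mkdir(parents=True)
        (package / "__init__.py").write_text("")

        # The script lives outside the project, like an app file in site-packages.
        elsewhere = Path(tmp) / "elsewhere"
        elsewhere.mkdir()
        script = elsewhere / "app.py"
        script.write_text("import customizations\n")

        # Case 1: run the script with the project as working directory.
        as_script = subprocess.run(
            [sys.executable, str(script)], cwd=project, env=env, capture_output=True
        )
        # Case 2: same working directory, but no script file involved.
        from_cwd = subprocess.run(
            [sys.executable, "-c", "import customizations"],
            cwd=project,
            env=env,
            capture_output=True,
        )
        return as_script.returncode, from_cwd.returncode


if __name__ == "__main__":
    script_rc, cwd_rc = demo()
    print("script outside project:", "ok" if script_rc == 0 else "import failed")
    print("python -c from project:", "ok" if cwd_rc == 0 else "import failed")
```

Both runs use the same working directory, so the difference comes only from how sys.path is initialised.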

hotzenklotz commented 3 years ago

Yeah so internally we're running a streamlit app which is linking to a file inside of the package. That package won't have the same filepath as the folder where you're running the python -m rasalit command from.

I suspected something like this. :-( This will definitely make it hard to use custom NLU components.

This probably works ...

Yes, you are right with your assumptions. The first example import works, second does not.

A setup.py file should fix everything. That way everything is available from the virtualenv, no matter what folder the python process is started from.

Not sure how this helps? Are you proposing to install the customizations module system wide? In a way, this is a submodule of the current project root. In other words, the import would likely be rasabot.customizations or something. BTW, the project uses Poetry for dependency management and general setup. Not sure if that could be helpful.

koaning commented 3 years ago

Not sure how this helps? Are you proposing to install the customizations module system wide?

Are you not running everything inside of a virtualenv? What I am proposing is a setup.py like in this nlu component library. There's also some usage examples, like here.

In the past I've been able to run these nlu-example components inside of rasalit just fine. The reason is that the setup.py can be picked up by pip via pip install -e . which means it directly becomes available in the venv. You could even confirm this via pip freeze.

hotzenklotz commented 3 years ago

Are you not running everything inside of a virtualenv?

Yes, everything is in a virtual env handled by Anaconda.

What I am proposing is a setup.py like in this nlu component library. There's also some usage examples, like here.

Our project uses the new pyproject.toml standard + Poetry, so a setup.py is typically not required any longer.

The reason is that the setup.py can be picked up by pip via pip install -e .

pip install -e . will install the module globally (within the scope of the virtual env) and move or symlink it into your Python site-packages. This shouldn't be necessary when using the imports as intended (as the main Rasa package does). It is also a practice that cannot be assumed for other projects.

Further, I still believe this will not help in this case. The setup.py would need to install customizations as its own module. I guess my file hierarchy looks like:

hrbot <project root>
|_ customizations
|_ data
|_ actions
|_ ...
|_ setup.py

So, if I were to locally install the hrbot project the import would be hrbot.customizations, right?

Yeah so internally we're running a streamlit app which is linking to a file inside of the package. That package won't have the same filepath as the folder where you're running the python -m rasalit command from.

I think you want to look into workarounds/solutions for this. Otherwise a lot of projects with custom component imports will run into trouble. It would be nice if rasalit just behaved the same way as the main Rasa project with rasa run.

koaning commented 3 years ago

Strange. The hrbot.customizations approach should also work. That doesn't?

koaning commented 3 years ago

I guess one thing I could do is do something like this just before I load in the streamlit app;

import sys
sys.path.append('folder/rasastuff/lives')

I think this change is relatively easy, but I need to think on this a bit just to make sure there's no edge cases.
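A slightly more defensive variant of that idea is sketched below. It is only an illustration of the approach, not the change that was eventually made to rasalit; add_project_folder_to_path is a hypothetical helper name. It resolves the folder to an absolute path so the entry stays valid if the working directory changes later, and it avoids adding the same entry twice. It uses insert(0, ...) rather than append so that project modules take precedence over identically named installed packages, which is a design choice that may or may not be desirable.

```python
import importlib
import sys
from pathlib import Path


def add_project_folder_to_path(project_folder: str = ".") -> str:
    """Make modules in project_folder importable; return the resolved path.

    Hypothetical helper for illustration only.
    """
    resolved = str(Path(project_folder).resolve())
    if resolved not in sys.path:
        # Put the project first so its modules win over installed ones.
        sys.path.insert(0, resolved)
    # Ensure modules created after interpreter start-up are picked up.
    importlib.invalidate_caches()
    return resolved


# Example usage, assuming the current directory is the bot's project root:
#   add_project_folder_to_path(".")
#   importlib.import_module("customizations.spellchecker")
```

The helper has to run in the process that loads the model, before the pipeline components are imported.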

hotzenklotz commented 3 years ago

Strange. The hrbot.customizations approach should also work. That doesn't?

Yes and no. This will only work if 1) hrbot is installed globally with pip install -e . (which I didn't do) and 2) I change all my imports.

koaning commented 3 years ago

Interesting. This thread is making me rethink some assumptions from the users of this project. I'll come back to this topic next week with a proposed solution.

Especially this rings true;

I think you want to look into workarounds/solutions for this. Otherwise a lot of projects with custom component imports will run into trouble. It would be nice if rasalit just behaved the same way as the main Rasa project with rasa run.

koaning commented 3 years ago

I just ran something locally using this config file.

language: en

pipeline:
   - name: WhitespaceTokenizer
   - name: RegexFeaturizer
   - name: LexicalSyntacticFeaturizer
   - name: CountVectorsFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: char_wb
     min_ngram: 1
     max_ngram: 4
   - name: custom.BytePairFeaturizer
     lang: en
     vs: 1000
     dim: 25
   - name: DIETClassifier
     epochs: 50

I've got a file locally called custom.py that contains the appropriate code and it now seems to work. I checked with both the live-nlu and the spelling app and they both work. As an added bonus, it's now configured so that if you run the app from the project folder, you no longer need to pass any keyword arguments.

The tests ran green so that means that @hotzenklotz you could try it out again. Let me know!

One note: the CLI settings are a bit different now. You should be able to just run this from the rasa folder:

python -m rasalit spelling

koaning commented 3 years ago

@hotzenklotz just to check, does it work now on your end?

hotzenklotz commented 3 years ago

@koaning Yes, it seems to work now :-) I see that you added a --project-folder option. The default value (my current directory, which incidentally is my rasa bot project root) worked straight away. The spelling module now works for me. Thank you very much. Please feel free to close this issue.