buriy / spacy-ru

Russian language models for spaCy
MIT License

Spacy-RU integration with Rasa Open Source #30

Open EugenSmith opened 4 years ago

EugenSmith commented 4 years ago

Hello.

Installation steps and the package versions used:

apt update && apt install -y python3-venv python3-dev python3-pip

python3 -m venv ./venv
source ./venv/bin/activate

pip install -U pip
pip install rasa --use-feature=2020-resolver

pip install pymorphy2==0.8
pip install spacy==2.1.9

git clone -b v2.1 https://github.com/buriy/spacy-ru.git
cp -r ./spacy-ru/ru2/. ./ru2/

python -V
Python 3.6.9

pip -V
pip 20.2.2 from /home/rasa/venv/lib/python3.6/site-packages/pip (python 3.6)

pip show tensorflow
Version: 2.1.1

pip show tensorflow_addons
Version: 0.7.1

pip show pymorphy2
Version: 0.8

pip show spacy
Version: 2.1.9

rasa --version
Rasa 1.10.12
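
The version checks above can also be gathered in one step from Python. This is only a convenience sketch: `installed_versions` is a hypothetical helper name, and `importlib.metadata` is available from Python 3.8, so on the Python 3.6.9 environment described here the `importlib_metadata` backport is assumed to be installed.

```python
try:
    from importlib import metadata  # standard library from Python 3.8
except ImportError:
    import importlib_metadata as metadata  # backport; assumed installed on 3.6


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the pip commands in this report.
PACKAGES = ["rasa", "spacy", "pymorphy2", "tensorflow", "tensorflow-addons"]

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Pasting this output into a bug report keeps all relevant versions together.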

cat ./config.yml

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: ru
pipeline:
  - name: "SpacyNLP"
    model: ru2
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "SklearnIntentClassifier"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: ResponseSelector
    epochs: 100
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100

After running the command rasa train, the output is:

Training Core model...
Processed Story Blocks: 100%|███████████████| 5/5 [00:00<00:00, 3274.24it/s, # trackers=1]
Processed Story Blocks: 100%|███████████████| 5/5 [00:00<00:00, 1573.97it/s, # trackers=5]
Processed Story Blocks: 100%|███████████████| 5/5 [00:00<00:00, 405.48it/s, # trackers=20]
Processed Story Blocks: 100%|███████████████| 5/5 [00:00<00:00, 301.93it/s, # trackers=24]
Processed trackers: 100%|███████████████████| 5/5 [00:00<00:00, 1970.45it/s, # actions=16]
Processed actions: 16it [00:00, 10648.82it/s, # examples=16]
Processed trackers: 100%|███████████████| 231/231 [00:00<00:00, 822.90it/s, # actions=126]
Epochs: 100%|██████| 100/100 [00:26<00:00,  3.71it/s, t_loss=0.084, loss=0.011, acc=1.000]
2020-09-06 16:12:45 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-09-06 16:12:45 INFO     rasa.core.agent  - Persisted model to '/tmp/tmpwnqa2h6f/core'
Core model training completed.
Training NLU model...
2020-09-06 16:12:45 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'ru2'
2020-09-06 16:12:45 INFO     pymorphy2.opencorpora_dict.wrapper  - Loading dictionaries from /home/rasa/venv/lib/python3.6/site-packages/pymorphy2_dicts/data
2020-09-06 16:12:45 INFO     pymorphy2.opencorpora_dict.wrapper  - format: 2.4, revision: 393442, updated: 2015-01-17T16:03:56.586168
2020-09-06 16:12:51 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-ru2'.
2020-09-06 16:12:51 INFO     rasa.nlu.training_data.training_data  - Training data stats:
2020-09-06 16:12:51 INFO     rasa.nlu.training_data.training_data  - Number of intent examples: 33 (7 distinct intents)
2020-09-06 16:12:51 INFO     rasa.nlu.training_data.training_data  -   Found intents: 'mood_unhappy', 'bot_challenge', 'deny', 'mood_great', 'goodbye', 'greet', 'affirm'
2020-09-06 16:12:51 INFO     rasa.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2020-09-06 16:12:51 INFO     rasa.nlu.training_data.training_data  - Number of entity examples: 0 (0 distinct entities)
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Finished training component.
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Finished training component.
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Finished training component.
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Finished training component.
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Finished training component.
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Finished training component.
2020-09-06 16:12:51 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.0s finished
Traceback (most recent call last):
  File "/home/rasa/venv/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/__main__.py", line 92, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/cli/train.py", line 76, in train
    additional_arguments=extract_additional_arguments(args),
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/train.py", line 50, in train
    additional_arguments=additional_arguments,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/train.py", line 101, in train_async
    additional_arguments,
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/train.py", line 188, in _train_async_internal
    additional_arguments=additional_arguments,
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/train.py", line 245, in _do_training
    persist_nlu_training_data=persist_nlu_training_data,
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/train.py", line 482, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/nlu/train.py", line 90, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/nlu/model.py", line 191, in train
    updates = component.train(working_data, self.config, **context)
  File "/home/rasa/venv/lib/python3.6/site-packages/rasa/nlu/classifiers/sklearn_intent_classifier.py", line 125, in train
    self.clf.fit(X, y)
  File "/home/rasa/venv/lib/python3.6/site-packages/sklearn/model_selection/_search.py", line 739, in fit
    self.best_estimator_.fit(X, y, **fit_params)
  File "/home/rasa/venv/lib/python3.6/site-packages/sklearn/svm/_base.py", line 148, in fit
    accept_large_sparse=False)
  File "/home/rasa/venv/lib/python3.6/site-packages/sklearn/utils/validation.py", line 755, in check_X_y
    estimator=estimator)
  File "/home/rasa/venv/lib/python3.6/site-packages/sklearn/utils/validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/home/rasa/venv/lib/python3.6/site-packages/sklearn/utils/validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
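
The ValueError above comes from scikit-learn's input validation, so the feature matrix passed to SklearnIntentClassifier contained at least one NaN or infinite value. One way to narrow this down is to check which training examples produce non-finite vectors. The following is a hedged sketch, not part of Rasa or spaCy: `find_non_finite` and `toy_vectorize` are hypothetical names, and it is only an assumption that the bad values originate in the vectors of the ru2 model.

```python
import math
from typing import Callable, Iterable, List, Sequence, Tuple


def find_non_finite(
    texts: Iterable[str],
    vectorize: Callable[[str], Sequence[float]],
) -> List[Tuple[str, int]]:
    """Return (text, count) pairs for texts whose vector has NaN or inf entries."""
    offenders = []
    for text in texts:
        bad = sum(1 for value in vectorize(text) if not math.isfinite(value))
        if bad:
            offenders.append((text, bad))
    return offenders


# Stand-in vectorizer for demonstration only: blank text yields NaN values,
# mimicking what a faulty model might emit for some inputs.
def toy_vectorize(text: str) -> List[float]:
    if not text.strip():
        return [float("nan")] * 3
    return [float(len(text)), 1.0, 0.0]


report = find_non_finite(["привет", "   ", "пока"], toy_vectorize)
print(report)
```

With spaCy installed, the same check could be run against the real model by loading it with `nlp = spacy.load("ru2")` and passing `lambda t: nlp(t).vector` as `vectorize`. If every training example comes back clean, the NaN values are more likely introduced by a later pipeline step.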

With the en language model installed instead, training runs without errors. If anyone has managed to use Rasa with Russian, please share your experience.