RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

Memory leak for CRFEntityExtractor #6684

Closed grungert closed 3 years ago

grungert commented 4 years ago

Rasa version: 1.10.7
Python version: 3.7
Operating system: Ubuntu 18

Issue: I am using the configuration given below. In the first case, I tested it with a 7 MB training dataset (which I consider a small set). While training BERT (the featurizers and DIETClassifier), the server's memory usage is fine.

After that, when training reaches CRFEntityExtractor, GPU memory usage stays the same, but CPU memory usage jumps to 100% and the server crashes with the error message below. In the second case, I used a 1 MB training dataset. When training reaches CRFEntityExtractor, CPU memory usage instantaneously jumps from 8% to 20%, then drops to 10%, and training continues normally.

Moreover, this happens only when BERT and CRFEntityExtractor are combined in the pipeline.

Is anybody familiar with this issue? Any help would be appreciated.

Server configuration:

Error (including full traceback):

Network error: Connection timed out

Command or request that led to error:

rasa train nlu
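
To put a number on the memory jump described above, the training command can be wrapped in a small script that reports the peak resident memory of the child process. This is only a minimal sketch, not part of Rasa: the helper name `peak_child_rss_mb` is hypothetical, and it assumes a Unix-like system (such as the Ubuntu server here) where Python's standard `resource` module is available.

```python
import resource
import subprocess
import sys


def peak_child_rss_mb(cmd):
    """Run cmd to completion; return (exit code, peak child RSS in MB).

    Hypothetical helper for illustration only.
    """
    completed = subprocess.run(cmd)
    # RUSAGE_CHILDREN covers child processes that have been waited for;
    # ru_maxrss is the largest resident set size seen among them so far.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return completed.returncode, usage.ru_maxrss / divisor
```

Calling `peak_child_rss_mb(["rasa", "train", "nlu"])` once with the 1 MB dataset and once with the 7 MB dataset would give two figures to compare. If the server is killed outright, as in the first case, the script dies with it, so the measurement is mainly useful on datasets that still fit in memory.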

Content of configuration file (config.yml):

language: en
pipeline:
- name: "HFTransformersNLP"
  model_weights: "bert-base-cased-finetuned-mrpc"
  model_name: "bert"
- name: "LanguageModelTokenizer"
- name: "LanguageModelFeaturizer"
- name: "CountVectorsFeaturizer"
  analyzer: 'word'
- name: "CountVectorsFeaturizer"
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: "DIETClassifier"
  random_seed: 42
  intent_classification: True
  entity_recognition: False
  epochs: 10
  learning_rate: 0.0002
  embedding_dimension: 60
  number_of_transformer_layers: 1
  batch_size: 64
  hidden_layer_sizes:
    text: [256, 128]
- name: "CRFEntityExtractor"
  "BILOU_flag": True
  "features": 
   [
    [
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "digit"
    ],
    [
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "digit"
    ],
    [
        "bias",
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "digit"
    ],
    [
        "bias",
        "low",
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "title",
        "digit",
        "pattern"
    ],
    [
        "bias",
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "digit"
    ],
    [
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "digit"
    ],
    [
        "prefix5",
        "prefix2",
        "suffix5",
        "suffix3",
        "suffix2",
        "digit"
    ]
   ]

sara-tagger commented 4 years ago

Thanks for the issue, @amn41 will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

tabergma commented 4 years ago

Can you please update to the latest Rasa version, e.g. 1.10.14? It should be fixed there.
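
When checking whether an installed version predates the suggested one, note that comparing the version strings directly gives the wrong answer ("1.10.7" sorts after "1.10.14" as text). A minimal sketch of a numeric comparison follows. The helper names are hypothetical, it assumes plain `X.Y.Z` version strings without pre-release suffixes, and 1.10.14 is used only because it is the version mentioned here, not as a confirmed point where the fix landed.

```python
def version_tuple(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints, e.g. (1, 10, 7)."""
    return tuple(int(part) for part in version.split("."))


def is_older_than(installed, reference="1.10.14"):
    """Return True if `installed` is numerically older than `reference`."""
    return version_tuple(installed) < version_tuple(reference)
```

With these helpers, `is_older_than("1.10.7")` is true, so the version from the report would need upgrading, for example with `pip install --upgrade rasa`.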

alwx commented 3 years ago

Closing this because it has already been fixed. Feel free to reopen if you see it happening again.