RasaHQ / rasa

💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
https://rasa.com/docs/rasa/
Apache License 2.0

failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED #11127

Closed pluveto closed 1 year ago

pluveto commented 2 years ago

Rasa Open Source version

3.1.0

Rasa SDK version

3.1.1

Rasa X version

No response

Python version

3.9

What operating system are you using?

Windows

What happened?

Running rasa train fails during DIETClassifier training with the error shown in the log output below.

Command / Request

No response

Relevant log output

2022-05-18 14:07:39.437065: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2022-05-18 14:07:39.437600: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2022-05-18 14:07:39.437838: E tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
Traceback (most recent call last):
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\graph.py", line 464, in __call__
    output = self._fn(self._component, **run_kwargs)
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 920, in train
    self.model.fit(
  File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\utils\tensorflow\temp_keras_modules.py", line 388, in fit
    tmp_logs = self.train_function(iterator)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) INTERNAL:  Attempting to perform BLAS operation using StreamExecutor without BLAS support
         [[node embed_label/embed_layer_label/MatMul
 (defined at C:\Repo\rasa-demo\venv\lib\site-packages\keras\layers\core\dense.py:199)
]]
         [[Func/crf/cond/StatefulPartitionedCall/crf/cond/else/_236/input/_538/_362]]
  (1) INTERNAL:  Attempting to perform BLAS operation using StreamExecutor without BLAS support
         [[node embed_label/embed_layer_label/MatMul
 (defined at C:\Repo\rasa-demo\venv\lib\site-packages\keras\layers\core\dense.py:199)
]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_48041]

Errors may have originated from an input operation.
Input Source operations connected to node embed_label/embed_layer_label/MatMul:
In[0] Sum_1 (defined at C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py:1493)
In[1] embed_label/embed_layer_label/MatMul/ReadVariableOp:

Operation defined at: (most recent call last)
>>>   File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
>>>     return _run_code(code, main_globals, None,
>>>
>>>   File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
>>>     exec(code, run_globals)
>>>
>>>   File "C:\Repo\rasa-demo\venv\Scripts\rasa.exe\__main__.py", line 7, in <module>
>>>     sys.exit(main())
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\__main__.py", line 119, in main
>>>     cmdline_arguments.func(cmdline_arguments)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\cli\train.py", line 59, in <lambda>
>>>     train_parser.set_defaults(func=lambda args: run_training(args, can_exit=True))
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\cli\train.py", line 91, in run_training
>>>     training_result = train_all(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\api.py", line 105, in train
>>>     return train(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\model_training.py", line 160, in train
>>>     return _train_graph(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\model_training.py", line 234, in _train_graph
>>>     trainer.train(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\training\graph_trainer.py", line 105, in train
>>>     graph_runner.run(inputs={PLACEHOLDER_IMPORTER: importer})
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\runner\dask.py", line 101, in run
>>>     dask_result = dask.get(run_graph, run_targets)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 553, in get_sync
>>>     return get_async(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 495, in get_async
>>>     fire_tasks(chunksize)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 490, in fire_tasks
>>>     fut = submit(batch_execute_tasks, each_args)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 538, in submit
>>>     fut.set_result(fn(*args, **kwargs))
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 234, in batch_execute_tasks
>>>     return [execute_task(*a) for a in it]
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 234, in <listcomp>
>>>     return [execute_task(*a) for a in it]
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 220, in execute_task
>>>     result = _execute_task(task, data)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\core.py", line 119, in _execute_task
>>>     return func(*(_execute_task(a, cache) for a in args))
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\graph.py", line 464, in __call__
>>>     output = self._fn(self._component, **run_kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 920, in train
>>>     self.model.fit(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\utils\tensorflow\temp_keras_modules.py", line 388, in fit
>>>     tmp_logs = self.train_function(iterator)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\training.py", line 878, in train_function
>>>     return step_function(self, iterator)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\training.py", line 867, in step_function
>>>     outputs = model.distribute_strategy.run(run_step, args=(data,))      
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\training.py", line 860, in run_step
>>>     outputs = model.train_step(data)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\utils\tensorflow\models.py", line 144, in train_step
>>>     prediction_loss = self.batch_loss(batch_in)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1622, in batch_loss
>>>     loss = self._batch_loss_intent(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1661, in _batch_loss_intent
>>>     loss, acc = self._calculate_label_loss(sentence_vector, label, label_ids)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1559, in _calculate_label_loss
>>>     all_label_ids, all_labels_embed = self._create_all_labels()
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1510, in _create_all_labels
>>>     all_labels_embed = self._tf_layers[f"embed.{LABEL}"](x)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\utils\tensorflow\layers.py", line 464, in call
>>>     x = self._dense(x)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\layers\core\dense.py", line 199, in call
>>>     outputs = tf.matmul(a=inputs, b=self.kernel)
>>>

Input Source operations connected to node embed_label/embed_layer_label/MatMul:
In[0] Sum_1 (defined at C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py:1493)
In[1] embed_label/embed_layer_label/MatMul/ReadVariableOp:

Operation defined at: (most recent call last)
>>>   File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
>>>     return _run_code(code, main_globals, None,
>>>
>>>   File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
>>>     exec(code, run_globals)
>>>
>>>   File "C:\Repo\rasa-demo\venv\Scripts\rasa.exe\__main__.py", line 7, in <module>
>>>     sys.exit(main())
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\__main__.py", line 119, in main
>>>     cmdline_arguments.func(cmdline_arguments)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\cli\train.py", line 59, in <lambda>
>>>     train_parser.set_defaults(func=lambda args: run_training(args, can_exit=True))
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\cli\train.py", line 91, in run_training
>>>     training_result = train_all(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\api.py", line 105, in train
>>>     return train(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\model_training.py", line 160, in train
>>>     return _train_graph(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\model_training.py", line 234, in _train_graph
>>>     trainer.train(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\training\graph_trainer.py", line 105, in train
>>>     graph_runner.run(inputs={PLACEHOLDER_IMPORTER: importer})
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\runner\dask.py", line 101, in run
>>>     dask_result = dask.get(run_graph, run_targets)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 553, in get_sync
>>>     return get_async(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 495, in get_async
>>>     fire_tasks(chunksize)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 490, in fire_tasks
>>>     fut = submit(batch_execute_tasks, each_args)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 538, in submit
>>>     fut.set_result(fn(*args, **kwargs))
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 234, in batch_execute_tasks
>>>     return [execute_task(*a) for a in it]
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 234, in <listcomp>
>>>     return [execute_task(*a) for a in it]
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 220, in execute_task
>>>     result = _execute_task(task, data)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\core.py", line 119, in _execute_task
>>>     return func(*(_execute_task(a, cache) for a in args))
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\graph.py", line 464, in __call__
>>>     output = self._fn(self._component, **run_kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 920, in train
>>>     self.model.fit(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\utils\tensorflow\temp_keras_modules.py", line 388, in fit
>>>     tmp_logs = self.train_function(iterator)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\training.py", line 878, in train_function
>>>     return step_function(self, iterator)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\training.py", line 867, in step_function
>>>     outputs = model.distribute_strategy.run(run_step, args=(data,))      
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\training.py", line 860, in run_step
>>>     outputs = model.train_step(data)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\utils\tensorflow\models.py", line 144, in train_step
>>>     prediction_loss = self.batch_loss(batch_in)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1622, in batch_loss
>>>     loss = self._batch_loss_intent(
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1661, in _batch_loss_intent
>>>     loss, acc = self._calculate_label_loss(sentence_vector, label, label_ids)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1559, in _calculate_label_loss
>>>     all_label_ids, all_labels_embed = self._create_all_labels()
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\nlu\classifiers\diet_classifier.py", line 1510, in _create_all_labels
>>>     all_labels_embed = self._tf_layers[f"embed.{LABEL}"](x)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\utils\tensorflow\layers.py", line 464, in call
>>>     x = self._dense(x)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 64, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\engine\base_layer.py", line 1083, in __call__
>>>     outputs = call_fn(inputs, *args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\utils\traceback_utils.py", line 92, in error_handler
>>>     return fn(*args, **kwargs)
>>>
>>>   File "C:\Repo\rasa-demo\venv\lib\site-packages\keras\layers\core\dense.py", line 199, in call
>>>     outputs = tf.matmul(a=inputs, b=self.kernel)
>>>

Function call stack:
train_function -> train_function

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Repo\rasa-demo\venv\Scripts\rasa.exe\__main__.py", line 7, in <module>
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\__main__.py", line 119, in main
    cmdline_arguments.func(cmdline_arguments)
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\cli\train.py", line 59, in <lambda>
    train_parser.set_defaults(func=lambda args: run_training(args, can_exit=True))
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\cli\train.py", line 91, in run_training
    training_result = train_all(
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\api.py", line 105, in train
    return train(
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\model_training.py", line 160, in train
    return _train_graph(
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\model_training.py", line 234, in _train_graph
    trainer.train(
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\training\graph_trainer.py", line 105, in train
    graph_runner.run(inputs={PLACEHOLDER_IMPORTER: importer})
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\runner\dask.py", line 101, in run
    dask_result = dask.get(run_graph, run_targets)
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 553, in get_sync
    return get_async(
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 496, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 438, in result
    return self.__get_result()
  File "C:\Users\i\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 390, in __get_result
    raise self._exception
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 538, in submit
    fut.set_result(fn(*args, **kwargs))
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 234, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 234, in <listcomp>
    return [execute_task(*a) for a in it]
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 225, in execute_task
    result = pack_exception(e, dumps)
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "C:\Repo\rasa-demo\venv\lib\site-packages\dask\core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Repo\rasa-demo\venv\lib\site-packages\rasa\engine\graph.py", line 471, in __call__
    raise GraphComponentException(
rasa.engine.exceptions.GraphComponentException: Error running graph component for node train_DIETClassifier5.
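
The log above does not identify a root cause, and no fix is confirmed in this thread. CUBLAS_STATUS_ALLOC_FAILED is commonly reported when the GPU cannot provide memory for the cuBLAS handle, so one thing that may be worth trying is to let TensorFlow allocate GPU memory on demand, or to hide the GPU and train on CPU. The sketch below only prepares the environment variables for such a run; `build_training_env` is a hypothetical helper name, while `TF_FORCE_GPU_ALLOW_GROWTH` and `CUDA_VISIBLE_DEVICES` are existing TensorFlow and CUDA settings. Whether either setting resolves this particular failure is an assumption.

```python
import os


def build_training_env(base_env=None, allow_growth=True, hide_gpu=False):
    """Return a copy of the environment with GPU-related settings applied.

    Hypothetical helper for illustration; it is not part of Rasa or TensorFlow.
    """
    env = dict(os.environ if base_env is None else base_env)
    if hide_gpu:
        # An invalid device index hides all CUDA devices, so TensorFlow uses the CPU.
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    elif allow_growth:
        # Ask TensorFlow to grow GPU memory as needed instead of reserving it up front.
        env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


# Possible usage (not executed here):
#   import subprocess
#   subprocess.run(["rasa", "train"], env=build_training_env(), check=True)
```

If training succeeds with `hide_gpu=True` but still fails with memory growth enabled, that would suggest the problem lies in the GPU setup (driver, CUDA libraries, or memory held by another process) rather than in the training data or the Rasa configuration.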

sync-by-unito[bot] commented 1 year ago

➤ Maxime Verger commented:

💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.