Hey there,
I am trying to run a simple TensorFlow training in a Docker container with fractional-gpu. No matter which one I use, I always get:
`>>> model.fit(x_train, y_train, epochs=50, batch_size=1000)
Epoch 1/50
2024-06-06 10:53:20.251154: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:185] failed to create cublas handle: the resource allocation failed
2024-06-06 10:53:20.251203: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:188] Failure to initialize cublas may be due to OOM (cublas needs some free memory when you initialize it, and your deep-learning framework may have preallocated more than its fair share), or may be because this binary was not built with support for the GPU in your machine.
2024-06-06 10:53:20.251227: W external/local_xla/xla/stream_executor/stream.cc:1020] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py", line 53, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
Detected at node sequential/dense/MatMul defined at (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1807, in fit
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1401, in train_function
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1384, in step_function
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1373, in run_step
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1150, in train_step
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 590, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/base_layer.py", line 1149, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/sequential.py", line 398, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/functional.py", line 515, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/functional.py", line 672, in _run_internal_graph
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/base_layer.py", line 1149, in call
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 96, in error_handler
File "/usr/local/lib/python3.10/dist-packages/keras/src/layers/core/dense.py", line 241, in call
Blas xGEMV launch failed : a.shape=[1,1000,784], b.shape=[1,784,1], m=1000, n=1, k=784
[[{{node sequential/dense/MatMul}}]] [Op:__inference_train_function_932] `
With the official tensorflow/tensorflow:latest-gpu image, everything works as expected.
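
The cuBLAS message in the log names preallocation by the framework as one possible cause. A minimal sketch of how that could be ruled out, assuming the limit of the fractional GPU is smaller than what TensorFlow tries to reserve at startup (this is an assumption, not something the log confirms). The helper name `configure_memory_growth` is hypothetical; `TF_FORCE_GPU_ALLOW_GROWTH` and `tf.config.experimental.set_memory_growth` are standard TensorFlow settings.

```python
import os

# Ask TensorFlow to allocate GPU memory on demand instead of reserving
# almost all of it up front. Must be set before TensorFlow is imported.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def configure_memory_growth():
    """Enable memory growth on every visible GPU.

    Returns the number of GPUs TensorFlow can see (0 if TensorFlow is
    not installed or no GPU is visible).
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except RuntimeError:
            # Raised when the GPU was already initialized; in that case
            # only the environment variable above can still take effect,
            # and only in a fresh process.
            pass
    return len(gpus)


n_gpus = configure_memory_growth()
print(f"GPUs visible to TensorFlow: {n_gpus}")
```

This would have to run at the very start of the Python session, before the model is built and before `model.fit` is called. If the error persists with memory growth enabled, preallocation is probably not the cause.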