NVIDIA / tensorflow

An Open Source Machine Learning Framework for Everyone
https://developer.nvidia.com/deep-learning-frameworks
Apache License 2.0

tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version #83

Open goldwater668 opened 1 year ago

goldwater668 commented 1 year ago


System information

[Screenshot from 2023-03-17 15-20-03]

NVIDIA GeForce RTX 3060

conda create -n tf15 python=3.8
conda activate tf15
pip install nvidia-pyindex
pip install nvidia-tensorflow[horovod]

Python 3.8.16 (default, Mar 2 2023, 03:21:46)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2023-03-17 15:10:16.549035: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
>>> tf.__version__
'1.15.5'
>>> tf.test.is_gpu_available()
2023-03-17 15:10:40.014989: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2112000000 Hz
2023-03-17 15:10:40.015471: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x40210e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-03-17 15:10:40.015487: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-03-17 15:10:40.016911: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1
2023-03-17 15:10:40.092696: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-17 15:10:40.093159: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4fceae0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-03-17 15:10:40.093173: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3060, Compute Capability 8.6
2023-03-17 15:10:40.093261: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-17 15:10:40.093653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Found device 0 with properties:
name: NVIDIA GeForce RTX 3060 major: 8 minor: 6 memoryClockRate(GHz): 1.777
pciBusID: 0000:01:00.0
2023-03-17 15:10:40.093668: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
2023-03-17 15:10:40.639679: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcublas.so.12
2023-03-17 15:10:40.760581: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcufft.so.11
2023-03-17 15:10:40.844078: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcurand.so.10
2023-03-17 15:10:41.298407: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusolver.so.11
2023-03-17 15:10:41.299749: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusparse.so.12
2023-03-17 15:10:41.299981: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudnn.so.8
2023-03-17 15:10:41.300106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-17 15:10:41.301150: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-03-17 15:10:41.301913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1802] Adding visible gpu devices: 0
2023-03-17 15:10:41.301943: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hjq/.local/lib/python3.8/site-packages/tensorflow_core/python/framework/test_util.py", line 1433, in is_gpu_available
    for local_device in device_lib.list_local_devices():
  File "/home/hjq/.local/lib/python3.8/site-packages/tensorflow_core/python/client/device_lib.py", line 41, in list_local_devices
    for s in pywrap_tensorflow.list_devices(session_config=session_config)
  File "/home/hjq/.local/lib/python3.8/site-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 2249, in list_devices
    return ListDevices()
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
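For anyone hitting the same "CUDA driver version is insufficient for CUDA runtime version" message, here is a rough sketch of the comparison behind it. CUDA reports versions as a single integer, 1000 * major + 10 * minor, so 12000 means 12.0. The helper names and the example values (11040 for an older driver, 12000 for the libcudart.so.12 runtime seen in the log) are my own assumptions for illustration, and the strict comparison ignores CUDA's minor-version compatibility rules, so treat it as a first-pass check only.

```python
import ctypes


def decode_cuda_version(value):
    """Split CUDA's integer encoding (1000 * major + 10 * minor) into a tuple."""
    return value // 1000, (value % 1000) // 10


def driver_supports_runtime(driver_version, runtime_version):
    """True if the driver's supported CUDA version is at least the runtime's."""
    return decode_cuda_version(driver_version) >= decode_cuda_version(runtime_version)


def query_driver_version(lib_name="libcuda.so.1"):
    """Return the raw value from cuDriverGetVersion, or None if it cannot be read."""
    try:
        libcuda = ctypes.CDLL(lib_name)
        version = ctypes.c_int(0)
        status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    except (OSError, AttributeError):
        # No NVIDIA driver library on this machine, or the symbol is missing.
        return None
    return version.value if status == 0 else None


if __name__ == "__main__":
    # Illustrative values only: an 11.4-capable driver against a 12.0 runtime.
    print(decode_cuda_version(11040), decode_cuda_version(12000))
    print(driver_supports_runtime(11040, 12000))
    print(query_driver_version())
```

If `query_driver_version()` returns a value lower than the runtime that TensorFlow loads, upgrading the driver (or installing a build made for an older CUDA release) is the likely fix.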

goldwater668 commented 1 year ago

After upgrading the graphics driver, the GPU build of TensorFlow above loads correctly, but running my code now produces the following errors:

2023-03-17 18:08:02.767249: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcublas.so.12
2023-03-17 18:08:03.411129: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudnn.so.8
2023-03-17 18:08:03.897246: E tensorflow/stream_executor/cuda/cuda_dnn.cc:367] Loaded runtime CuDNN library: 8.6.0 but source was compiled with: 8.7.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-03-17 18:08:03.898479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:367] Loaded runtime CuDNN library: 8.6.0 but source was compiled with: 8.7.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-03-17 18:08:03.898792: E tensorflow/stream_executor/cuda/cuda_dnn.cc:367] Loaded runtime CuDNN library: 8.6.0 but source was compiled with: 8.7.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2023-03-17 18:08:03.899075: E tensorflow/stream_executor/cuda/cuda_dnn.cc:367] Loaded runtime CuDNN library: 8.6.0 but source was compiled with: 8.7.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
  File "/home/h/.local/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/h/.local/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1349, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/home/h/.local/lib/python3.8/site-packages/tensorflow_core/python/client/session.py", line 1441, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution execution plan. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[{{node generator/G_MODEL/A/Conv/Conv2D}}]]
    [[add_16/_883]]
  (1) Unknown: Failed to get convolution execution plan. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
    [[{{node generator/G_MODEL/A/Conv/Conv2D}}]]

lileishitou commented 1 year ago

I have the same problem.

nluehr commented 1 year ago

As the error notes:

Loaded runtime CuDNN library: 8.6.0 but source was compiled with: 8.7.0. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.

Have you tried upgrading your installed cuDNN version to 8.7 or newer?
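The rule quoted in that error message can be written down as a small check. This is only a sketch of the rule as the log states it (same major version, loaded minor version at least the compiled one); the function names are made up for illustration and are not part of TensorFlow or cuDNN.

```python
def parse_version(text):
    """Turn a dotted version string such as '8.6.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def cudnn_runtime_is_compatible(loaded, compiled):
    """Apply the rule from the log: majors match and loaded minor >= compiled minor."""
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


if __name__ == "__main__":
    # The versions reported in this issue: runtime 8.6.0, build 8.7.0.
    print(cudnn_runtime_is_compatible("8.6.0", "8.7.0"))
```

With the versions from this issue the check fails because 8.6 is older than the 8.7 the wheel was built against, which is why upgrading cuDNN to 8.7 or newer should clear the error.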

DragonGamees commented 1 year ago

I found a solution to the problem. Either install the nvidia-tensorflow build that matches your installed driver, or update your NVIDIA driver.

I took the first route. Check the release notes to find the nvidia-tensorflow build that matches your driver and CUDA version: https://docs.nvidia.com/deeplearning/frameworks/tensorflow-wheel-release-notes/tf-wheel-rel.html

Then install that build with pip install nvidia-tensorflow==1.15.5+nv{release tag of the matching build}
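A small sketch of those two steps in Python, in case it helps: read the installed driver version with nvidia-smi, then build the pip requirement string for the release you picked from the release notes. The helper names are my own, and the tag "22.12" below is only a placeholder, not a recommendation; look up the real tag for your driver in the release notes.

```python
import subprocess


def installed_driver_version():
    """Return the driver version reported by nvidia-smi, or None if unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is not installed or could not talk to the driver.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def pip_requirement(release_tag, base_version="1.15.5"):
    """Build the requirement string for a given release tag (placeholder input)."""
    return f"nvidia-tensorflow=={base_version}+nv{release_tag}"


if __name__ == "__main__":
    print("driver:", installed_driver_version())
    # "22.12" is a hypothetical tag used only to show the string format.
    print("pip install", pip_requirement("22.12"))
```

Compare the printed driver version against the driver requirements listed for each release, then pass the printed requirement to pip.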