PaddlePaddle / Paddle

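PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)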
http://www.paddlepaddle.org/
Apache License 2.0

cudaErrorInvalidDeviceFunction: invalid device function @ WSL2 + Ubuntu 20.04 LTS, P600 GPU #55718

Open wxd11011 opened 1 year ago

wxd11011 commented 1 year ago

Describe the Bug

Following the official PaddleSpeech example program:

from paddlespeech.cli.tts.infer import TTSExecutor
tts = TTSExecutor()
tts(text="今天天气十分不错。", output="output.wav")

The output is as follows:

/home/user/PaddleSpeech/tools/venv310/lib/python3.10/site-packages/lazy_loader/__init__.py:185: RuntimeWarning: subpackages can technically be lazily loaded, but it causes the package to be eagerly loaded even if it is already lazily loaded.So, you probably shouldn't use subpackages with this lazy feature.
  warnings.warn(msg, RuntimeWarning)
/home/user/PaddleSpeech/tools/venv310/lib/python3.10/site-packages/lazy_loader/__init__.py:185: RuntimeWarning: subpackages can technically be lazily loaded, but it causes the package to be eagerly loaded even if it is already lazily loaded.So, you probably shouldn't use subpackages with this lazy feature.
  warnings.warn(msg, RuntimeWarning)
/home/user/PaddleSpeech/tools/venv310/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-07-26 16:22:26,628] [    INFO] - Already cached /home/user/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-07-26 16:22:26,638] [    INFO] - tokenizer config file saved in /home/user/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-07-26 16:22:26,638] [    INFO] - Special tokens file saved in /home/user/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
W0726 16:22:27.132776  1303 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0726 16:22:27.132830  1303 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.0, Runtime API Version: 12.0
W0726 16:22:27.133993  1303 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
I0726 16:22:27.404173  1303 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify  'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
I0726 16:22:27.405568  1303 eager_method.cc:140] Warning:: 0D Tensor cannot be used as 'Tensor.numpy()[0]' . In order to avoid this problem, 0D Tensor will be changed to 1D numpy currently, but it's not correct and will be removed in release 2.6. For Tensor contain only one element, Please modify  'Tensor.numpy()[0]' to 'float(Tensor)' as soon as possible, otherwise 'Tensor.numpy()[0]' will raise error in release 2.6.
Building prefix dict from the default dictionary ...
[2023-07-26 16:22:31,104] [   DEBUG] __init__.py:113 - Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-07-26 16:22:31,105] [   DEBUG] __init__.py:132 - Loading model from cache /tmp/jieba.cache
Loading model cost 0.580 seconds.
[2023-07-26 16:22:31,684] [   DEBUG] __init__.py:164 - Loading model cost 0.580 seconds.
Prefix dict has been built successfully.
[2023-07-26 16:22:31,684] [   DEBUG] __init__.py:166 - Prefix dict has been built successfully.
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  after determining tmp storage requirements for inclusive_scan: cudaErrorInvalidDeviceFunction: invalid device function
Aborted
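
The log above ends with a C++ `terminate` followed by `Aborted`, which means the whole Python process is killed and a `try`/`except` around the call cannot catch it. A minimal sketch of running the reproduction in a child process so the parent can still report how it died. `run_isolated` is a hypothetical helper, not part of Paddle or PaddleSpeech, and the signal reporting assumes a POSIX system such as the WSL2 Ubuntu environment described here:

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run `code` in a fresh Python process and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        status = "ok"
    elif proc.returncode < 0:
        # On POSIX a negative return code means the child was killed by a signal.
        status = "killed by " + signal.Signals(-proc.returncode).name
    else:
        status = f"exit code {proc.returncode}"
    return status, proc.stderr


# The reproduction from this issue, passed as source text to the child process.
REPRO = (
    "from paddlespeech.cli.tts.infer import TTSExecutor\n"
    "tts = TTSExecutor()\n"
    "tts(text='今天天气十分不错。', output='output.wav')\n"
)

if __name__ == "__main__":
    status, stderr = run_isolated(REPRO)
    print(status)
    # The last stderr lines usually carry the thrust/CUDA error message.
    print("\n".join(stderr.splitlines()[-5:]))
```

With a crash like the one above, the status would be expected to report an abort signal rather than a Python exception.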

Additional Supplementary Information

System: Windows 10 22H2 + WSL2 Ubuntu 20.04 LTS
GPU: NVIDIA P600
Driver version: 529.11
Python version: 3.10
CUDA version: 12.0
nvcc -V output:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Mon_Oct_24_19:12:58_PDT_2022
Cuda compilation tools, release 12.0, V12.0.76
Build cuda_12.0.r12.0/compiler.31968024_0

Install command for paddlepaddle: python3 -m pip install paddlepaddle-gpu==2.5.0.post120 -f https://www.paddlepaddle.org.cn/whl/linux/cudnnin/stable.html

PaddleSpeech was built and installed from source, following the official instructions:

git clone https://github.com/PaddlePaddle/PaddleSpeech.git
cd PaddleSpeech
pip install pytest-runner
pip install .

Output of paddle.utils.run_check():

Running verify PaddlePaddle program ...
I0726 16:32:31.601914  1407 interpretercore.cc:237] New Executor is Running.
W0726 16:32:31.602066  1407 gpu_resources.cc:96] The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.
W0726 16:32:31.602123  1407 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 12.0, Runtime API Version: 12.0
W0726 16:32:31.602778  1407 gpu_resources.cc:149] device: 0, cuDNN Version: 8.9.
I0726 16:32:32.131875  1407 interpreter_util.cc:518] Standalone Executor is Used.
PaddlePaddle works well on 1 GPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
YanhuiDua commented 1 year ago

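Hello, this looks like a CUDA library problem. Please first confirm that the driver, the CUDA libraries, and the Paddle version match each other.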
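
One way to gather the Paddle side of that comparison is sketched below. It assumes only `paddle.__version__`, `paddle.version.cuda()`, `paddle.version.cudnn()`, and `paddle.device.is_compiled_with_cuda()`; `collect_versions` is a hypothetical helper name. The driver and toolkit versions still have to be read from `nvidia-smi` and `nvcc -V` separately:

```python
import platform
import sys


def collect_versions():
    """Collect the versions the installed Paddle wheel reports about itself."""
    info = {"python": platform.python_version(), "platform": sys.platform}
    try:
        import paddle
    except Exception as exc:  # the import itself may fail on a broken install
        info["paddle"] = f"import failed: {exc}"
        return info
    info["paddle"] = paddle.__version__
    info["compiled_with_cuda"] = paddle.device.is_compiled_with_cuda()
    # CUDA / cuDNN versions the wheel was built against, not the ones installed.
    info["paddle_cuda"] = paddle.version.cuda()
    info["paddle_cudnn"] = paddle.version.cudnn()
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```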

wxd11011 commented 1 year ago

@YanhuiDua Hello, I strictly followed the steps at https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/compile/linux-compile-by-ninja.html and installed the CUDA 12.0 build of PaddlePaddle 2.5.0 that includes the cuDNN dynamic link libraries.

YanhuiDua commented 1 year ago

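Is your driver installed correctly? I looked at the official answer to this error at https://forums.developer.nvidia.com/t/cudalaunchkernel-returned-status-98-invalid-device-function/169958/2 and it is almost always caused by a driver problem.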

YanhuiDua commented 1 year ago

Does nvidia-smi produce normal output?

wxd11011 commented 1 year ago

nvidia-smi produces correct output. The output inside WSL2 Ubuntu 20.04 LTS is as follows:

Wed Jul 26 17:13:00 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.07   Driver Version: 529.11       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P600         On   | 00000000:01:00.0  On |                  N/A |
| 34%   36C    P8    N/A /  40W |    608MiB /  2048MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        14      G   /Xwayland                       N/A      |
|    0   N/A  N/A        23      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+

The output of nvidia-smi on the host Windows 10 22H2 system is as follows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 529.11       Driver Version: 529.11       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P600        WDDM  | 00000000:01:00.0  On |                  N/A |
| 34%   35C    P8    N/A /  40W |    618MiB /  2048MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2236    C+G   ...ekyb3d8bbwe\HxOutlook.exe    N/A      |
|    0   N/A  N/A      5044    C+G   ...8bbwe\WindowsTerminal.exe    N/A      |
|    0   N/A  N/A      5620    C+G   ...me\Application\chrome.exe    N/A      |
|    0   N/A  N/A      6520    C+G   ...5n1h2txyewy\SearchApp.exe    N/A      |
|    0   N/A  N/A      8756    C+G   ...3d8bbwe\CalculatorApp.exe    N/A      |
|    0   N/A  N/A      9672    C+G   C:\Windows\explorer.exe         N/A      |
|    0   N/A  N/A     13268    C+G   ...2txyewy\TextInputHost.exe    N/A      |
|    0   N/A  N/A     13784    C+G   ...e\PhoneExperienceHost.exe    N/A      |
|    0   N/A  N/A     16048    C+G   ...lPanel\SystemSettings.exe    N/A      |
+-----------------------------------------------------------------------------+

wxd11011 commented 1 year ago

I installed the driver and the CUDA version required by PaddlePaddle by following these official documents: https://docs.nvidia.com/cuda/wsl-user-guide/index.html

https://developer.nvidia.com/cuda-12-0-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=runfile_local

I strictly followed the document below to determine which CUDA version to install (12.0):

https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html

YanhuiDua commented 1 year ago

Do the CUDA samples run correctly?

wxd11011 commented 1 year ago

They seem to run correctly. The output of deviceQuery is as follows:

/usr/local/cuda-12.0/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro P600"
  CUDA Driver Version / Runtime Version          12.0 / 12.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 2048 MBytes (2147352576 bytes)
  ( 3) Multiprocessors, (128) CUDA Cores/MP:     384 CUDA Cores
  GPU Max Clock rate:                            1557 MHz (1.56 GHz)
  Memory Clock rate:                             2005 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
  Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 5 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 12.0, NumDevs = 1, Device0 = Quadro P600
Result = PASS

YanhuiDua commented 1 year ago

OK, we will take a look.

zhwesky2010 commented 1 year ago

This card should be supported, but a CUDA 12.0 build has not been released yet. Could this be a CUDA version problem?

wxd11011 commented 1 year ago

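> This card should be supported, but a CUDA 12.0 build has not been released yet. Could this be a CUDA version problem?

Thank you for your reply. I also tried WSL + Ubuntu 20.04 + CUDA 11.6 and hit the same problem.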

wxd11011 commented 1 year ago
 The GPU architecture in your current machine is Pascal, which is not compatible with Paddle installation with arch: 70 75 80 86 , it is recommended to install the corresponding wheel package according to the installation information on the official Paddle website.

This output seems to say that Paddle currently does not support the Pascal architecture?
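
The warning quoted above lists the architectures of the installed wheel (70 75 80 86), while the log reports the GPU's compute capability as 6.1. A minimal sketch of that comparison, assuming the two `gpu_resources.cc` log-line formats shown earlier in this thread; `check_arch_warning` is a hypothetical helper, not a Paddle API:

```python
import re


def check_arch_warning(log_text):
    """Return (gpu_sm, wheel_archs, gpu_sm in wheel_archs) or None if not found."""
    compiled = re.search(r"Paddle installation with arch: ([\d ]+)", log_text)
    capability = re.search(r"GPU Compute Capability: (\d+)\.(\d+)", log_text)
    if not compiled or not capability:
        return None
    wheel_archs = [int(a) for a in compiled.group(1).split()]
    # Compute capability 6.1 corresponds to arch number 61.
    gpu_sm = int(capability.group(1)) * 10 + int(capability.group(2))
    return gpu_sm, wheel_archs, gpu_sm in wheel_archs


if __name__ == "__main__":
    log = (
        "The GPU architecture in your current machine is Pascal, which is not "
        "compatible with Paddle installation with arch: 70 75 80 86 , it is "
        "recommended to install the corresponding wheel package\n"
        "Please NOTE: device: 0, GPU Compute Capability: 6.1, "
        "Driver API Version: 12.0, Runtime API Version: 12.0\n"
    )
    print(check_arch_warning(log))
```

A `False` in the last field only restates what the warning says; whether it explains the crash is not confirmed in this thread.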

YanhuiDua commented 1 year ago

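Hello, for the NVIDIA GPU architectures supported by PaddlePaddle, please refer to the documentation: https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#nvidia-gpu. The Pascal architecture is supported, and this warning has no effect.

Your install command was python3 -m pip install paddlepaddle-gpu==2.5.0.post120 -f https://www.paddlepaddle.org.cn/whl/linux/cudnnin/stable.html, which installs the Linux build of Paddle. Could you try the Windows package instead? You could also try installing the develop version of Paddle.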

wxd11011 commented 1 year ago

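> Hello, for the NVIDIA GPU architectures supported by PaddlePaddle, please refer to the documentation: https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#nvidia-gpu. The Pascal architecture is supported, and this warning has no effect.
>
> Your install command was python3 -m pip install paddlepaddle-gpu==2.5.0.post120 -f https://www.paddlepaddle.org.cn/whl/linux/cudnnin/stable.html, which installs the Linux build of Paddle. Could you try the Windows package instead? You could also try installing the develop version of Paddle.

Thank you for your reply! Because I am working in a WSL2 + Ubuntu environment, I have to install the Linux version. However, the same problem also occurs on Windows 10 22H2 + P600 GPU + CUDA 11.8 + PaddlePaddle 2.5.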

wxd11011 commented 1 year ago

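> Hello, for the NVIDIA GPU architectures supported by PaddlePaddle, please refer to the documentation: https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#nvidia-gpu. The Pascal architecture is supported, and this warning has no effect.
>
> Your install command was python3 -m pip install paddlepaddle-gpu==2.5.0.post120 -f https://www.paddlepaddle.org.cn/whl/linux/cudnnin/stable.html, which installs the Linux build of Paddle. Could you try the Windows package instead? You could also try installing the develop version of Paddle.

After testing, the develop version of Paddle runs correctly with CUDA 11.8 + cuDNN 8.9. With CUDA 12 it does not: installation succeeds, but an error is raised on import.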