ToscanaGoGithub closed this issue 1 year ago.
Hi! We've received your issue. Please be patient: we will arrange for technical staff to answer your question as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You may also look for answers in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!
Python version: Python 3.8.8 (default, Apr 13 2021, 19:58:26) [GCC 7.3.0] :: Anaconda, Inc. on linux
Paddle version: paddlepaddle-gpu==2.2.1.post111
What is your cuDNN version?
Before running, first set the environment variable with export FLAGS_call_stack_level=2, then run again and check exactly which code the error comes from.
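The same idea can be scripted. This is a minimal sketch, assuming you want to run the check in a child process so a segfault can be observed and reported instead of killing your interactive session. `run_with_flags` and `crashed_with_segfault` are hypothetical helper names; only `FLAGS_call_stack_level=2` and `paddle.utils.run_check()` come from this thread. On POSIX systems, a child process killed by a signal has a negative return code, so a segfault shows up as `-signal.SIGSEGV`.

```python
import os
import signal
import subprocess
import sys


def run_with_flags(code, extra_env):
    """Run `code` in a fresh Python interpreter with extra environment variables.

    Returns (returncode, stdout, stderr).
    """
    env = dict(os.environ)
    env.update(extra_env)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout, proc.stderr


def crashed_with_segfault(returncode):
    # POSIX only: a child terminated by signal N reports return code -N.
    return returncode == -signal.SIGSEGV


if __name__ == "__main__":
    # The check discussed in this thread; needs paddle installed in this interpreter.
    rc, out, err = run_with_flags(
        "import paddle; paddle.utils.run_check()",
        {"FLAGS_call_stack_level": "2"},
    )
    print("return code:", rc)
    print("segfault:", crashed_with_segfault(rc))
    # Show everything the child wrote; any error summary ends up in one of these.
    print(out)
    print(err)
```

Running the check this way also makes it easy to compare several environments (different wheels, different `LD_LIBRARY_PATH` settings) in one script.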
zhangwei@2080s:~$ export FLAGS_call_stack_level=2
zhangwei@2080s:~$ python
Python 3.8.8 (default, Apr 13 2021, 19:58:26)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
import paddle
paddle.utils.run_check()
Running verify PaddlePaddle program ...
W1220 16:59:03.243978 390281 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 11.2
No stack trace in paddle, may be caused by external reasons.
FatalError: Segmentation fault
is detected by the operating system.
[TimeInfo: Aborted at 1639990743 (unix time) try "date -d @1639990743" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x0) received by PID 390281 (TID 0x7f8ee93c9740) from PID 0 ]
Segmentation fault (core dumped)
zhangwei@2080s:~$
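The `date -d @...` hint in the crash report can be reproduced in Python. A small sketch, assuming the machine's clock is set to UTC+8 (an assumption; the thread does not state the time zone): under that offset the abort time 1639990743 matches the `W1220 16:59:03` warning above, i.e. the crash happens in the same second that Paddle reports the GPU device context. `abort_time` and `glog_stamp` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def abort_time(unix_ts, utc_offset_hours=0):
    """Convert the TimeInfo value from Paddle's crash report to an aware datetime."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(unix_ts, tz=tz)


def glog_stamp(dt):
    # glog-style prefix without the severity letter: MMDD HH:MM:SS
    return dt.strftime("%m%d %H:%M:%S")


if __name__ == "__main__":
    ts = 1639990743  # from "[TimeInfo: Aborted at 1639990743 ...]"
    print(abort_time(ts).isoformat())     # 2021-12-20T08:59:03+00:00
    print(glog_stamp(abort_time(ts, 8)))  # 1220 16:59:03 (assumes UTC+8)
```

Under the same assumption, the abort time 1639915628 in the original report corresponds to `1219 20:07:08`, matching its `W1219 20:07:08` warning.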
Even after setting the variable, I get the same problem.
import torch
print(torch.backends.cudnn.version())
8005
PyTorch reports 8005, but querying with the following command prints nothing:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
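The integer 8005 is cuDNN's packed version number. For cuDNN 8.x and earlier it is `CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL`, so 8005 reads as 8.0.5. As far as we know, starting with cuDNN 8 these macros are defined in `cudnn_version.h` rather than `cudnn.h`, which would explain why the `grep` above prints nothing. Also keep in mind that PyTorch binary builds typically bundle their own cuDNN, so this number need not match the copy Paddle loads. The sketch below decodes the integer and scans header text for the macros; the paths in `CANDIDATE_HEADERS` are assumptions about a typical install, and all function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed header locations (not from the thread); adjust for your install.
CANDIDATE_HEADERS = [
    "/usr/local/cuda/include/cudnn_version.h",
    "/usr/local/cuda/include/cudnn.h",
    "/usr/include/cudnn_version.h",
    "/usr/include/cudnn.h",
]


def decode_cudnn_version(value):
    """Split a cuDNN 8.x-style packed version (MAJOR*1000 + MINOR*100 + PATCH)."""
    major, rest = divmod(value, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def parse_cudnn_header(text):
    """Return (major, minor, patch) from header text, or None if a macro is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def find_system_cudnn():
    """Return (path, version) for the first candidate header that defines the macros."""
    for path in CANDIDATE_HEADERS:
        p = Path(path)
        if p.is_file():
            version = parse_cudnn_header(p.read_text(errors="replace"))
            if version is not None:
                return path, version
    return None


if __name__ == "__main__":
    print(decode_cudnn_version(8005))  # (8, 0, 5)
    print(find_system_cudnn())
```

If `find_system_cudnn()` reports a different version from PyTorch's, the two frameworks are most likely loading different cuDNN libraries.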
Could you try a different version of Paddle?
I've tried several versions, and they all report the same error.
Does the CPU version also fail?
import paddle
paddle.utils.run_check()
Running verify PaddlePaddle program ...
PaddlePaddle works well on 1 CPU.
W1221 10:40:38.772401 430183 fuse_all_reduce_op_pass.cc:76] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 2 CPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
I tried the CPU version, and it works fine.
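This looks like an environment configuration problem with CUDA, cuDNN, and so on, which makes it hard to pin down. You could try the Docker environment provided on the official Paddle website.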
OK, thanks for the reply.
OS version:
Distributor ID: Ubuntu
Description: Ubuntu 20.04.3 LTS
Release: 20.04
Codename: focal
Linux kernel version: linux-image-5.11.0-27-generic
CUDA version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
GPU driver version:
nvidia-dkms-470 470.86-0ubuntu0.20.04.1 amd64 NVIDIA DKMS package
nvidia-driver-470 470.86-0ubuntu0.20.04.1 amd64 NVIDIA driver metapackage
Error output when running paddle.utils.run_check():
Running verify PaddlePaddle program ...
W1219 20:07:08.779233 31712 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.4, Runtime API Version: 11.2
C++ Traceback (most recent call last):
No stack trace in paddle, may be caused by external reasons.
Error Message Summary:
FatalError:
Segmentation fault
is detected by the operating system.
[TimeInfo: Aborted at 1639915628 (unix time) try "date -d @1639915628" if you are using GNU date ]
[SignalInfo: SIGSEGV (@0x0) received by PID 31712 (TID 0x7f2d1f1c4740) from PID 0 ]
Segmentation fault (core dumped)
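The versions reported above can be cross-checked mechanically. This is a hedged sketch under two assumptions: (1) the `.postNNN` suffix of a `paddlepaddle-gpu` wheel names the CUDA version it was built for (`post111` → 11.1), as we understand Paddle's wheel naming; and (2) the conventional rule that the driver should support at least the runtime's CUDA version. The input strings are copied from this issue; all function names are hypothetical. For this report the driver/runtime pair (11.4 vs 11.2) passes, so the check does not by itself explain the segfault, but it does show that the wheel suffix and the local `nvcc` (both 11.1) differ from the runtime version Paddle reports (11.2).

```python
import re


def parse_paddle_cuda_note(line):
    """Extract ((driver major, minor), (runtime major, minor)) from Paddle's warning."""
    m = re.search(r"Driver API Version: (\d+)\.(\d+), Runtime API Version: (\d+)\.(\d+)", line)
    if m is None:
        raise ValueError("no CUDA version note found")
    d_major, d_minor, r_major, r_minor = map(int, m.groups())
    return (d_major, d_minor), (r_major, r_minor)


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_post_suffix(spec):
    """Assumed convention: last digit of .postNNN is the CUDA minor version, the rest the major."""
    m = re.search(r"\.post(\d+)(\d)$", spec)
    return (int(m.group(1)), int(m.group(2))) if m else None


def fmt(version):
    return f"{version[0]}.{version[1]}"


def check(spec, nvcc_text, paddle_line):
    """Return human-readable findings; an empty list means the versions look consistent."""
    wheel = parse_post_suffix(spec)
    toolkit = parse_nvcc_release(nvcc_text)
    driver, runtime = parse_paddle_cuda_note(paddle_line)
    findings = []
    if driver < runtime:  # tuples compare as (major, minor)
        findings.append(f"driver {fmt(driver)} is older than runtime {fmt(runtime)}")
    if wheel is not None and wheel != runtime:
        findings.append(f"wheel built for CUDA {fmt(wheel)}, runtime reports {fmt(runtime)}")
    if toolkit is not None and toolkit != runtime:
        findings.append(f"nvcc toolkit {fmt(toolkit)} differs from runtime {fmt(runtime)}")
    return findings


if __name__ == "__main__":
    spec = "paddlepaddle-gpu==2.2.1.post111"
    nvcc = "Cuda compilation tools, release 11.1, V11.1.105"
    note = ("Please NOTE: device: 0, GPU Compute Capability: 7.5, "
            "Driver API Version: 11.4, Runtime API Version: 11.2")
    for finding in check(spec, nvcc, note):
        print(finding)
```

A mismatch flagged here is only a lead to investigate, not proof of the cause; CUDA 11 also offers minor-version compatibility, which relaxes the driver rule in some setups.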