Han-YLun closed this issue 2 years ago
Hi! We have received your issue; please be patient while you wait for a response. We will arrange for technicians to answer your question as soon as possible. Please make sure you have provided a clear problem description, reproduction code, environment and version details, and error messages. You can also check the API documentation, FAQ, GitHub issues, and the AI community for answers. Have a nice day!
nvidia-smi
Tue Apr 12 18:39:25 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86 Driver Version: 470.86 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:02:00.0 Off | N/A |
| 0% 49C P8 12W / 250W | 0MiB / 11178MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:03:00.0 Off | N/A |
| 0% 50C P8 22W / 250W | 0MiB / 11177MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0412 10:46:10.999380 3804 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 6.1, Driver API Version: 11.4, Runtime API Version: 10.2
W0412 10:46:11.128330 3804 device_context.cc:372] device: 0, cuDNN Version: 8.1.
PaddlePaddle works well on 1 GPU.
2022-04-12 10:46:18,538 - WARNING - PaddlePaddle meets some problem with 2 GPUs. This may be caused by:
1. There is not enough GPUs visible on your system
2. Some GPUs are occupied by other process now
3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests
to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
2022-04-12 10:46:18,538 - WARNING -
Original Error is: (External) Nccl error, unhandled system error (at /paddle/paddle/fluid/platform/nccl_helper.h:118)
PaddlePaddle is installed successfully ONLY for single GPU! Let's start deep learning with PaddlePaddle now.
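The first warning in the run_check output reports two different CUDA versions: Driver API Version 11.4 and Runtime API Version 10.2. Below is a minimal sketch for pulling those two numbers out of such a log line so they can be compared. The helper name parse_cuda_versions is hypothetical, and the regular expression assumes the exact wording of the warning shown in this thread.

```python
import re

# Warning line copied from the run_check output in this thread.
LOG_LINE = (
    "W0412 10:46:10.999380 3804 device_context.cc:362] Please NOTE: device: 0, "
    "GPU Compute Capability: 6.1, Driver API Version: 11.4, Runtime API Version: 10.2"
)


def parse_cuda_versions(line):
    """Return (driver_version, runtime_version) as strings, or None if not found."""
    match = re.search(
        r"Driver API Version: (\d+\.\d+), Runtime API Version: (\d+\.\d+)", line
    )
    return match.groups() if match else None


versions = parse_cuda_versions(LOG_LINE)
if versions is not None:
    driver, runtime = versions
    print(f"driver API: CUDA {driver}, runtime API: CUDA {runtime}")
```

A difference between the two is not an error in itself, since a newer driver can generally run an older runtime. Here the runtime value is still worth comparing against the CUDA version named in the Docker image tag.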
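NCCL appears to be the problem. You can try running the program on a single GPU.
How do I run the program on a single GPU?
I specified a single GPU when creating the Docker container with --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1, but I still hit this problem: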
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 paddle::framework::SignalHandle(char const*, int)
1 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1649764241 (unix time) try "date -d @1649764241" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x0) received by PID 1038 (TID 0x7fe62d9b3700) from PID 0 ***]
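On the question of how to run on a single GPU: besides the container-level NVIDIA_VISIBLE_DEVICES setting used above, one common approach is to set CUDA_VISIBLE_DEVICES inside the process before Paddle is imported. The sketch below assumes that approach. The helper names restrict_to_single_gpu and select_first_visible_gpu are hypothetical, and the second one needs paddlepaddle-gpu installed, so it is defined here but not called.

```python
import os


def restrict_to_single_gpu(index=0):
    """Expose only one physical GPU to CUDA libraries loaded later in this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


def select_first_visible_gpu():
    """Point Paddle at the only GPU left visible (requires paddlepaddle-gpu)."""
    import paddle  # imported late so the environment variable is already set

    # After masking, the remaining GPU is always numbered 0 inside the process.
    paddle.set_device("gpu:0")


# Call this before anything imports paddle.
visible = restrict_to_single_gpu(0)
print("CUDA_VISIBLE_DEVICES =", visible)
```

Limiting the visible devices only avoids the multi-GPU NCCL path. As the segmentation fault above shows, it did not resolve the error in this thread.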
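The paddlepaddle-gpu version is wrong. The number after "post" is the CUDA version:
python -m pip install paddlepaddle-gpu==2.1.1.post112 -f https://paddlepaddle.org.cn/whl/mkl/stable.html
If that still does not work, download version 2.0.2 from https://www.paddlepaddle.org.cn/whl/mkl/stable.html and install it manually.
Using this solved the problem.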
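The rule mentioned in the answer (the digits after "post" give the CUDA version) can be checked mechanically against the Docker image tag. This is a rough sketch with hypothetical helper names. It assumes the last digit of the suffix is the CUDA minor version and the preceding digits are the major version, so that post112 reads as 11.2, and that the image tag contains the substring "cuda" followed by a version number, as the tag in this issue does.

```python
import re


def cuda_version_from_wheel(version):
    """Map e.g. '2.1.1.post112' to '11.2'; return None when there is no post suffix.

    Assumption: the last digit is the CUDA minor version, the rest is the major.
    """
    match = re.search(r"\.post(\d{2,})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return f"{int(digits[:-1])}.{digits[-1]}"


def cuda_version_from_image_tag(tag):
    """Extract e.g. '11.2' from '2.2.2-gpu-cuda11.2-cudnn8'; None if absent."""
    match = re.search(r"cuda(\d+\.\d+)", tag)
    return match.group(1) if match else None


wheel_cuda = cuda_version_from_wheel("2.1.1.post112")
image_cuda = cuda_version_from_image_tag("2.2.2-gpu-cuda11.2-cudnn8")
print("wheel CUDA:", wheel_cuda, "| image CUDA:", image_cuda)
print("match:", wheel_cuda == image_cuda)
```

A version string without a post suffix, such as the 2.0.1 originally installed in this issue, returns None from the first helper. In that case the CUDA build has to be determined another way, for example from the Runtime API Version that run_check prints.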
Error message:
Environment: Docker, using the image registry.baidubce.com/paddlepaddle/paddle:2.2.2-gpu-cuda11.2-cudnn8, with paddleocr==2.0.1 and paddlepaddle-gpu==2.0.1