PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Single-machine multi-GPU training fails with: module 'paddle.fluid.libpaddle' has no attribute 'ProcessGroupNCCL' #9313

Closed 1wang11lijian1 closed 1 year ago

1wang11lijian1 commented 1 year ago

Please provide the following information to quickly locate the problem

LDOUBLEV commented 1 year ago

Use this API to verify whether paddle is installed correctly: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/utils/run_check_cn.html#run-check

1wang11lijian1 commented 1 year ago

OK, thank you for your answer. Below is the output I got. What does the error saying that parallel computation cannot be used mean?

python
Python 3.7.16 (default, Jan 17 2023, 16:06:28) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.

import paddle
paddle.utils.run_check()

Running verify PaddlePaddle program ...
W0307 10:05:31.748668 24496 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.2, Runtime API Version: 11.2
W0307 10:05:31.753669 24496 gpu_resources.cc:91] device: 0, cuDNN Version: 8.0.
PaddlePaddle works well on 1 GPU.
C:\ProgramData\Anaconda3\envs\PPOCR_env\lib\site-packages\paddle\fluid\executor.py:1585: UserWarning: Standalone executor is not used for data parallel
  UserWarning)
W0307 10:05:35.871594 24496 parallel_executor.cc:666] Cannot enable P2P access from 0 to 1
W0307 10:05:35.871594 24496 parallel_executor.cc:666] Cannot enable P2P access from 1 to 0
WARNING:root:PaddlePaddle meets some problem with 2 GPUs. This may be caused by:
  1. There is not enough GPUs visible on your system
  2. Some GPUs are occupied by other process now
  3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root: Original Error is: (Unavailable) Windows can support Single GPU only. [Hint: Expected device_count == 1, but received device_count:2 != 1:1.] (at ..\paddle\fluid\framework\parallel_executor.cc:1322)
PaddlePaddle is installed successfully ONLY for single GPU! Let's start deep learning with PaddlePaddle now.

LDOUBLEV commented 1 year ago

Your machine may not have NCCL installed, or paddle may not have found the NCCL path. Follow hint 3 above to check whether NCCL is installed correctly.
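As a rough illustration of the "is NCCL visible at all?" check suggested here, the sketch below asks the system loader for an NCCL shared library using only the Python standard library. The helper name `find_nccl` is hypothetical, and the check is only a heuristic: it assumes NCCL would be installed system-wide on Linux, and it says nothing about a copy of NCCL that a framework package might bundle or locate by other means. The nccl-tests linked in hint 3 remain the more reliable check.

```python
import ctypes.util
import platform


def find_nccl():
    """Return the NCCL library name found by the system loader, or None.

    Hypothetical helper for illustration only; this is not how
    PaddlePaddle itself locates NCCL.
    """
    if platform.system() != "Linux":
        # Assumption: NVIDIA ships NCCL for Linux, so no lookup is
        # attempted on other platforms such as Windows.
        return None
    # find_library searches the usual loader locations for libnccl.
    return ctypes.util.find_library("nccl")


if __name__ == "__main__":
    lib = find_nccl()
    if lib:
        print("NCCL library found:", lib)
    else:
        print("No NCCL library found on the system loader path")
```

On the Windows machine from the log above, this helper would return `None` without searching, which matches the "Windows can support Single GPU only" message in the original error.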

happybear1015 commented 1 year ago

Did you manage to solve this? I ran into the same problem.

1wang11lijian1 commented 1 year ago

@happybear1015 Hi, I went through a lot of material and found that multi-GPU training only works on Linux. On Windows, only single-GPU training is possible.
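One way to act on this conclusion is to choose the launch command by platform. The sketch below is illustrative only: `build_train_command` is a hypothetical helper, the config path is a placeholder, and it assumes the usual PaddleOCR entry point `tools/train.py` together with the `paddle.distributed.launch` module and its `--gpus` option for multi-GPU runs on Linux. On Windows (or when only one GPU is requested) it falls back to a single process and restricts visibility to one device through `CUDA_VISIBLE_DEVICES`.

```python
import platform


def build_train_command(config_path, gpu_ids, system=None):
    """Return (command, env_overrides) for a training run.

    Hypothetical helper: falls back to single-GPU training on Windows,
    where the run_check output above reports that only one GPU is supported.
    """
    if not gpu_ids:
        raise ValueError("gpu_ids must contain at least one GPU index")
    system = system or platform.system()

    if system == "Windows" or len(gpu_ids) == 1:
        # Single process; expose only the first requested GPU to it.
        env = {"CUDA_VISIBLE_DEVICES": str(gpu_ids[0])}
        return ["python", "tools/train.py", "-c", config_path], env

    # Multi-GPU on Linux: one worker per GPU via the distributed launcher.
    gpus = ",".join(str(i) for i in gpu_ids)
    command = [
        "python", "-m", "paddle.distributed.launch",
        "--gpus", gpus,
        "tools/train.py", "-c", config_path,
    ]
    return command, {}


if __name__ == "__main__":
    # "path/to/config.yml" is a placeholder, not a real config file.
    cmd, env = build_train_command("path/to/config.yml", [0, 1])
    print(" ".join(cmd), env)
```

The returned command list and environment overrides could then be passed to `subprocess.run(cmd, env={**os.environ, **env})`.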

OpenOneV commented 1 year ago

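Your machine may not have NCCL installed, or paddle may not have found the NCCL path. Follow hint 3 above to check whether NCCL is installed correctly.

@happybear1015 Hi, I went through a lot of material and found that multi-GPU training only works on Linux. On Windows, only single-GPU training is possible.

@LDOUBLEV

Is that really the case? Does multi-machine training with a single GPU per machine also require NCCL? And since NCCL is not supported there, is multi-machine training impossible on Windows? (The problem with Linux is poor driver support: if the graphics card is even slightly unusual, the Linux GPU driver does not support it.)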

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.