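PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)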
XPU multi-node distributed training fails with AttributeError: module 'paddle.base.libpaddle' has no attribute 'ProcessGroupBKCL' #66335
Closed
MiltonZheng closed 1 month ago
Please ask your question
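Setup: two Phytium S2500 servers running Kylin V10, each fitted with one Kunlun R200 card. Paddle was built and installed from source on the release/2.6 branch. Following the instructions on the official site, I ran: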
/usr/bin/python3 -m paddle.distributed.launch --enable_gpu_log=False --ips=10.10.5.10,10.10.5.11 test.py
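A quick way to narrow this down before launching across nodes is to check, on each machine, whether the installed build exposes the symbol named in the error. The snippet below is only a minimal sketch: `has_symbol` is a hypothetical helper written for illustration, and the module and attribute names are copied from the error message in the title. A result of `False` would suggest, though not prove, that this particular build lacks BKCL process-group support.

```python
import importlib


def has_symbol(module_name: str, attr: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Module is missing or failed to import; treat the symbol as unavailable.
        return False
    return hasattr(module, attr)


if __name__ == "__main__":
    # Names taken verbatim from the AttributeError in the issue title.
    ok = has_symbol("paddle.base.libpaddle", "ProcessGroupBKCL")
    print("ProcessGroupBKCL available:", ok)
```

Running it with the same interpreter used for the launch (`/usr/bin/python3`) on both servers shows whether the two installations differ.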
Error raised by the Python script: