Open 535205856 opened 1 year ago
Could you turn off sync_batch_norm and check whether it runs?
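The PaddleCustomDevice side replied that the NPU does not support the sync_bn operator and that it should be changed to the ordinary bn operator. I am using PaddleDetection v2.6.0. That tag does not include the change, but the release/2.6 branch does; I did not expect the two to differ. I would like to ask how far NPU support has come in the PaddlePaddle toolkits such as PaddleDetection. Which operators used by the network models are supported on NPU and which are not? How do the supported operators perform? Which network models are confirmed to work without problems? Is there anywhere I can find a detailed report or documentation on this?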
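The workaround described above (falling back from sync_bn to plain bn) can be sketched as a small config rewrite. This is only an illustration, not the change made in release/2.6: it assumes the loaded config is a nested dict/list structure and that the normalization layer is selected by a key named `norm_type` with the value `sync_bn`. The helper `replace_sync_bn` and the sample config are hypothetical, so check the key names against your own YAML files.

```python
import copy


def replace_sync_bn(cfg, key="norm_type", old="sync_bn", new="bn"):
    """Return a deep copy of cfg in which every `key: old` entry becomes `key: new`.

    Assumption: the config is made of plain dicts and lists, and the norm
    layer is chosen through `key`. Neither is confirmed by the thread.
    """
    cfg = copy.deepcopy(cfg)  # leave the caller's config untouched

    def _walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                if k == key and v == old:
                    node[k] = new  # overwrite in place; dict size is unchanged
                else:
                    _walk(v)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Hypothetical config fragment, for illustration only.
example_cfg = {
    "norm_type": "sync_bn",
    "DarkNet": {"depth": 53, "norm_type": "sync_bn"},
    "YOLOv3FPN": {"norm_type": "sync_bn"},
}

patched = replace_sync_bn(example_cfg)
print(patched)
```

If your config does expose such a key, overriding it at launch time or editing the YAML directly would achieve the same thing; the function form is just convenient when several sub-configs repeat the setting.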
For the support status you will have to ask PaddleCustomDevice; the toolkit is itself only a user of it.
Search before asking
Bug Component
Training
Describe the Bug
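Multi-card training on a single machine fails with this command:
python -m paddle.distributed.fleet.launch --run_mode=collective --npus="4,5,6,7" tools/train.py -c configs/yolov3/yolov3_darknet53_270e_roadsign.yml -o use_npu=True
Single-card training on a single machine works.
Multi-card training on a single machine aborts with the following error: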
Traceback (most recent call last):
  File "tools/train.py", line 205, in <module>
    main()
  File "tools/train.py", line 201, in main
    run(FLAGS, cfg)
  File "tools/train.py", line 151, in run
    trainer.train(FLAGS.eval)
  File "/workspace/PaddleDetection/ppdet/engine/trainer.py", line 539, in train
    outputs = model(data)
  File "/opt/py37env/lib/python3.7/site-packages/paddle/nn/layer/layers.py", line 1253, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/py37env/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 534, in forward
    outputs = self._layers(*inputs, **kwargs)
  File "/opt/py37env/lib/python3.7/site-packages/paddle/nn/layer/layers.py", line 1253, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleDetection/ppdet/modeling/architectures/meta_arch.py", line 60, in forward
    out = self.get_loss()
  File "/workspace/PaddleDetection/ppdet/modeling/architectures/yolo.py", line 147, in get_loss
    return self.forward()
  File "/workspace/PaddleDetection/ppdet/modeling/architectures/yolo.py", line 81, in forward
    body_feats = self.backbone(self.inputs)
  File "/opt/py37env/lib/python3.7/site-packages/paddle/nn/layer/layers.py", line 1253, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleDetection/ppdet/modeling/backbones/darknet.py", line 330, in forward
    out = self.conv0(x)
  File "/opt/py37env/lib/python3.7/site-packages/paddle/nn/layer/layers.py", line 1253, in __call__
    return self.forward(*inputs, **kwargs)
  File "/workspace/PaddleDetection/ppdet/modeling/backbones/darknet.py", line 77, in forward
    out = self.batch_norm(out)
  File "/opt/py37env/lib/python3.7/site-packages/paddle/nn/layer/layers.py", line 1253, in __call__
    return self.forward(*inputs, **kwargs)
  File "/opt/py37env/lib/python3.7/site-packages/paddle/nn/layer/norm.py", line 1557, in forward
    False,
RuntimeError: (NotFound) The kernel sync_batch_norm is not registered.
  [Hint: Expected iter != kernels.end(), but received iter == kernels.end().] (at /paddle/paddle/phi/core/kernel_factory.cc:219)
Environment
The host is an Ubuntu machine with Ascend 910 NPUs and a Kunpeng 920 ARM CPU. The image is the one given in the NPU documentation: registry.baidubce.com/device/paddle-npu:cann601-ubuntu18-aarch64-gcc82
Bug description confirmation
Are you willing to submit a PR?