Open PeterJaq opened 3 years ago
Got it. Could you post your PaddlePaddle version, PaddleDetection version, CUDA version, and cuDNN version? We will try to locate the problem.
Here are mine:
Got it. The PaddleDetection version is v2.1.0, downloaded from the release page: https://github.com/PaddlePaddle/PaddleDetection/releases/tag/v2.1.0. The paddlepaddle version is 2.1.0 and the CUDA version is 11.2.
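For anyone else asked for the same version details, a minimal sketch of a script that gathers them in one place. It assumes the packages import as `paddle` and `ppdet`, and that `paddle.version` exposes callable `cuda()` and `cudnn()` helpers. If any of that does not hold in your install, the corresponding entry is simply reported as missing. `collect_versions` is a hypothetical helper name, not part of PaddlePaddle or PaddleDetection.

```python
import importlib
import platform


def collect_versions(packages=("paddle", "ppdet")):
    """Return a dict of version strings; packages that fail to import map to None."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except Exception:  # not installed, or the import itself failed
            info[name] = None
            continue
        info[name] = getattr(module, "__version__", "unknown")
        # Assumption: paddle.version provides cuda() and cudnn(); skipped if absent.
        version_mod = getattr(module, "version", None)
        for key in ("cuda", "cudnn"):
            fn = getattr(version_mod, key, None)
            if callable(fn):
                info[f"{name}.{key}"] = str(fn())
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Run it inside the same container or environment used for training, since the host and the container can have different CUDA and cuDNN builds.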
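I hit the following error when training ppyolo on multiple GPUs. I tried 3 times, and each time it appeared after several hundred epochs of training, so it reproduces 100% of the time.
Traceback (most recent call last):
File "tools/train.py", line 140, in <module>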
main()
File "tools/train.py", line 136, in main
run(FLAGS, cfg)
File "tools/train.py", line 111, in run
trainer.train(FLAGS.eval)
File "/usr/src/app/pd_detection/ppdet/engine/trainer.py", line 307, in train
outputs = model(data)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
outputs = self.forward(*inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/parallel.py", line 578, in forward
outputs = self._layers(*inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
outputs = self.forward(*inputs, **kwargs)
File "/usr/src/app/pd_detection/ppdet/modeling/architectures/meta_arch.py", line 27, in forward
out = self.get_loss()
File "/usr/src/app/pd_detection/ppdet/modeling/architectures/yolo.py", line 101, in get_loss
return self._forward()
File "/usr/src/app/pd_detection/ppdet/modeling/architectures/yolo.py", line 64, in _forward
neck_feats = self.neck(body_feats, self.for_mot)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
outputs = self.forward(*inputs, **kwargs)
File "/usr/src/app/pd_detection/ppdet/modeling/necks/yolo_fpn.py", line 997, in forward
route, tip = self.fpn_blocksi
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
outputs = self.forward(*inputs, **kwargs)
File "/usr/src/app/pd_detection/ppdet/modeling/necks/yolo_fpn.py", line 417, in forward
conv_left = self.conv_module(conv_left)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
outputs = self.forward(*inputs, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/container.py", line 97, in forward
input = layer(input)
File "/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py", line 898, in __call__
outputs = self.forward(*inputs, **kwargs)
File "/usr/src/app/pd_detection/ppdet/modeling/necks/yolo_fpn.py", line 205, in forward
matrix = paddle.cast(paddle.rand(x.shape, x.dtype) < gamma, x.dtype)
File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/random.py", line 722, in rand
return uniform(shape, dtype, min=0.0, max=1.0, name=name)
File "/usr/local/lib/python3.7/dist-packages/paddle/tensor/random.py", line 502, in uniform
float(max), 'seed', seed, 'dtype', dtype)
SystemError: (Fatal) Operator uniform_random raises an std::runtime_error exception.
The exception content is
:random_device::random_device(const std::string&). (at /paddle/paddle/fluid/imperative/tracer.cc:192)
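The exception text matches what `std::random_device` reports when it fails to open its entropy source (typically `/dev/urandom`). One possible cause, not confirmed anywhere in this thread, is that the process slowly leaks file descriptors until it hits its limit, which would fit a failure that only shows up after many iterations. The sketch below is a diagnostic under that assumption only. `count_open_fds`, `can_open_urandom`, and `fd_report` are hypothetical helper names, and the `/proc/self/fd` path is Linux-specific.

```python
import os


def count_open_fds(fd_dir="/proc/self/fd"):
    """Number of open file descriptors of this process, or None if fd_dir is unavailable."""
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        return None


def can_open_urandom(path="/dev/urandom"):
    """True if one byte can be read from path, False if opening or reading fails."""
    try:
        with open(path, "rb") as f:
            return len(f.read(1)) == 1
    except OSError:
        return False


def fd_report():
    """Snapshot of descriptor usage to log periodically during training."""
    try:
        import resource  # POSIX only
        soft_limit = resource.getrlimit(resource.RLIMIT_NOFILE)[0]
    except ImportError:
        soft_limit = None
    return {
        "open_fds": count_open_fds(),
        "soft_limit": soft_limit,
        "urandom_ok": can_open_urandom(),
    }


if __name__ == "__main__":
    print(fd_report())
```

Logging `fd_report()` every few hundred iterations from each training process would show whether `open_fds` keeps climbing toward `soft_limit`. A flat count would argue against this hypothesis and point elsewhere.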