Tianxiaomo / pytorch-YOLOv4

PyTorch, ONNX and TensorRT implementation of YOLOv4
Apache License 2.0

Problem encountered during training #34

Closed zh9369 closed 4 years ago

zh9369 commented 4 years ago

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4, 3, 19, 19, 85]], which is output 0 of AsStridedBackward, is at version 6; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

I hit this error while training on my own dataset. I have never seen it before. Is there a way to fix it?

karmueo commented 4 years ago

I'm running into this problem too. Is there a solution?

Tianxiaomo commented 4 years ago

Could you post the complete error message? @zh9369 @karmueo

zh9369 commented 4 years ago

Traceback (most recent call last):
  File "C:/Users/the_moon/Desktop/python/YOLOV/yolov4/pytorch-YOLOv4-master/train.py", line 428, in <module>
    device=device, )
  File "C:/Users/the_moon/Desktop/python/YOLOV/yolov4/pytorch-YOLOv4-master/train.py", line 308, in train
    loss.backward()
  File "C:\Users\the_moon\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\tensor.py", line 118, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\the_moon\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\autograd\__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4, 3, 19, 19, 85]], which is output 0 of AsStridedBackward, is at version 6; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Process finished with exit code 1

This is the complete error message.
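The hint at the end of the error suggests enabling anomaly detection. A minimal sketch of what that could look like is below. The toy tensors `x` and `y` are assumptions made for illustration and are not taken from train.py; the in-place write merely stands in for whatever operation trips the check in the real loss.

```python
import torch

# Toy stand-in for a training step; NOT the repository's code.
x = torch.ones(3, requires_grad=True)

# With anomaly detection on, autograd also reports the forward-pass
# operation whose backward step failed, which helps locate the culprit.
with torch.autograd.set_detect_anomaly(True):
    y = torch.sigmoid(x)  # sigmoid's backward needs its own output y
    y[0] = 0.0            # in-place write changes y after autograd saved it
    try:
        y.sum().backward()
        message = ""
    except RuntimeError as err:
        message = str(err)
        print(message)
```

Anomaly detection slows training down noticeably, so it is usually switched on only while hunting for the failing operation.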

Tianxiaomo commented 4 years ago

@zh9369 Which version of PyTorch are you using?

zh9369 commented 4 years ago

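It seems to work now. Your question made me realize I was on version 1.2. After upgrading to 1.5, training runs. Looking forward to the results. Thanks for your help!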

karmueo commented 4 years ago

Same problem here, and upgrading to 1.5 fixed it for me as well. Thanks to both of you.
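For anyone who wants to check their installation before training, here is a small sketch of a version guard. The helper name `version_tuple` is hypothetical, and the 1.5 threshold comes only from the reports in this thread, not from any documented requirement.

```python
import torch

def version_tuple(version):
    """Turn a string such as '1.5.0+cu101' into (1, 5). Hypothetical helper."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)

installed = version_tuple(torch.__version__)
if installed < (1, 5):
    # Threshold taken from this thread: 1.2 failed, 1.5 trained.
    print("PyTorch %d.%d detected; users in this thread needed 1.5" % installed)
else:
    print("PyTorch %d.%d detected" % installed)
```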

jiaoxiaosong commented 3 years ago

Which part of the code actually causes this?

Sukeysun commented 3 years ago

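> Which part of the code actually causes this?

It is caused by YOLO_Loss in train.py. output is modified in place inside YOLO_Loss, so the backward pass fails. You can create a new variable with the same shape as output and write the modified values into that variable instead, which resolves the problem.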
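The suggestion above can be sketched on a toy example. This is not the actual YOLO_Loss code: the function names `loss_inplace` and `loss_with_buffer`, the channel split, and the tensor shape are all assumptions chosen to show the pattern (an in-place write into a tensor autograd has saved, versus writing into a fresh tensor of the same shape).

```python
import torch

def loss_inplace(output):
    # Problematic pattern (illustrative only): overwrite part of a tensor
    # that autograd saved for the backward pass.
    pred = torch.sigmoid(output)                  # backward needs pred unchanged
    pred[..., 2:4] = torch.exp(output[..., 2:4])  # in-place write into pred
    return pred.sum()

def loss_with_buffer(output):
    # Suggested pattern: write the modified values into a new tensor with
    # the same shape, leaving the saved tensor untouched.
    pred = torch.sigmoid(output)
    result = torch.zeros_like(output)
    result[..., :2] = pred[..., :2]
    result[..., 2:4] = torch.exp(output[..., 2:4])
    result[..., 4:] = pred[..., 4:]
    return result.sum()

# Small stand-in for the [4, 3, 19, 19, 85] tensor from the error message.
bad = torch.zeros(2, 3, 4, 4, 85, requires_grad=True)
good = torch.zeros(2, 3, 4, 4, 85, requires_grad=True)

try:
    loss_inplace(bad).backward()
    inplace_failed = False
except RuntimeError:
    inplace_failed = True

loss_with_buffer(good).backward()  # completes without the in-place error
print(inplace_failed, good.grad.shape)
```

Writing into `result` is still an indexed assignment, but `result` is a fresh tensor that no earlier operation saved for its backward pass, so autograd's version check has nothing to complain about.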