Krasjet-Yu / YOLO-FaceV2

YOLO-FaceV2: A Scale and Occlusion Aware Face Detector

Error encountered during training! #5

Open sunmooncode opened 2 years ago

sunmooncode commented 2 years ago
Traceback (most recent call last):
  File "train.py", line 555, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 377, in train
    is_coco=is_coco)
  File "/home/Face/YOLO-FaceV2/test.py", line 115, in test
    loss += compute_loss([x.float() for x in train_out], targets)[1][:5]  # box, obj, cls
  File "/home/Face/YOLO-FaceV2/utils/loss.py", line 224, in __call__
    dic[int(value)].append(indexs)
KeyError: 32

This also happens when training on two GPUs, and training on a single GPU produces the error above!

sunmooncode commented 2 years ago

When I set batch-size to 16, training runs normally.
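
The traceback shows `KeyError: 32` raised by `dic[int(value)].append(indexs)`, so the key 32 was missing from `dic` when it was looked up. The code of `utils/loss.py` is not shown in this thread, so what follows is only a sketch under the assumption that `dic` is pre-filled for a fixed range of image indices and a batch contains a larger index. `group_indices_by_image` and `fixed` are hypothetical names, not part of the repository.

```python
from collections import defaultdict


def group_indices_by_image(image_ids):
    """Group row indices by the image index each row belongs to."""
    groups = defaultdict(list)  # a missing key gets an empty list on first use
    for row, value in enumerate(image_ids):
        groups[int(value)].append(row)
    return dict(groups)


# Assumption for illustration: a dict that only has keys 0..31.
fixed = {i: [] for i in range(32)}
try:
    fixed[int(32.0)].append(0)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 32

print(group_indices_by_image([0.0, 0.0, 32.0]))  # prints: {0: [0, 1], 32: [2]}
```

If the assumption holds, building the mapping from the indices actually present in the batch would remove the dependence on batch size. Whether that matches the intent of the original loss code would need to be checked against the repository.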

sunmooncode commented 2 years ago

[image] During training, lrep becomes nan. Is this a PyTorch version issue?

Krasjet-Yu commented 2 years ago

Because you added the landmark loss, some hyperparameters need to be re-tuned.

sunmooncode commented 2 years ago

@Krasjet-Yu [image] This run was trained on my laptop with the same hyperparameters, and there was no nan!

Do you have any suggestions for the KeyError above?

Update: [image] OK, it turned into nan after all.
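
To find the step at which a loss term such as lrep first becomes nan, a small check on the per-step loss values can help. This is a minimal sketch, not code from the repository: `non_finite_components`, `step_losses`, and the example values are hypothetical, and the losses are assumed to be plain Python floats (a scalar PyTorch tensor can be converted with `.item()` first).

```python
import math


def non_finite_components(losses):
    """Return the names of loss components that are NaN or infinite."""
    return [name for name, value in losses.items() if not math.isfinite(value)]


# Hypothetical values for one training step; only "lrep" is named in the thread.
step_losses = {"box": 0.042, "obj": 0.013, "lrep": float("nan")}

bad = non_finite_components(step_losses)
if bad:
    print("non-finite loss components:", bad)
```

Logging the first step at which this list is non-empty, together with the learning rate and loss weights in use, narrows down which hyperparameter to adjust.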

Krasjet-Yu commented 2 years ago

I haven't tried dual-GPU training. I'll try it later and fix the bug. On a single GPU, though, batch size 16 for 200 epochs should take only about a day.

sunmooncode commented 2 years ago

@Krasjet-Yu OK, thanks. I'll keep tuning and see how it goes.