Open zengyao1999 opened 2 years ago
Should be fixed with this PR https://github.com/TexasInstruments/edgeai-yolov5/pull/48
@SashaAlderson There is no problem with the loss function. I have retrained the model and they are working fine. I will update the models in a few days.
Ok, but changing the loss function solved this issue for me. What was the problem?
@SashaAlderson I trained those models long back and didn't check them before releasing. Not sure what exactly went wrong.
You can try any of these models. They don't have any such issues. https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose#yolov5-ti-lite-based-models-and-ckpts
You can retry training without making any changes. That should work as well.
I cannot see any change in loss function in the PR. Did you refer to the change in activation?
I changed the denominator of the exponent from `4*s*sigmas^2` to `2*(s*sigmas)^2`, so that scale affects training more. Worked for me.
@SashaAlderson I used your method, the key point offset phenomenon disappeared, thank you very much!
lkpt += kpt_loss_factor*((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9)))*kpt_mask).mean() — it actually affects the offset of the keypoints.
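To see how the modified denominator changes the scale dependence, here is a small NumPy sketch (the names `s`, `sigmas`, and `d` follow the thread's torch code, but the values are made up for illustration). The original denominator is linear in `s`, while the modified one is quadratic, so object scale has a much stronger effect on the per-keypoint loss term:

```python
import numpy as np

# Illustration only: values are invented; names mirror the thread's torch code.
sigmas = np.array([0.26, 0.79, 1.07]) / 10.0   # per-keypoint tolerances
d = np.array([0.5, 0.5, 0.5])                  # squared keypoint distances

for s in (0.1, 1.0, 10.0):
    denom_orig = s * (4 * sigmas**2) + 1e-9    # original: linear in s
    denom_mod = 2 * (s * sigmas)**2 + 1e-9     # modification: quadratic in s
    loss_orig = (1 - np.exp(-d / denom_orig)).mean()
    loss_mod = (1 - np.exp(-d / denom_mod)).mean()
    print(f"s={s}: orig={loss_orig:.4f}, mod={loss_mod:.4f}")
```

The ratio of the two denominators is `s/2`, so for small objects (`s < 2`) the modified loss penalizes deviations harder, and for large objects it relaxes.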
Hello, expert!
@zengyao1999 May I know which code files and lines I should modify to train the model on custom keypoints?
@SashaAlderson Your solution works! Thanks!!!
But I am wondering why it works :( The YOLOX version of YOLO-Pose with the default loss function works fine, but there is a severe keypoint shift problem in the yolov5 version.
Should be fixed with this PR #48
Thanks for your contribution. It works for points out of bounding box. But the precision and recall decreased and val loss increased.
Before changing the loss-computation line, the precision, recall and val loss were abnormal.
@SashaAlderson Also solved my problem, thanks!
@SashaAlderson The original implementation of OKS is good, but I don't know why there are shifted points. Changing
1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))
to
1 - torch.exp(-d/(s**1.1*(4*sigmas**2)+1e-9))
also solved the problem.
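A quick NumPy sketch of what raising the exponent from `s` to `s**1.1` does (values are illustrative; `s` stands for the object-scale term in the thread's loss). Since `s**1.1 < s` for `s < 1` and `s**1.1 > s` for `s > 1`, the change shrinks the denominator for small objects and grows it for large ones, i.e. scale is weighted more strongly, similar in spirit to the `2*(s*sigmas)**2` change:

```python
import numpy as np

# Illustration only: invented values, names follow the thread's code.
sigmas = np.array([0.26, 0.79, 1.07]) / 10.0
d = 0.5  # squared keypoint distance

for s in (0.1, 1.0, 10.0):
    loss_s = (1 - np.exp(-d / (s * (4 * sigmas**2) + 1e-9))).mean()
    loss_s11 = (1 - np.exp(-d / (s**1.1 * (4 * sigmas**2) + 1e-9))).mean()
    print(f"s={s}: exponent 1.0 -> {loss_s:.4f}, exponent 1.1 -> {loss_s11:.4f}")
```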
I think you could try to train these joints with higher sigmas. I found they shifted severely too, for joints 1 and 3, in the very early epochs (around 3 or 4). I edited the sigmas and it seems to work.
This happened to me too when I trained COCO without any pretrained weights. I changed the loss to lkpt += kpt_loss_factor*(((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))) + 0.05*d)*kpt_mask).mean() and it fixed it.
Hi! @wkt
What sigma values did you choose using this loss?
I encountered the same problem; the shifted keypoints occurred in the yolov7 project code. Points shifted to (0,0) or (x,0) in the image, and the loss curve is the same as the above. Changing the loss function doesn't work, so I'm confused.
Hi! @zay95
I used this sigmas
sigmas = torch.tensor([.71, .73, .88, .77, .76, .79, .79, .72, .72, .87], device=device) / 10.0
and this loss function
lkpt += kpt_loss_factor*(((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))) + 0.05*d)*kpt_mask).mean()
It seemed to help a bit
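One plausible reason the extra `0.05*d` term helps: the gradient of `1 - exp(-d/c)` with respect to `d` vanishes when `d` is large, so badly shifted keypoints stop receiving a learning signal, while the linear term keeps the gradient bounded away from zero. A sketch (here `c` is a made-up stand-in for the denominator `s*(4*sigmas**2)` in the thread's loss):

```python
import numpy as np

# Illustration only: c and d are invented values.
c = 0.25
d = np.array([0.1, 1.0, 10.0, 100.0])   # squared keypoint distances

grad_exp = np.exp(-d / c) / c           # d/dd of (1 - exp(-d/c))
grad_total = grad_exp + 0.05            # d/dd after adding 0.05*d

print(grad_exp)    # ~0 for large d: far-off keypoints get almost no gradient
print(grad_total)  # never drops below 0.05
```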
Thank you very much for your suggestion. I want to know whether those are the weights for 10 keypoints. If there are 17 points, should it still use the weight values calculated from the COCO dataset? The source sigmas equal torch.tensor([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89], device=device) / 10.0. Also, when I train on my dataset, if I change the original 640 input size to a smaller one (512, etc.), the loss seems to reach NaN quickly.
I did not change the sigma values.
Thank you!
@zay95 Try adding a small float to the denominator of the kpt_loss_factor calculation, like this:
kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 0.01)
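The point of the `+ 0.01` is that a batch with no visible keypoints makes the original denominator zero, so the factor becomes inf and the loss turns into NaN. A NumPy sketch with a made-up all-invisible mask:

```python
import numpy as np

# Illustration only: a made-up batch in which every keypoint is unlabeled.
kpt_mask = np.zeros(17)

visible = np.float64(np.sum(kpt_mask != 0))
total = visible + np.sum(kpt_mask == 0)

with np.errstate(divide="ignore"):
    factor_orig = total / visible        # 17 / 0 -> inf, poisons the loss
factor_fixed = total / (visible + 0.01)  # finite

print(factor_orig, factor_fixed)
```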
Okay, I will try, thanks.
@zay95
What sigma values do you use?
@nomaad42 I used source coco 17 sigma values: torch.tensor([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89], device=device) / 10.0.
Hello, how will this be solved? Looking forward to your reply.
hi, you can refer here #110
I found that this project doesn't plot the complete loss curve for the keypoint-prediction process; the current curve actually represents as follows:
Of course, you can add what you want to show in the code, train.py lines 378-411.
Actually, the change to 2*(s*sigmas)^2 is not correct. From the standard COCO definition, OKS = Σi[exp(-di^2/(2*s^2*ki^2))*δ(vi>0)] / Σi[δ(vi>0)], where s is defined as the square root of the object segment area. So, looking at the original code:
s = torch.prod(tbox[i][:,-2:], dim=1, keepdim=True)
kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0))/torch.sum(kpt_mask != 0)
lkpt += kpt_loss_factor*((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9)))*kpt_mask).mean()
here the variable s is already the square (the box area w*h), so you can't square it again, and ki = 2*sigmas, so ki^2 = 4*sigmas^2. The only problem with the original code is that the factor 2 is missing. So the correct denominator of the exponent should be 2*s*(4*sigmas^2).
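The algebra can be checked numerically. A small sketch with illustrative values (the variable names are mine, not from the repository): since the code's `s` already holds the box area, i.e. the square of COCO's s, the COCO denominator 2*s^2*k^2 with k = 2*sigma equals 2*area*(4*sigma^2).

```python
import math

# Illustration only: invented values for one object and one keypoint.
area = 0.3          # the code's `s` = w*h from tbox, i.e. (COCO s)**2
sigma = 0.079       # one per-keypoint sigma

s_coco = math.sqrt(area)   # COCO's s: square root of the object area
k = 2 * sigma              # COCO's k_i = 2 * sigma_i

oks_denom = 2 * s_coco**2 * k**2          # from OKS: 2 * s^2 * k_i^2
code_denom = 2 * area * (4 * sigma**2)    # proposed 2*s*(4*sigmas**2) in code terms

print(oks_denom, code_denom)  # agree to floating-point precision
```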
Hi @debapriyamaji, I got this result after retraining on the COCO dataset with the pretrained model, and only these two keypoints have a very serious offset. I think there could be some relationship with the sigmas; I had a similar problem when training on my own data. Hope I can get your answer, thanks!