TexasInstruments / edgeai-yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite. Forked from https://ultralytics.com/yolov5
https://github.com/TexasInstruments/edgeai
GNU General Public License v3.0

Some of the key points are severely shifted #47

Open zengyao1999 opened 2 years ago

zengyao1999 commented 2 years ago

Hi @debapriyamaji, I got this result after retraining on the COCO dataset with the pre-trained model, and only these two keypoints have a very severe offset. I think it could be related to the sigmas; I had a similar problem when training on my own data. Hope I can get your answer, thanks! [image: 000000002473]

SashaAlderson commented 2 years ago

Should be fixed with this PR https://github.com/TexasInstruments/edgeai-yolov5/pull/48

debapriyamaji commented 2 years ago

@SashaAlderson There is no problem with the loss function. I have retrained the model and they are working fine. I will update the models in a few days.

SashaAlderson commented 2 years ago

> @SashaAlderson There is no problem with the loss function. I have retrained the model and they are working fine. I will update the models in a few days.

Ok, but changing loss function solved this issue for me. What was the problem?

debapriyamaji commented 2 years ago

@SashaAlderson I trained those models long back and didn't check them before releasing. Not sure what exactly went wrong.

You can try any of these models. They don't have any such issues. https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose#yolov5-ti-lite-based-models-and-ckpts

You can retry training without making any changes. That should work as well.

I cannot see any change in loss function in the PR. Did you refer to the change in activation?

SashaAlderson commented 2 years ago

> @SashaAlderson I trained those models long back and didn't check them before releasing. Not sure what exactly went wrong.
>
> You can try any of these models. They don't have any such issues. https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose#yolov5-ti-lite-based-models-and-ckpts
>
> You can retry training without making any changes. That should work as well.
>
> I cannot see any change in loss function in the PR. Did you refer to the change in activation?

I changed the denominator of the exponent from 4*s*sigmas^2 to 2*(s*sigmas)^2, so that the scale affects training more. Worked for me.
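For concreteness, the two denominators being compared can be sketched as follows (a minimal sketch; the standalone function form and names are hypothetical, not the repo's actual code — in loss.py this is a single `lkpt +=` line):

```python
import torch

def oks_loss_released(d, s, sigmas, kpt_mask):
    # Released code: denominator 4*s*sigmas^2 (s is the box area w*h,
    # d holds squared keypoint distances, kpt_mask flags labeled keypoints)
    return ((1 - torch.exp(-d / (s * (4 * sigmas**2) + 1e-9))) * kpt_mask).mean()

def oks_loss_pr48(d, s, sigmas, kpt_mask):
    # PR #48 variant: denominator 2*(s*sigmas)^2, so the object scale s
    # enters quadratically and influences training more strongly
    return ((1 - torch.exp(-d / (2 * (s * sigmas)**2 + 1e-9))) * kpt_mask).mean()
```

Both map squared distances to a bounded per-keypoint penalty in [0, 1); they differ only in how fast the exponential decays with object scale.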

zengyao1999 commented 2 years ago

@SashaAlderson I used your method, the key point offset phenomenon disappeared, thank you very much!

shantzhou commented 2 years ago

lkpt += kpt_loss_factor*((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9)))*kpt_mask).mean()

This line actually affects the offset of the keypoints.

> @SashaAlderson There is no problem with the loss function. I have retrained the model and they are working fine. I will update the models in a few days.

zengyao1999 commented 2 years ago

> lkpt += kpt_loss_factor*((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9)))*kpt_mask).mean()
>
> This line actually affects the offset of the keypoints.

> @SashaAlderson There is no problem with the loss function. I have retrained the model and they are working fine. I will update the models in a few days.

Hello, experts!

maketo97 commented 2 years ago

@zengyao1999 May I know which files and lines I should modify to train the model on custom keypoints?

tctco commented 2 years ago

@SashaAlderson Your solution works! Thanks!!!

But I am wondering why it works :( The YOLOX version of YOLO-Pose with the default loss function works fine, yet there is a severe keypoint shift problem in the yolov5 version.

Blankit commented 1 year ago

> Should be fixed with this PR #48

Thanks for your contribution. It works for points outside the bounding box, but the precision and recall decreased and the val loss increased. [image: training curves]

Blankit commented 1 year ago

> Should be fixed with this PR #48
>
> Thanks for your contribution. It works for points outside the bounding box, but the precision and recall decreased and the val loss increased.

Before changing the loss-computation line, the precision, recall and val loss were abnormal.

Jackqu commented 1 year ago

@SashaAlderson Also solved my problem, thanks!

tctco commented 1 year ago

@SashaAlderson The original implementation of OKS is good, but I don't know why there are shifted points. Changing 1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9)) to 1 - torch.exp(-d/(s**1.1*(4*sigmas**2)+1e-9)) also solved the problem.

RustyShackleford73 commented 1 year ago

I think you could try training these joints with higher sigmas. I found they shift severely too, for joints 1 and 3, in the very early epochs (around 3 or 4). I edited the sigmas and it seems to work.

wkt commented 1 year ago

This happened to me too when I trained on COCO without any pretrained weights. I changed the loss to:

lkpt += kpt_loss_factor*(((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))) + 0.05*d)*kpt_mask).mean()

and it fixed it.
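A sketch of this variant (hypothetical standalone form; in the repo it is a single line inside the loss computation): the point of the extra 0.05*d term is that 1 - exp(-d/denom) saturates at 1 for large d, which kills the gradient for badly misplaced keypoints, while a small linear term keeps pulling them back.

```python
import torch

def oks_loss_with_linear_term(d, s, sigmas, kpt_mask, kpt_loss_factor=1.0):
    # Bounded OKS-style term: saturates at 1 as the squared distance d grows
    oks_term = 1 - torch.exp(-d / (s * (4 * sigmas**2) + 1e-9))
    # 0.05*d adds an unbounded linear penalty, so far-away predictions
    # still receive a nonzero gradient instead of a flat loss surface
    return kpt_loss_factor * ((oks_term + 0.05 * d) * kpt_mask).mean()
```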

nomaad42 commented 1 year ago

> This happened to me too when I trained on COCO without any pretrained weights. I changed the loss to lkpt += kpt_loss_factor*(((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))) + 0.05*d)*kpt_mask).mean() and it fixed it.

Hi! @wkt

What sigma values did you choose using this loss?

zay95 commented 1 year ago

I encountered the same problem; the shifted keypoints occurred in the yolov7 project code. Points shifted to (0,0) or (x,0) in the image, and the loss curve is the same as the above. Changing the loss function doesn't work, so I'm confused.

nomaad42 commented 1 year ago

> I encountered the same problem; the shifted keypoints occurred in the yolov7 project code. Points shifted to (0,0) or (x,0) in the image, and the loss curve is the same as the above. Changing the loss function doesn't work, so I'm confused.

Hi! @zay95

I used this sigmas

sigmas = torch.tensor([.71, .73, .88, .77, .76, .79, .79, .72, .72, .87], device=device) / 10.0

and this loss function

lkpt += kpt_loss_factor*(((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))) + 0.05*d)*kpt_mask).mean()

It seemed to help a bit

zay95 commented 1 year ago

Thank you very much for your suggestion. I want to know if this is the weight for 10 keypoints. If there are 17 points, should it still use the weight values calculated from the COCO dataset? The source sigmas equal torch.tensor([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89], device=device) / 10.0. Also, when I train on my dataset, if I change the original 640 input size to a smaller one (512, etc.), the loss seems to reach NaN quickly.

wkt commented 1 year ago

> This happened to me too when I trained on COCO without any pretrained weights. I changed the loss to lkpt += kpt_loss_factor*(((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))) + 0.05*d)*kpt_mask).mean() and it fixed it.

> Hi! @wkt
>
> What sigma values did you choose using this loss?

I did not change the sigma values.

nomaad42 commented 1 year ago

> This happened to me too when I trained on COCO without any pretrained weights. I changed the loss to lkpt += kpt_loss_factor*(((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9))) + 0.05*d)*kpt_mask).mean() and it fixed it.

> Hi! @wkt What sigma values did you choose using this loss?

> I did not change the sigma values.

Thank you!

nomaad42 commented 1 year ago

> Thank you very much for your suggestion. I want to know if this is the weight for 10 keypoints. If there are 17 points, should it still use the weight values calculated from the COCO dataset? The source sigmas equal torch.tensor([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89], device=device) / 10.0. Also, when I train on my dataset, if I change the original 640 input size to a smaller one (512, etc.), the loss seems to reach NaN quickly.

@zay95 Try adding a small float to the kpt_loss_factor denominator (like this):

kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 0.01)
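The reason this guards against NaN: when a batch contains no labeled keypoints, torch.sum(kpt_mask != 0) is zero and the original ratio divides by zero. A sketch (the standalone function form is hypothetical):

```python
import torch

def safe_kpt_loss_factor(kpt_mask, eps=0.01):
    # Ratio of all keypoints to visible keypoints; rescales the mean over the
    # full mask back to a mean over visible points only
    visible = torch.sum(kpt_mask != 0)
    total = visible + torch.sum(kpt_mask == 0)
    # eps keeps the factor finite when visible == 0 (no labeled keypoints)
    return total / (visible + eps)
```

With an all-zero mask the factor becomes large but finite, instead of producing inf/NaN that then poisons the whole loss.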

zay95 commented 1 year ago

> Thank you very much for your suggestion. I want to know if this is the weight for 10 keypoints. If there are 17 points, should it still use the weight values calculated from the COCO dataset? The source sigmas equal torch.tensor([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89], device=device) / 10.0. Also, when I train on my dataset, if I change the original 640 input size to a smaller one (512, etc.), the loss seems to reach NaN quickly.

> @zay95 Try adding a small float to the kpt_loss_factor denominator (like this):
>
> kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / (torch.sum(kpt_mask != 0) + 0.01)

OK, I'll try it, thanks.

nomaad42 commented 1 year ago

@zay95

What sigma values do you use?

zay95 commented 1 year ago

@nomaad42 I used the original 17 COCO sigma values: torch.tensor([.26, .25, .25, .35, .35, .79, .79, .72, .72, .62, .62, 1.07, 1.07, .87, .87, .89, .89], device=device) / 10.0.

gadewegit commented 1 year ago

> Should be fixed with this PR #48
>
> Thanks for your contribution. It works for points outside the bounding box, but the precision and recall decreased and the val loss increased.

Hello, how can this be solved? Looking forward to your reply.

yongguanjiangshan commented 1 year ago

> Should be fixed with this PR #48
>
> Thanks for your contribution. It works for points outside the bounding box, but the precision and recall decreased and the val loss increased.
>
> Hello, how can this be solved? Looking forward to your reply.

Hi, you can refer to #110.

yongguanjiangshan commented 1 year ago

> Should be fixed with this PR #48
>
> Thanks for your contribution. It works for points outside the bounding box, but the precision and recall decreased and the val loss increased.
>
> Before changing the loss-computation line, the precision, recall and val loss were abnormal.

Hi, you can refer to #110.

zay95 commented 1 year ago

> Should be fixed with this PR #48
>
> Thanks for your contribution. It works for points outside the bounding box, but the precision and recall decreased and the val loss increased.

I found that this project doesn't log the complete loss curve for the keypoint branch. What the current curve actually represents is the following:

[image: loss-logging code]

Of course, you can add whatever you want to log in train.py, lines 378-411.

[image: train.py screenshot]

K-tang-mkv commented 1 year ago

> @SashaAlderson I trained those models long back and didn't check them before releasing. Not sure what exactly went wrong. You can try any of these models. They don't have any such issues. https://github.com/TexasInstruments/edgeai-yolov5/tree/yolo-pose#yolov5-ti-lite-based-models-and-ckpts You can retry training without making any changes. That should work as well. I cannot see any change in loss function in the PR. Did you refer to the change in activation?

> I changed the denominator of the exponent from 4*s*sigmas^2 to 2*(s*sigmas)^2, so that the scale affects training more. Worked for me.

Actually, the change to 2*(s*sigmas)^2 is not correct. From the official COCO definition, OKS = Σ_i [exp(-d_i^2 / (2*s^2*k_i^2)) * δ(v_i>0)] / Σ_i [δ(v_i>0)], where s is defined as the square root of the object segment area. In the original code:

 s = torch.prod(tbox[i][:,-2:], dim=1, keepdim=True)

 kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0))/torch.sum(kpt_mask != 0)

 lkpt += kpt_loss_factor*((1 - torch.exp(-d/(s*(4*sigmas**2)+1e-9)))*kpt_mask).mean()

here s is already the squared quantity (w*h is an area, i.e. s^2 in COCO's notation), so you can't square it again, and k_i = 2*sigmas, so k_i^2 = 4*sigmas^2. There is a small problem with the original code: the factor 2 is missing. So the correct denominator of the exponent should be 2*s*(4*sigmas^2).
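Under this reading of the OKS definition, the corrected loss line would look like the following sketch (hypothetical standalone form; in the repo it remains a single `lkpt +=` line in the loss computation):

```python
import torch

def oks_loss_corrected(d, s, sigmas, kpt_mask, kpt_loss_factor=1.0):
    # OKS exponent is -d_i^2 / (2 * s^2 * k_i^2); here `s` already holds the
    # box area (s^2 in COCO's notation) and k_i^2 = (2*sigmas)^2 = 4*sigmas^2,
    # so only the factor 2 is added relative to the released denominator
    denom = 2 * s * (4 * sigmas**2) + 1e-9
    return kpt_loss_factor * ((1 - torch.exp(-d / denom)) * kpt_mask).mean()
```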