WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Validation kpt loss stays constant #1949

Open kurenai0413 opened 8 months ago

kurenai0413 commented 8 months ago

Recently I tried to print out the validation loss while training my customized pose model on the MS COCO dataset, and noticed that the kpt loss stays constant across epochs, while other losses such as box or obj behave normally.

So I went back to the original pose branch and found that the kpt loss on the validation set behaves the same way there.

[attached plot: validation_loss]

Here is my training command:

train.py --epochs 10 --data data/coco_kpts.yaml --cfg cfg/yolov7-w6-pose.yaml --batch-size 8 --img 960 --kpt-label --sync-bn --device 0 --name yolov7-w6-pose --hyp data/hyp.pose.yaml

Code modified in test.py at line 152 to print the validation loss:

loss += compute_loss([x.float() for x in train_out], targets)[1][:6]

And added before the plotting code to print the loss:

print(('\n' + '%10s' * 7) % ('_', 'box', 'obj', 'cls', 'kpt', 'kptv', 'total'))
print(('%10s' * 1 + '%10.4g' * 6) % ('val', *(loss.cpu() / len(dataloader)).tolist()))
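
For reference, a self-contained sketch of this printing logic, assuming compute_loss returns its six loss components in the order (box, obj, cls, kpt, kptv, total); the dummy tensor below just stands in for the per-batch losses accumulated in the loop:

import torch

# Dummy stand-in for the running sum that test.py builds up via
# compute_loss([x.float() for x in train_out], targets)[1][:6].
num_batches = 4
loss = torch.zeros(6)
for _ in range(num_batches):
    loss += torch.tensor([0.05, 0.9, 0.01, 2.6, 0.08, 3.64])

# Print the per-batch average of the six components, as in the
# modification above.
print(('\n' + '%10s' * 7) % ('_', 'box', 'obj', 'cls', 'kpt', 'kptv', 'total'))
print(('%10s' * 1 + '%10.4g' * 6) % ('val', *(loss / num_batches).tolist()))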

Any help is greatly appreciated.

mosama182 commented 2 months ago

I am facing the same issue. Did you figure out what the problem was?

Ethan-Lee-Sunghoon commented 1 month ago

I'm also facing the same issue.

I've printed all of the elements of the validation loss.

In the loss.py file, lkpt is always the same at every validation step.

This is because the distance d in the OKS loss is huge during validation.

The target keypoints have small-scale values compared to the validation prediction values, for example:

Target keypoints (x): tensor(3.91011, device='cuda:0')
Predicted keypoints (x): tensor(718., device='cuda:0')

This mismatch leads to a large d value, which drives the exponential to zero.

Distance (d): tensor(659069.75000, device='cuda:0')

# oks based loss
d = (pkpt_x - tkpt[i][:, 0::2])**2 + (pkpt_y - tkpt[i][:, 1::2])**2
s = torch.prod(tbox[i][:, -2:], dim=1, keepdim=True)
kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / torch.sum(kpt_mask != 0)
lkpt += kpt_loss_factor * ((1 - torch.exp(-d / (s * (4 * sigmas**2) + 1e-9))) * kpt_mask).mean()
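
Here is a minimal numeric check of that saturation; the sigmas, box area s, and kpt_mask below are made-up stand-ins, and only the validation distance mirrors the printout above:

import torch

sigmas = torch.full((17,), 0.05)   # assumed uniform keypoint sigmas
s = torch.tensor([[12.0]])         # assumed target box area in grid units
kpt_mask = torch.ones(1, 17)       # assume all 17 keypoints are labeled

for name, d in [('val', torch.full((1, 17), 659069.75)),
                ('train', torch.full((1, 17), 0.05))]:
    term = (1 - torch.exp(-d / (s * (4 * sigmas**2) + 1e-9))) * kpt_mask
    print(name, term.mean().item())

# val   -> 1.0 exactly: exp(-d/...) underflows to 0, so lkpt collapses to
#          kpt_loss_factor * kpt_mask.mean(), a constant independent of the
#          predictions
# train -> ~0.34: at a training-like scale the loss still responds to the
#          predictions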

This is what I found so far.