kurenai0413 opened 8 months ago
I am facing the same issue. Did you figure out what the problem was?
I'm also facing the same issue.
I've printed all of the elements of the validation loss.
In `loss.py`, `lkpt` comes out the same at every validation step.
This is because the distance `d` in the OKS loss is huge during validation.
For example, the targets have small-scale values compared to the validation-time predictions:
Target keypoints (x): tensor(3.91011, device='cuda:0') Predicted keypoints (x): tensor(718., device='cuda:0')
This mismatch produces a very large `d`, which drives the exponential term to zero:
Distance (d): tensor(659069.75000, device='cuda:0')
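A minimal numeric sketch with the values printed above shows why the loss saturates to a constant; here `sigma = 0.05` is an assumed, typical COCO keypoint sigma, and `s` is a guessed tiny area from the small-scale targets:

```python
import torch

d = torch.tensor(659069.75)   # squared distance reported above
s = torch.tensor(3.91 ** 2)   # assumption: tiny scale from the normalized targets
sigma = torch.tensor(0.05)    # assumption: typical COCO keypoint sigma

term = 1 - torch.exp(-d / (s * (4 * sigma ** 2) + 1e-9))
print(term)  # tensor(1.) -- the exponential underflows to 0, so the kpt loss is constant
```

With any remotely plausible `s` and `sigma`, the exponent is on the order of millions, so the term is pinned at 1 for every validation batch.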
```python
# oks based loss
d = (pkpt_x - tkpt[i][:, 0::2])**2 + (pkpt_y - tkpt[i][:, 1::2])**2  # squared keypoint distances
s = torch.prod(tbox[i][:, -2:], dim=1, keepdim=True)                 # object scale: target box w * h
kpt_loss_factor = (torch.sum(kpt_mask != 0) + torch.sum(kpt_mask == 0)) / torch.sum(kpt_mask != 0)
lkpt += kpt_loss_factor * ((1 - torch.exp(-d / (s * (4 * sigmas**2) + 1e-9))) * kpt_mask).mean()
```
This is what I found so far.
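If the root cause is indeed this coordinate-scale mismatch, one possible direction (a sketch only, not a confirmed fix) is to bring predictions and targets onto the same scale before computing `d`. Here `stride_i` is a hypothetical per-layer stride, not a variable that exists in `loss.py`:

```python
# Sketch: rescale pixel-space predictions down to the grid scale of the
# targets before the distance computation (stride_i is an assumed value).
pkpt_x_grid = pkpt_x / stride_i
pkpt_y_grid = pkpt_y / stride_i
d = (pkpt_x_grid - tkpt[i][:, 0::2])**2 + (pkpt_y_grid - tkpt[i][:, 1::2])**2
```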
Recently I tried to print the validation loss while training my customized pose model on the MS COCO dataset, and I noticed that the kpt loss stays constant across epochs, while the other losses such as box or obj behave normally.
So I went back to the original pose branch and found that its validation kpt loss is constant in the same way.
Here is my training command:
Code modified in test.py to print the validation loss, at line 152:
Added before the plotting step to print the loss:
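(The snippet itself is not reproduced above; as a rough sketch, assuming test.py accumulates the per-component losses into a 4-element tensor `loss` ordered (box, obj, cls, kpt) over the validation loop, the print might look like this:)

```python
# Hypothetical print of the averaged validation losses (variable names assumed).
mean_loss = (loss / len(dataloader)).cpu().tolist()
print('val loss -> box: %.5f, obj: %.5f, cls: %.5f, kpt: %.5f' % tuple(mean_loss))
```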
Any help is greatly appreciated.