fire717 / movenet.pytorch

A PyTorch implementation of MoveNet from Google. Includes training code and a pre-trained model.
MIT License

weights for the movenet_mobilenetv3.py #19

Open mukeshnarendran7 opened 2 years ago

mukeshnarendran7 commented 2 years ago

Hi, I am not able to match the pre-trained weights to the above model file from the output path. Could you guide me on how I can use the v3 model with its pre-trained weights? Thanks

fire717 commented 2 years ago

v3 is just for testing, and its accuracy was lower than v2 in my tests; the original MoveNet backbone is also v2, so I suggest using v2. If you still want to try v3, you can change https://github.com/fire717/movenet.pytorch/blob/bbc81408bd4da49789d912fd08635355fe123e60/lib/__init__.py#L7-L9

from lib.models.movenet_mobilenetv2 import MoveNet

to

from lib.models.movenet_mobilenetv3 import MoveNet
mukeshnarendran7 commented 2 years ago

Thanks for getting back. I have a few questions regarding the inputs and outputs; I want to test with my own dataset.

import torch

def get_keypoints(heatmaps, thr=0.5):
    n, h, w = heatmaps.size()
    flat = heatmaps.view(n, -1)
    max_val, max_idx = flat.max(dim=1)
    xx = (max_idx % w).view(-1, 1)   # (-1, 1) for column vector
    yy = (max_idx // w).view(-1, 1)  # (-1, 1) for column vector
    xx[max_val <= thr] = -1
    yy[max_val <= thr] = -1
    keypoints = torch.cat((xx, yy), dim=1)
    keypoints = keypoints.numpy()

    # re-scale them back to the original image size
    x = keypoints[:, 0] * (640 / 48)
    y = keypoints[:, 1] * (640 / 48)
    return x, y
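
A minimal usage sketch (assuming heatmaps is a (num_joints, 48, 48) tensor of per-joint confidence maps predicted for a 640x640 source image; the values below are placeholders):

import torch

heatmaps = torch.rand(17, 48, 48)        # placeholder predictions, one map per joint
x, y = get_keypoints(heatmaps, thr=0.5)  # pixel coordinates in the 640x640 image
print(x.shape, y.shape)                  # (17,) (17,)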
fire717 commented 2 years ago
  1. As for this repo, the input is just (1, 3, 192, 192); it is the MoveNet Lightning version. If you want to use another size, you need to change the weight matrix, model head, etc. (see the shape-check sketch after this comment).
  2. Refer to the "MoveNet Architecture" part of the official blog.
  3. Of course you can try any loss, but it does not compete with the bone loss; as the README says, "this is a multi-task learning" setup.
  4. The difference is that they are different models.
  5. Just use the code's default settings.
  6. In my tests, post-training quantization was not helpful. QAT may help, but I could not find an easy way to use QAT in PyTorch (Google uses TensorFlow).

Most of your questions can be answered by the README or the source code; just dive in if you are interested in MoveNet!
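
A rough shape check under those assumptions. The constructor call below is illustrative, not the repo's exact signature; check lib/models/movenet_mobilenetv2.py for the real arguments.

import torch
from lib.models.movenet_mobilenetv2 import MoveNet

model = MoveNet()                       # hypothetical default construction; verify the real arguments
model.eval()
dummy = torch.randn(1, 3, 192, 192)     # MoveNet Lightning input size used by this repo
with torch.no_grad():
    outputs = model(dummy)
# The head predicts on a 48x48 grid (192 / 4): key-point heatmaps, a center heatmap,
# and the regression / offset maps that are decoded back into key-point coordinates.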

mukeshnarendran7 commented 2 years ago

Hey fire717, thank you very much for the detailed reply. I stumbled across a few more doubts and just wanted to clarify:

How can I fine-tune it for, say, 16 key points instead of 17? When I change the head to match 16 key points, it looks like the model trains from scratch and then overfits. A code example would be nice. Thanks

What are the other key points? In my case I don't have any; there is just a single individual in all images. How do I substitute this in your code for the data loader?

In the bone loss, why is the bone index only selected for particular pairs? Say you have 16 key points, but there are fewer pairs listed here. Is there a relationship between the two?

def boneLoss(pred, target):
    # Frobenius norm of the element-wise difference between two tensors
    def _Frobenius(mat1, mat2):
        return torch.pow(torch.sum(torch.pow(mat1-mat2,2)),0.5)

    # pairs of key-point channels treated as "bones"
    _bone_idx = [[0,1],[1,2],[2,3],[3,4],[4,5],[5,6],[2,4]]

    loss = 0
    for bone_id in _bone_idx:
        # bone = difference between the two key-point maps of the pair,
        # compared between prediction and ground truth
        bone_pre = pred[:,bone_id[0],:,:]-pred[:,bone_id[1],:,:]
        bone_gt = target[:,bone_id[0],:,:]-target[:,bone_id[1],:,:]

        f = _Frobenius(bone_pre,bone_gt)
        loss+=f

    # average over the number of bones and the batch size
    loss = loss/len(_bone_idx)/pred.shape[0]
    return loss
fire717 commented 2 years ago

Fine-tuning for 16 key points is a little complicated, because the target of this repo is to reproduce the original MoveNet. If you want a different input size or number of key points, you should read the article and the source code, understand them, and then change the code yourself (a rough sketch of one possible approach is below). kps_mask is used to filter out unlabelled points so they don't contribute to the loss. The bone index is not important; you can try different sets and test.
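
One possible starting point, not the repo's supported path: build the model with your key-point count and restore only the checkpoint tensors whose shapes still match. The constructor argument and checkpoint path below are placeholders, and the checkpoint is assumed to be a plain state_dict; check lib/models/movenet_mobilenetv2.py and your own output directory.

import torch
from lib.models.movenet_mobilenetv2 import MoveNet

num_joints = 16
model = MoveNet(num_classes=num_joints)     # hypothetical argument name; verify in the source

# Load the 17-key-point checkpoint, keeping only tensors whose shapes still match
# (the heatmap / regression / offset heads change size, the backbone does not).
ckpt = torch.load("path/to/pretrained_17kps.pth", map_location="cpu")
state = model.state_dict()
compatible = {k: v for k, v in ckpt.items() if k in state and v.shape == state[k].shape}
state.update(compatible)
model.load_state_dict(state)
print(f"restored {len(compatible)}/{len(state)} tensors from the checkpoint")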

mukeshnarendran7 commented 2 years ago

Hey, sorry that I have a lot of questions, and thanks for the detailed repo. I tried to introduce a dummy column with zero values to pad the key points from 16 to 17 and set it as not labelled.

  1. I have some issues with the inputs: my format looks like this for one item. I am not sure how to set other_keypoints and other_centers. How do you get them? My images have only one object per frame; should I set them to -1?
  2. How do you calculate the centre? I am just taking ((192//2)/192, (192//2)/192), since I have cropped my images so that the objects are at the centre of the frame.
  3. I noticed in the TensorDataloader you have written:
    #print(keypoints)
    #[0.640625   0.7760417  2, ] (21,)

    Why are your key points so small (are you dividing them by 192? this works for me, otherwise I get zeros), and should they not be between (0-256/192)? What does the 21 represent, from the COCO dataset? These comments you left in the code snippets are very helpful for navigating the code, especially when passing in other data for testing. Thanks

fire717 commented 2 years ago

If you have only one object, other_keypoints should be {}. Yes, the key-point values are relative values; for example a relative x of 0.64 corresponds to an absolute value of 0.64 x 192.
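
A small sketch of that conversion for a single-person dataset (assumptions: absolute pixel coordinates in a 192x192 crop and [x, y, visibility] triples; verify the exact label format against the repo's data code before using it):

import numpy as np

img_size = 192
kps_abs = np.array([[123.0, 149.0, 2],    # [x, y, visibility] per key point, in pixels
                    [101.0,  88.0, 2]], dtype=np.float32)

kps_rel = kps_abs.copy()
kps_rel[:, 0] /= img_size                 # relative x in [0, 1]
kps_rel[:, 1] /= img_size                 # relative y in [0, 1]

other_keypoints = []                      # empty: only one object per image
print(kps_rel.flatten())                  # [0.640625 0.7760417 2. ...]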

Since you have so many questions, I think you may not yet know how MoveNet works. Maybe you should read the blog first, then read the source code of this repo and understand it yourself.

mukeshnarendran7 commented 2 years ago

Thanks for sharing the article. I read through it, and it was helpful.

However, when I prepare my data with your label2center code, my centres come back with negative values even though they are positioned well. I presume they should range from 0 to 1, but the max and min values I get back are either very large or negative. I am scaling them down like the key points. My heatmaps look okay though, within range and on point. Any hints on what I could be doing wrong and how to correct it? Thanks

        center = [(192//2)/192, (192//2)/192]
        heatmaps,sigma = label2heatmap(keypoints, other_keypoints, self.img_size)
        cx = min(max(0,int(center[0]*self.img_size//4)),self.img_size//4-1)
        cy = min(max(0,int(center[1]*self.img_size//4)),self.img_size//4-1)

        #Center heatmaps
        centers = label2center(cx=cx, cy=cy, other_centers=other_centers, img_size=192, sigma=sigma)#(1, 48, 48)

        regs = label2reg(keypoints, cx, cy, self.img_size) #(14, 48, 48)
        offsets = label2offset(keypoints, cx, cy, regs, self.img_size)#(14, 48, 48)
        labels = np.concatenate([heatmaps,centers,regs,offsets],axis=0)


Also, my loss values are quite large, but the model does seem to be learning with the MoveNet loss.

fire717 commented 2 years ago

label2center just converts your center point location (x, y) into a heatmap feature map, so if the result is wrong, it is probably because the (x, y) of your center is wrong. The center point is calculated in https://github.com/fire717/movenet.pytorch/blob/master/scripts/make_coco_data_17keypooints.py

which means it is the center point of all key points of one object.
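
A minimal sketch of that idea (assuming COCO-style [x, y, visibility] triples; the function name is illustrative, and the repo's exact logic lives in scripts/make_coco_data_17keypooints.py):

import numpy as np

def keypoints_center(kps, img_size=192):
    """Relative (cx, cy) of all labelled key points of one object."""
    kps = np.asarray(kps, dtype=np.float32).reshape(-1, 3)
    labelled = kps[kps[:, 2] > 0]           # drop unlabelled points (visibility == 0)
    cx = labelled[:, 0].mean() / img_size   # relative x in [0, 1]
    cy = labelled[:, 1].mean() / img_size   # relative y in [0, 1]
    return cx, cy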