fire717 / movenet.pytorch

A PyTorch implementation of MoveNet from Google. Includes training code and a pre-trained model.
MIT License
374 stars · 87 forks

Output rendering inconsistency with API #25

Closed leftbackn3 closed 2 years ago

leftbackn3 commented 2 years ago

Hi,

While using the TensorFlow API:

```python
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.set_tensor(input_details[0]['index'], np.array(input_image))
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']).shape)
```

the output shape is (1, 1, 17, 3), i.e. 17 keypoints with their respective y-x coordinates and confidence scores.
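For reference, unpacking that (1, 1, 17, 3) tensor into named keypoints can be sketched like this. The keypoint order follows the COCO-style list from the MoveNet model card; the `parse_keypoints` helper and the 0.3 threshold are illustrative choices, not part of the API:

```python
# Keypoint order used by MoveNet (COCO-style), per the model card.
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def parse_keypoints(output, score_threshold=0.3):
    """output: array-like of shape (1, 1, 17, 3) with rows (y, x, score).
    Returns {keypoint_name: (y, x)} for keypoints above the threshold."""
    kps = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, output[0][0]):
        if score >= score_threshold:
            kps[name] = (y, x)
    return kps
```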

I used the pretrained model from this repository's output folder, e91_valacc0.79763.pth, converted it to ONNX, then to TF, and finally to TFLite. The only change I made in pth2onnx.py is opset_version=10:

```python
torch.onnx.export(run_task.model, dummy_input1, "output/pose.onnx",
                  verbose=True, input_names=input_names,
                  output_names=output_names,
                  do_constant_folding=True, opset_version=10)
```

Now I did the same for the TFLite model, i.e. the first five lines of code above. The output shape is (1, 34, 48, 48).

How can I obtain the output in the same format as the one returned by the API?

fire717 commented 2 years ago

The official tflite model merges the post-processing ops into the model, so it outputs the final points. As said in the README: ...which cannot be converted to some CPU inference frameworks such as NCNN, Tengine, MNN, TNN (all of these are faster than the tflite framework), and we cannot add our own custom data to finetune...

This pytorch model only outputs the raw model outputs and does the post-processing in code, so it can be easily deployed with any other inference framework.

So if you want to use tflite, why not just use the official pre-trained model?

leftbackn3 commented 2 years ago

The output that we receive from this model has the shape (1, 34, 48, 48).

Does the 34 correspond to the 17 keypoints' x and y coordinates?

Is the shape of the image output 48x48? My purpose is to avoid using the API altogether, because I want to train the model on custom datasets. That is why I am avoiding the official implementation for now.

fire717 commented 2 years ago

In fact the model has four output tensors; [1,34,48,48] is just one of them. Yes, the 34 corresponds to the 17 keypoints' x and y coordinates, and 48x48 is a 1/4 downsample of the original 192x192 input, but the result needs more post-processing together with the other three outputs to get an accurate result.

I think you can read this function, which transforms the raw model outputs into points: https://github.com/fire717/movenet.pytorch/blob/95ec8535245228aa4335243e68722810e50bcaf8/lib/task/task_tools.py#L85
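The core idea of that decode step can be illustrated with a toy example. This sketch uses a 4x4 grid instead of 48x48, a single made-up keypoint, and plain lists; it also omits the heatmap-weighting step that movenetDecode performs with the fourth output, so it is a simplification of the repo's logic, not a substitute for it:

```python
GRID = 4

def argmax2d(grid):
    """Return (y, x) of the largest value in a 2D list-of-lists."""
    best, by, bx = float('-inf'), 0, 0
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            if v > best:
                best, by, bx = v, y, x
    return by, bx

# Fake single-image, single-keypoint outputs on a 4x4 grid (illustrative data).
center = [[0.0] * GRID for _ in range(GRID)]; center[2][1] = 0.9   # person center at (2, 1)
regs_y = [[0.0] * GRID for _ in range(GRID)]; regs_y[2][1] = -1.0  # keypoint 1 cell up from center...
regs_x = [[0.0] * GRID for _ in range(GRID)]; regs_x[2][1] = 2.0   # ...and 2 cells right
off_y  = [[0.25] * GRID for _ in range(GRID)]                      # sub-cell refinement
off_x  = [[0.5] * GRID for _ in range(GRID)]

# 1. Locate the person center on the center heatmap.
cy, cx = argmax2d(center)

# 2. Coarse keypoint cell = center cell + regression vector (rounded).
ky = int(cy + regs_y[cy][cx] + 0.5)
kx = int(cx + regs_x[cy][cx] + 0.5)

# 3. Refine with the offset maps and normalize to [0, 1].
y = (ky + off_y[ky][kx]) / GRID
x = (kx + off_x[ky][kx]) / GRID
print((y, x))  # (0.3125, 0.875)
```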

leftbackn3 commented 2 years ago

Thanks for the suggestion.

I tried implementing the same. The outputs that I'm receiving from the interpreter have the shapes (1,34,48,48), (1,17,48,48), (1,34,48,48), (1,1,48,48) (as you had mentioned earlier).

I made a list of numpy ndarrays named data, where data[0] is the (1,34,48,48) tensor, data[1] is the (1,17,48,48) tensor, and so on,

and then called movenetDecode(data).

Since these are already numpy arrays, I removed the .detach().cpu().numpy() calls accordingly in my modifications.
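Instead of removing those calls, one option is a small adapter that accepts either kind of input, so the same decode code works on torch tensors during training and on numpy arrays read back from a TFLite interpreter. This `to_numpy` helper is a hypothetical addition, not something in the repo:

```python
def to_numpy(x):
    """Return x as a numpy-compatible array, whether it is a torch.Tensor
    or already a numpy ndarray / plain sequence."""
    # torch tensors expose .detach(); numpy arrays and lists do not,
    # so duck-typing avoids importing torch just for an isinstance check.
    if hasattr(x, "detach"):
        return x.detach().cpu().numpy()
    return x
```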

The problem I encountered recently is the batch_size.

```python
'''
data: [64, 7, 48, 48] [64, 1, 48, 48] [64, 14, 48, 48] [64, 14, 48, 48]
#kps_mask [n, 7]
'''
if mode == 'output':
    batch_size = data[0].shape[0]
```

Instead of size(0) I used shape[0], since data[0] is not a tensor but directly a numpy ndarray. The error received:

```
reg_x_origin = (regs[dim0,dim1+n*2,cy,cx]+0.5).astype(np.int32)
IndexError: index 1 is out of bounds for axis 0 with size 1
```

Question: should the output obtained from the interpreter be a batch of 64 images, or is it something else that I'm missing?

fire717 commented 2 years ago

Maybe the data sequence should be [1,17,48,48], [1,1,48,48], [1,34,48,48], [1,34,48,48], corresponding to heatmap, center, regs, offsets.
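Since the interpreter's output order is not guaranteed to match what movenetDecode expects, one way to build the list is to sort the tensors by channel count. This `order_outputs` helper is a sketch assuming the 17-keypoint single-person model (heatmap has 17 channels, center 1, regs and offsets 34 each); note that regs and offsets cannot be told apart by shape alone, so their relative order still has to come from the interpreter's output details:

```python
def order_outputs(outputs):
    """outputs: list of (shape, array) pairs in interpreter order.
    Returns [heatmap, center, regs, offsets] by matching channel counts."""
    heatmap = center = None
    ch34 = []  # the two 34-channel tensors, kept in interpreter order
    for shape, arr in outputs:
        if shape[1] == 17:
            heatmap = arr
        elif shape[1] == 1:
            center = arr
        else:  # 34 channels: regs or offsets
            ch34.append(arr)
    regs, offsets = ch34  # assumes regs precedes offsets in interpreter order
    return [heatmap, center, regs, offsets]
```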

leftbackn3 commented 2 years ago

Yes, it's working now and I'm receiving the x, y coordinates. This issue can be closed.

Can you please suggest some links or papers where I can read about the preprocessing of the images before training?

fire717 commented 2 years ago

All the information I used for the reproduction is listed in the README's Resource section, such as the official blog and model card. I also wrote an article about this, but it is in Chinese.

leftbackn3 commented 2 years ago

Okay, got it.