Open manojs8473 opened 1 year ago
Hello! Thank you for your kind words!
And sorry about the delay in responding. I am currently writing my PhD thesis, and over the last months I have been abroad for almost two months for project meetings, conferences and a secondment in Italy, so I was not logged in to GitHub and did not see the issues. I received the 2FA warning, logged in after some time, and saw the issue today! :(
A lot of excellent questions! First of all, for object 3D pose you will first need to train an RGB -> 2D heatmap estimator that produces 2D "joint" data for the objects of your choice.
For a tennis racket, for example, 5 points: the handle, the top of the racket, the sides and its center. For a baseball bat, 3 points: the handle, the top of the bat and its middle, etc. Although there now exist foundation models such as SAM, Mask R-CNNs etc. that would automatically segment the racket, baseball bat etc., you will still need some landmarks to incorporate them in the 3D pose solution.
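To make the landmark idea concrete, here is a minimal sketch of how the output of such a heatmap estimator could be turned into named 2D "joints". The landmark names, the `TOOL_LANDMARKS` table, the confidence threshold and both helper functions are assumptions made for illustration; they are not part of MocapNET.

```python
# Hypothetical landmark sets, following the point counts suggested above.
# Splitting "the sides" into left/right is an assumption.
TOOL_LANDMARKS = {
    "tennis_racket": ["handle", "top", "left_side", "right_side", "center"],
    "baseball_bat": ["handle", "middle", "top"],
}


def heatmap_peak(heatmap):
    """Return (x, y, score) of the strongest response in a 2D heatmap.

    The heatmap is a list of rows, each row a list of floats.
    """
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


def heatmaps_to_joints(tool, heatmaps, threshold=0.3):
    """Map one heatmap per landmark to a named 2D point.

    Peaks weaker than `threshold` are reported as None (landmark not visible).
    """
    names = TOOL_LANDMARKS[tool]
    if len(heatmaps) != len(names):
        raise ValueError("expected one heatmap per landmark")
    joints = {}
    for name, heatmap in zip(names, heatmaps):
        x, y, score = heatmap_peak(heatmap)
        joints[name] = (x, y) if score >= threshold else None
    return joints
```

The resulting dictionary plays the same role as the 2D body joints: a fixed, ordered set of named points that can be appended to the 2D input of the pose estimator.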
You can easily extend the BVH file to accommodate extra geometry:
If you look at https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/dataset/headerWithHeadAndOneMotion.bvh
and take a look at
http://www.dcs.shef.ac.uk/intranet/research/public/resmes/CS0111.pdf
I think you can easily extend the BVH armature with such a shape.
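As a rough illustration of what "extending the armature" means, the sketch below builds the HIERARCHY lines for one extra joint that ends in an End Site. The function name, the joint name, the offsets and the rotation channel order are all assumptions; check them against the conventions of the header file linked above before using them.

```python
def tool_joint_block(name, offset, tip_offset, depth=0,
                     channels=("Zrotation", "Xrotation", "Yrotation")):
    """Return BVH HIERARCHY lines for a single tool joint with an End Site.

    `offset` places the joint relative to its parent (for example the hand),
    `tip_offset` places the tip of the tool relative to the joint.
    """
    pad = "  " * depth
    inner = "  " * (depth + 1)
    leaf = "  " * (depth + 2)

    def fmt(vec):
        return " ".join(f"{component:.4f}" for component in vec)

    return [
        pad + "JOINT " + name,
        pad + "{",
        inner + "OFFSET " + fmt(offset),
        inner + "CHANNELS " + str(len(channels)) + " " + " ".join(channels),
        inner + "End Site",
        inner + "{",
        leaf + "OFFSET " + fmt(tip_offset),
        inner + "}",
        pad + "}",
    ]


# Example: a hypothetical 60-unit racket attached at the hand origin.
print("\n".join(tool_joint_block("racket", (0, 0, 0), (0, 60, 0))))
```

A tool with more landmarks (such as the sides and center of a racket) would need one such joint or End Site per landmark, nested under the same parent.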
In terms of the MocapNET model, you will need to include the new "joints" of the racket/baseball bat in the NSRM matrices. The description of how to make the descriptor is here: http://users.ics.forth.gr/~argyros/mypapers/2021_11_BMVC_Qammaz.pdf . The architecture could remain the same; in my opinion it should scale to one more joint with no problems.
MocapNET is typically trained on 3M pose samples. Given a BVH source like the one I use, https://drive.google.com/file/d/1Zt-MycqhMylfBUqgmW9sLBclNNxoNGqV/view?usp=drive_link , you will need to write a program that goes into each BVH file and applies the extra joints for your "tool", be that a racket, hammer, baseball bat etc. You will then have a dataset with enough samples.
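A program of that kind could look roughly like the sketch below. It is not the tooling used to build the original dataset. It assumes the new joint is given as a list of HIERARCHY lines, that the parent joint (called "rhand" in the usage comment, which is a guess at the name) has its own CHANNELS line, and that a constant fill value is acceptable for the new rotation channels. The new joint is inserted as the parent's first child, so its motion columns go directly after the parent's columns in every frame.

```python
from pathlib import Path


def add_joint_to_bvh(text, parent, joint_lines, fill="0"):
    """Insert `joint_lines` as the first child of `parent` and pad each frame."""
    lines = text.splitlines()
    motion_at = next(i for i, l in enumerate(lines) if l.strip() == "MOTION")
    # Number of motion columns the new joint(s) add.
    n_new = sum(int(l.split()[1]) for l in joint_lines
                if l.split()[:1] == ["CHANNELS"])

    out, seen, col, waiting = [], 0, None, False
    for line in lines[:motion_at]:
        out.append(line)
        tok = line.split()
        if len(tok) >= 2 and tok[0] in ("ROOT", "JOINT") and tok[1] == parent:
            waiting = True  # the next CHANNELS line belongs to the parent
        elif tok and tok[0] == "CHANNELS":
            seen += int(tok[1])  # channels declared so far, in file order
            if waiting:
                indent = line[:len(line) - len(line.lstrip())]
                out.extend(indent + jl for jl in joint_lines)
                col, waiting = seen, False
    if col is None:
        raise ValueError(f"joint {parent!r} with CHANNELS not found")

    for line in lines[motion_at:]:
        tok = line.split()
        if tok and tok[0] not in ("MOTION", "Frames:", "Frame"):
            tok[col:col] = [fill] * n_new  # new columns follow the parent's
            line = " ".join(tok)
        out.append(line)
    return "\n".join(out) + "\n"


def extend_dataset(src_dir, dst_dir, parent, joint_lines):
    """Apply add_joint_to_bvh to every .bvh file in src_dir; return the count."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*.bvh")):
        new_text = add_joint_to_bvh(path.read_text(), parent, joint_lines)
        (dst / path.name).write_text(new_text)
        count += 1
    return count


# Usage (paths and joint name are placeholders):
# extend_dataset("bvh_in", "bvh_out", "rhand", my_racket_joint_lines)
```

Filling the new channels with zeros keeps the tool rigidly attached in its rest orientation. If the training data should cover different grips, the fill value would have to be replaced by sampled rotations, which is a design decision this sketch leaves open.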
Unfortunately, FORTH, which is the license holder for this work, prevents me from sharing the training code for the network; however, I think with the Python code shared here: https://github.com/FORTH-ModelBasedTracker/MocapNET/tree/mnet4/src/python/mnet4
Hello!
First of all, thank you for delivering this incredible work! I'm interested in customizing the current model to estimate the 3D pose of objects like a baseball bat or tennis racket in the hands of the actor in addition to the 3D pose of the human body, which the model already does successfully. I have a few questions and doubts regarding this task:
Customizing Skeleton Hierarchy: Is it possible to customize the current skeleton hierarchy and add new bones or edges to represent the bat or racket? I assume this would be necessary to include these objects in the pose estimation.
Architectural Changes: What sort of changes will be required in the architecture of the model to accommodate the estimation of 3D pose for objects? Are there any specific layers or components that need to be modified or added?
Training Data Volume: Could you provide insights into the volume of data that the model would require for training to achieve good accuracy in estimating the 3D pose of both the human body and objects like baseball bats and tennis rackets?
Your comments and suggestions on how to approach this customization would be immensely appreciated. Thank you!