Hello! Thank you for your kind words!
Any source of 2D joints can be used "out of the box" as long as it provides the following joints: HIP, NECK, HEAD, RSHOULDER, RELBOW, RHAND, LSHOULDER, LELBOW, LHAND, RHIP, RKNEE, RFOOT, LHIP, LKNEE, LFOOT, since these are the joints used to generate the NSDM matrices internally used by the neural network, as seen in the following illustrations.
Right now the use case is a real-time demo of a single person, but since the 2D-to-3D part evaluates fast enough, multiple persons could be handled by running it iteratively for every detected skeleton (the framerate should be fine for 1-3 persons, degrading gradually after that).
The easiest way to do a conversion from an arbitrary 2D joint estimator, I think, is to use the CSV file format -> https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/dataset/sample.csv
If the output is dumped to a CSV file using this format, it can be very quickly tested through MocapNET using:
./MocapNETJSON --from YourDataset.csv --visualize
The CSV file format is very easy to write and parse (especially from Python). The only caveat and possible pitfall is that the CSV file holds normalized coordinates that are expected to have a 1.777 aspect ratio, since the original cameras I am targeting are GoPro cameras configured for 1920x1080@120fps+. If you have a different video input resolution, the normalization step will have to respect this aspect ratio. Of course, the code that I use to preserve the aspect ratio regardless of input is included in the repository and can be used for reference: https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/MocapNETLib/jsonMocapNETHelpers.cpp#L498 via the normalizeWhileAlsoMatchingTrainingAspectRatio call https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/MocapNETLib/jsonMocapNETHelpers.cpp#L174
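For a rough idea of what that normalization step involves, here is a minimal Python sketch of one way to do it. This is not the repository's C++ implementation; the function name and the letterboxing strategy (pad the frame to 16:9, then divide by the padded dimensions) are only illustrative:

```python
# A minimal sketch (not the repository's C++ code): normalize pixel coordinates
# to [0,1] while letterboxing the frame to the ~1.777 (16:9) training aspect
# ratio. Function name and padding strategy are illustrative assumptions.
TARGET_ASPECT = 16.0 / 9.0  # e.g. 1920x1080

def normalize_to_training_aspect(x_px, y_px, width, height):
    """Map a pixel coordinate from a width x height frame to normalized
    coordinates inside a virtual frame padded to the target aspect ratio."""
    if width / height < TARGET_ASPECT:
        padded_w = height * TARGET_ASPECT          # frame too narrow: pad left/right
        pad_x = (padded_w - width) / 2.0
        return (x_px + pad_x) / padded_w, y_px / height
    else:
        padded_h = width / TARGET_ASPECT           # frame too wide: pad top/bottom
        pad_y = (padded_h - height) / 2.0
        return x_px / width, (y_px + pad_y) / padded_h
```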
That being said, I will clone HRNET and try it out :) 2D accuracy is very important in two-stage 3D pose estimation! Multiple person 3D tracking would also be very cool!
Thank you for the reply. I am trying out different permutations as well. Not really a pro, learning by trying. I must mention that with the yolov3 implementation the accuracy has greatly improved. One thing to note is that the 2D detection part of it is computationally quite heavy.
The last version of YOLO I had checked out was yolov2, and only for detection of objects, not persons. In any case, testing with HRNet would initially be more of an offline experiment, especially since HRNet is Python/PyTorch while this repo is C++/TensorFlow.
Yes, I totally agree, as a start it should be done on locally saved videos. If I understand correctly (I might be wrong), you need the keypoints per frame as input to the MocapNET module, right?
Yes, you need at least the hip, neck, head, rshoulder, relbow, rhand, lshoulder, lelbow, lhand, rhip, rknee, rfoot, lhip, lknee, lfoot joint 2D positions organized as 2DXhip, 2DYhip, Vhip, ... where V is a visibility flag that is 1 when the joint is visible and 0 when the joint is invisible.
The sample CSV file shows the full joint list received from the OpenPose Body+Hands 2D output.
The full input list has 171 elements (57 triplets of X2D, Y2D, VisibilityFlag), which are passed to MocapNET by populating an std::vector<float> with these values.
This can also be visualized from the main application, of course: http://ammar.gr/mocapnet/mocapnetogl.ogv
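As a rough illustration of the packaging step, a minimal Python sketch that flattens such X2D, Y2D, Visibility triplets into one CSV row per frame could look like the following. The joint names, the helper, and the output filename are placeholders; the real header and joint order (57 triplets in the full sample.csv) must be copied from dataset/sample.csv:

```python
# Illustrative sketch only: package per-frame 2D joints as one CSV row per frame.
# Copy the exact header/joint order from dataset/sample.csv; these names are
# placeholders for the minimal 15-joint subset discussed above.
import csv

JOINTS = ["hip", "neck", "head",
          "rshoulder", "relbow", "rhand",
          "lshoulder", "lelbow", "lhand",
          "rhip", "rknee", "rfoot",
          "lhip", "lknee", "lfoot"]

def joints_to_row(joints_2d):
    """joints_2d maps a joint name to normalized (x, y) in [0,1], or None when
    the joint was not detected. Returns the flat X2D, Y2D, Visibility triplets."""
    row = []
    for name in JOINTS:
        point = joints_2d.get(name)
        if point is None:
            row += [0.0, 0.0, 0.0]            # undetected joint: zeros, flag 0
        else:
            row += [point[0], point[1], 1.0]  # detected joint: coords, flag 1
    return row

# Hypothetical usage: one CSV row per video frame.
with open("YourDataset.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # writer.writerow(header)                    # header copied from sample.csv
    # for frame_joints in per_frame_detections:  # produced by your 2D estimator
    #     writer.writerow(joints_to_row(frame_joints))
```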
Thanks a lot for the information. I still couldn't manage to extend it to use simple HRNet as the 2D detector. I am doing everything offline at the moment.
Hello, if you have a small sample CSV file you generated (like this) I can take a look at it to maybe help you resolve the problem.
I have given a CSV file example that can be used to package any 2D estimator output and enable its processing by MocapNET. Adding native support for multiple 2D estimators is beyond the scope of this repository, so I am closing this issue! :) In case of questions on how to package 2D input for MocapNET, feel free to open a new issue!
Hi, first of all great work. I was wondering if it could be extended to HRNet, as it is supposed to be highly accurate? Here is an implementation of it. I think it is possible to dump the json file per frame for the keypoints. It is based on COCO keypoints. Link to the repo: simpleHRNet. There is a demo script here: demo_script. The keypoints are output here: keypoints. The keypoint array is of type Nx17x3, where N is the number of persons. Please let me know what you think about it?
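For example, a rough sketch of how one person's COCO-17 keypoints from that Nx17x3 array could be mapped to the joints MocapNET expects might look like the code below. It assumes the standard COCO index order and (x, y, confidence) keypoints (verify whether your HRNet wrapper returns (x, y) or (y, x)); the derived hip, neck, and head points are approximations, not something prescribed by either repository:

```python
# Rough sketch: map COCO-17 keypoints (standard index order assumed) to the
# 15 joints MocapNET needs. hip/neck/head are approximated, not exact.
COCO = {"nose": 0,
        "l_shoulder": 5, "r_shoulder": 6, "l_elbow": 7, "r_elbow": 8,
        "l_wrist": 9, "r_wrist": 10, "l_hip": 11, "r_hip": 12,
        "l_knee": 13, "r_knee": 14, "l_ankle": 15, "r_ankle": 16}

def coco_to_mocapnet_joints(kp):
    """kp is one person's 17x3 keypoint array; returns the 15 joints MocapNET
    needs as a dict of (x, y) pairs in pixel coordinates."""
    def pt(name):
        i = COCO[name]
        return kp[i][0], kp[i][1]
    def mid(a, b):
        (ax, ay), (bx, by) = pt(a), pt(b)
        return (ax + bx) / 2.0, (ay + by) / 2.0
    return {
        "hip":  mid("l_hip", "r_hip"),            # pelvis approximated as hip midpoint
        "neck": mid("l_shoulder", "r_shoulder"),  # neck approximated as shoulder midpoint
        "head": pt("nose"),                       # nose used as a head proxy
        "rshoulder": pt("r_shoulder"), "relbow": pt("r_elbow"), "rhand": pt("r_wrist"),
        "lshoulder": pt("l_shoulder"), "lelbow": pt("l_elbow"), "lhand": pt("l_wrist"),
        "rhip": pt("r_hip"), "rknee": pt("r_knee"), "rfoot": pt("r_ankle"),
        "lhip": pt("l_hip"), "lknee": pt("l_knee"), "lfoot": pt("l_ankle"),
    }
```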