FORTH-ModelBasedTracker / MocapNET

We present MocapNET, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocular color images. Our contributions include: (a) A novel and compact 2D pose NSRM representation. (b) A human body orientation classifier and an ensemble of orientation-tuned neural networks that regress the 3D human pose by also allowing for the decomposition of the body into an upper and a lower kinematic hierarchy. This permits the recovery of the human pose even in the case of significant occlusions. (c) An efficient Inverse Kinematics solver that refines the neural-network-based solution, providing 3D human pose estimations that are consistent with the limb sizes of a target person (if known). All the above yield a 33% accuracy improvement on the Human 3.6 Million (H3.6M) dataset compared to the baseline method (MocapNET) while maintaining real-time performance.
https://www.youtube.com/watch?v=Jgz1MRq-I-k

A possibility to track multiple people in the same scene #54

Open · iPsych opened this issue 3 years ago

iPsych commented 3 years ago

Hello, the code works amazingly for shuffle.webm and other single-person stimuli, but behaves very strangely when I feed it a multi-person video. Is there any way to extend MocapNET to multi-person tracking, like https://paperswithcode.com/task/multi-person-pose-estimation?

AmmarkoV commented 3 years ago

The weird behavior you are referring to arises from the 2D joint heatmap detection (https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/src/JointEstimator2D/jointEstimator2D.cpp#L288), where the code tries to "retrieve" the joints with the strongest heatmap signatures.

If there are multiple people in a scene, the algorithm will try to "connect" body parts belonging to different people (the parts with the highest scores), resulting in incorrect output.
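To make the failure mode concrete, here is a minimal sketch of such a per-heatmap global argmax (my own illustration, not the code at the link above): with two people in frame, every heatmap has two strong peaks, and the winning peak can come from a different body for each joint, so the assembled skeleton mixes people.

```cpp
// Minimal sketch of per-joint heatmap peak picking (illustrative, not
// MocapNET's actual code). A global argmax has no notion of person identity:
// whichever person's peak is stronger "wins" this particular joint.
#include <opencv2/opencv.hpp>

cv::Point strongestJointLocation(const cv::Mat &heatmap /* CV_32FC1, one joint */)
{
    double maxVal = 0.0;
    cv::Point maxLoc;
    cv::minMaxLoc(heatmap, nullptr, &maxVal, nullptr, &maxLoc);
    return maxLoc; // strongest response anywhere in the frame
}
```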

In the older version of MocapNET (MNET1) there used to be a mode (https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/mnet1/src/MocapNET1/MocapNETLiveWebcamDemo/mocapNETLiveDemo.cpp#L838) where, by running ./MocapNETLiveWebcamDemo --rectangle X Y WIDTH HEIGHT, you could erase a part of the image so that it would get ignored. This, however, was a crude workaround, and it got removed in the next version. The relevant snippet was:

```cpp
//Some datasets have persons that appear in parts of the image, we might want to cover them using a rectangle
//We do this before adding any borders or otherwise change of the ROI of the image, however we do this
//after possible frame skips for the obviously increased performance..
if (coveringRectangle)
{
    cv::Point pt1(coveringRectangleX, coveringRectangleY);
    cv::Point pt2(coveringRectangleX + coveringRectangleWidth, coveringRectangleY + coveringRectangleHeight);
    cv::rectangle(frame, pt1, pt2, cv::Scalar(0,0,0), -1, 8, 0);
}
```

If you think you would find this useful, I could reinstate it.

That being said, the second thing one can do is use OpenPose with the --number_people_max 1 flag; this way OpenPose will just pick one skeleton and sidestep the issue. OpenPose uses Part Affinity Fields (PAFs) that constrain joints to be connected on the same person, and it has provisions to correctly separate people in a scene: https://github.com/FORTH-ModelBasedTracker/MocapNET/blob/master/scripts/processDatasetWithOpenpose.sh#L23
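For intuition, here is a rough sketch of the association idea behind PAFs (my own illustration, with hypothetical pafX/pafY inputs, not OpenPose's actual implementation): a candidate limb between two joint detections is scored by how well the field vectors sampled along the segment align with the limb direction, so a "limb" spanning two different people scores poorly.

```cpp
// Illustrative PAF limb scoring. pafX/pafY are assumed to be the two field
// channels (CV_32FC1) for one limb type. Limbs on the same person align with
// the field and score high; cross-person connections do not.
#include <opencv2/opencv.hpp>
#include <cmath>

float limbScore(const cv::Mat &pafX, const cv::Mat &pafY,
                cv::Point2f jointA, cv::Point2f jointB, int samples = 10)
{
    cv::Point2f d = jointB - jointA;
    float len = std::sqrt(d.x * d.x + d.y * d.y);
    if (len < 1e-6f || samples < 2) return 0.0f;
    d *= 1.0f / len; // unit direction of the candidate limb

    float score = 0.0f;
    for (int s = 0; s < samples; ++s)
    {
        cv::Point2f p = jointA + (jointB - jointA) * ((float)s / (float)(samples - 1));
        int x = cv::borderInterpolate((int)p.x, pafX.cols, cv::BORDER_REPLICATE);
        int y = cv::borderInterpolate((int)p.y, pafX.rows, cv::BORDER_REPLICATE);
        // Dot product of the sampled field vector with the limb direction
        score += pafX.at<float>(y, x) * d.x + pafY.at<float>(y, x) * d.y;
    }
    return score / (float)samples;
}
```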

A proper solution for the live webcam demo would be to incorporate a detector like Darknet/YOLO (https://github.com/AlexeyAB/darknet), run it first on the incoming OpenCV frame, retrieve the bounding boxes of the people in the image (as seen here: https://www.youtube.com/watch?v=saDipJR14Lc#t=23m), and then run the MocapNET pipeline on each of the retrieved rectangles.
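A minimal sketch of that plumbing, assuming OpenCV's DNN module with a Darknet YOLO model (forEachDetectedPerson and estimateJointsAndRunMocapNET are hypothetical names, not part of MocapNET):

```cpp
// Illustrative sketch: detect people with a YOLO model via cv::dnn, then run
// the existing single-person pipeline once per person crop. NMS is omitted
// for brevity; a real version would use cv::dnn::NMSBoxes on the boxes first.
#include <opencv2/opencv.hpp>
#include <opencv2/dnn.hpp>
#include <functional>
#include <vector>

void forEachDetectedPerson(cv::Mat &frame, cv::dnn::Net &yolo,
                           const std::function<void(const cv::Mat &)> &estimateJointsAndRunMocapNET)
{
    // YOLO expects a square, 0..1 normalized, RGB blob
    cv::Mat blob = cv::dnn::blobFromImage(frame, 1.0 / 255.0, cv::Size(416, 416),
                                          cv::Scalar(), /*swapRB=*/true, /*crop=*/false);
    yolo.setInput(blob);

    std::vector<cv::Mat> outs;
    yolo.forward(outs, yolo.getUnconnectedOutLayersNames());

    for (const cv::Mat &out : outs)
    {
        // Each row: [cx, cy, w, h, objectness, class scores...] in relative coords
        for (int i = 0; i < out.rows; ++i)
        {
            const float *row = out.ptr<float>(i);
            float confidence = row[4] * row[5]; // objectness * "person" score (COCO class 0)
            if (confidence < 0.5f) continue;

            int w = (int)(row[2] * frame.cols), h = (int)(row[3] * frame.rows);
            int x = (int)(row[0] * frame.cols) - w / 2;
            int y = (int)(row[1] * frame.rows) - h / 2;
            cv::Rect box = cv::Rect(x, y, w, h) & cv::Rect(0, 0, frame.cols, frame.rows);
            if (box.area() == 0) continue;

            // One full 2D-joints + MocapNET run per detected person
            estimateJointsAndRunMocapNET(frame(box).clone());
        }
    }
}
```

The network would be loaded once, e.g. with cv::dnn::readNetFromDarknet("yolov4.cfg", "yolov4.weights") (file names are placeholders), and the callback would wrap the current single-person 2D joint estimator plus MocapNET.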

This will work, but it will also degrade the framerate linearly with the number of people present in the scene (since the neural network will have to be executed once for each of them). You would then also face the additional problem of person re-identification: with multiple BVH file outputs, you need to keep track of which skeleton belongs to which BVH file and update each of them correctly.
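The re-identification bookkeeping could start as simple greedy IoU matching between consecutive frames; here is a minimal sketch (my own illustration, the Track struct is hypothetical) in which each stable ID would own one BVH output file:

```cpp
// Illustrative greedy IoU tracker: keep a stable integer ID per person across
// frames so the caller can append each person's pose to their own BVH file.
// Real re-identification (appearance cues, occlusion handling) is much harder.
#include <opencv2/core.hpp>
#include <vector>

struct Track { int id; cv::Rect box; };

static float iou(const cv::Rect &a, const cv::Rect &b)
{
    float inter = (float)(a & b).area();
    float uni   = (float)a.area() + (float)b.area() - inter;
    return (uni > 0.0f) ? inter / uni : 0.0f;
}

// Match each new detection to the best unclaimed previous track; unmatched
// detections start a new track (and hence a new BVH output stream).
std::vector<Track> updateTracks(const std::vector<Track> &previous,
                                const std::vector<cv::Rect> &detections,
                                int &nextID)
{
    std::vector<Track> updated;
    std::vector<bool> used(previous.size(), false);
    for (const cv::Rect &det : detections)
    {
        int best = -1;
        float bestIoU = 0.3f; // minimum overlap required to keep an identity
        for (size_t t = 0; t < previous.size(); ++t)
        {
            float overlap = iou(previous[t].box, det);
            if (!used[t] && overlap > bestIoU) { bestIoU = overlap; best = (int)t; }
        }
        if (best >= 0) { used[best] = true; updated.push_back({previous[best].id, det}); }
        else           { updated.push_back({nextID++, det}); }
    }
    return updated;
}
```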

So, that being said, adding all this complexity to the project is overkill, and it doesn't have much novelty or research interest, which is why it has been skipped!

I think at this point the best thing to do is to mask out the parts of the scene you don't want as a workaround (or just use OpenPose as the 2D engine).

I hope I did a good job explaining the issue; looking forward to your input.

Ammar