cosyneco / MediaPipe.NET

Pure .NET bindings for Google's MediaPipe.
MIT License
95 stars, 18 forks

NormalizedLandmarkList.Landmark is wrong on BlazePose #31

Closed: mohelwazer closed this issue 2 years ago

mohelwazer commented 2 years ago

I downloaded your latest master branch to test the BlazePose example for one of my projects, and the console app ran successfully. I then saved the "Landmark" data from 200 frames to a file. The data in the first 5-10 frames is correct; after that, the same values are repeated again and again. I also saved the ImageFrame objects returned from Calculator.Send(..) as JPG files using ImageSharp. All 200 images were saved and they do change (my body is masked correctly), but the skeleton stops changing after 5-10 images.

I ran the same test with multiple camera positions, different movements, body poses, and even clothes. Every single time, the exact same thing happened.
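One way to make the "copied again and again" observation concrete is to scan the saved landmarks for the first frame at which they stop changing. The following is only a minimal sketch: the in-memory layout (a list of frames, each a list of (x, y, z) tuples) and the function name `first_frozen_frame` are assumptions for illustration, not part of MediaPipe.NET.

```python
from typing import List, Sequence

# Assumed layout: one frame is a sequence of (x, y, z) landmark tuples.
Frame = Sequence[Sequence[float]]


def first_frozen_frame(frames: List[Frame], tol: float = 1e-9, min_run: int = 3) -> int:
    """Return the index of the first frame that starts a run of at least
    `min_run` frames identical (within `tol`) to their predecessor, or -1."""
    run_start = -1
    run_len = 0
    for i in range(1, len(frames)):
        # A frame counts as "repeated" when every coordinate matches the previous frame.
        same = all(
            abs(a - b) <= tol
            for lm_prev, lm_cur in zip(frames[i - 1], frames[i])
            for a, b in zip(lm_prev, lm_cur)
        )
        if same:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0
    return -1
```

Running this over the 200 saved frames would show whether the values are truly identical from some frame onward, or merely changing very slowly.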

Speykious commented 2 years ago

You seem to have pinpointed a problem with the MediaPipe solution itself. As the MediaPipe Pose docs say:

The solution utilizes a two-step detector-tracker ML pipeline, proven to be effective in our MediaPipe Hands and MediaPipe Face Mesh solutions. Using a detector, the pipeline first locates the person/pose region-of-interest (ROI) within the frame. The tracker subsequently predicts the pose landmarks and segmentation mask within the ROI using the ROI-cropped frame as input. Note that for video use cases the detector is invoked only as needed, i.e., for the very first frame and when the tracker could no longer identify body pose presence in the previous frame. For other frames the pipeline simply derives the ROI from the previous frame’s pose landmarks.

I think we have to forward an issue to MediaPipe. If I understood this explanation correctly, the problem you're facing is a direct result of their implementation, which is not really valid for our use-case.
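The control flow described in the quoted docs can be sketched roughly as follows. This is not MediaPipe's actual code: `detect`, `track`, and `roi_from_landmarks` are hypothetical callables standing in for the detector, the tracker, and the ROI derivation step.

```python
from typing import Any, Callable, Iterable, List, Optional


def run_pipeline(
    frames: Iterable[Any],
    detect: Callable[[Any], Optional[Any]],             # hypothetical: frame -> ROI or None
    track: Callable[[Any, Any], Optional[Any]],         # hypothetical: (frame, ROI) -> landmarks or None
    roi_from_landmarks: Callable[[Any], Any],           # hypothetical: landmarks -> next ROI
) -> List[Optional[Any]]:
    """Detector-tracker loop: the detector runs only when no ROI is
    carried over from the previous frame."""
    roi = None
    results: List[Optional[Any]] = []
    for frame in frames:
        if roi is None:
            # First frame, or the tracker lost the pose on the previous frame.
            roi = detect(frame)
        landmarks = track(frame, roi) if roi is not None else None
        results.append(landmarks)
        # Reuse the landmarks to derive the next ROI; drop the ROI if tracking failed.
        roi = roi_from_landmarks(landmarks) if landmarks is not None else None
    return results
```

In this sketch, each frame's ROI depends on the previous frame's landmarks, so state is carried from frame to frame in video mode.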

Speykious commented 2 years ago

Actually, maybe we just have to look into the side packets. I'll look into it. https://google.github.io/mediapipe/solutions/pose#mediapipe-pose

mohelwazer commented 2 years ago

I am familiar with using MediaPipe from Python. My issue, though, is not a MediaPipe bug. I used the same MediaPipe version that MediaPipe.Net runs natively inside the wrapper, but the results are different. I also want to mention that I am not talking about detection errors; I understand those are down to the MediaPipe models.

sr229 commented 2 years ago

This bug is not actionable unless you provide a test case. Please provide a reproducible test case in Python and then in .NET so we can verify what's wrong.

mohelwazer commented 2 years ago

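I used CVZone V1.5.5 with MediaPipe 0.8.9.1 for pose detection with findPose(..) and findPosition(..) sending the image parameter ... OpenCV to load Images and videos.

In Python:

  • Load any video you have using OpenCV
  • Loop over the video frames and send each image to findPosition(..) from CVZone
  • Append the resulting landmarks to a file

In .Net (for ease of testing):

  • Loaded the same video using FFMediaToolkit
  • Looped over the video frames, converting each frame into an ImageFrame
  • Sent each ImageFrame to the Calculator
  • Handled each LandmarkList using the Calculator.OnResult event

I understand that the results from MediaPipe.Net are already normalized (maybe the problem is there, I don't know yet). I'm comparing how far the values deviate from the mean for each of the X, Y, Z elements of each joint, and I found that the variation in Python is much larger than the variation in .Net.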
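The comparison described above could be computed along these lines. This is a sketch under assumptions: the frame layout (a list of frames, each a list of (x, y, z) tuples) and both function names are illustrative. The `normalize_pixels` helper only applies if one side reports pixel coordinates while the other reports normalized ones; whether that is the case here is not confirmed in this thread.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]


def mean_abs_deviation(frames: Sequence[Sequence[Landmark]]) -> List[Landmark]:
    """For each joint, return the mean absolute deviation from the mean
    of its x, y and z values across all frames."""
    result = []
    for j in range(len(frames[0])):
        per_axis = []
        for axis in range(3):
            values = [frame[j][axis] for frame in frames]
            mean = fmean(values)
            per_axis.append(fmean([abs(v - mean) for v in values]))
        result.append((per_axis[0], per_axis[1], per_axis[2]))
    return result


def normalize_pixels(frames: Sequence[Sequence[Landmark]], width: int, height: int) -> List[List[Landmark]]:
    """Scale pixel x/y by the image size so both data sets share one unit.
    Assumption: z is scaled by the image width, like x."""
    return [[(x / width, y / height, z / width) for (x, y, z) in frame] for frame in frames]
```

Comparing the two outputs joint by joint only makes sense once both data sets are in the same unit; otherwise a pixel-based series will always appear to vary far more than a normalized one.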

sr229 commented 2 years ago

I used CVZone V1.5.5 with MediaPipe 0.8.9.1 for pose detection with findPose(..) and findPosition(..) sending the image parameter ... OpenCV to load Images and videos.

In Python:

  • Load any video you have using OpenCV
  • Loop over the video frames and send each image to findPosition(..) from CVZone
  • Append the resulting landmarks to a file

In .Net (for ease of testing):

  • Loaded the same video using FFMediaToolkit
  • Looped over the video frames, converting each frame into an ImageFrame
  • Sent each ImageFrame to the Calculator
  • Handled each LandmarkList using the Calculator.OnResult event

I understand that the results from MediaPipe.Net are already normalized (maybe the problem is there, I don't know yet). I'm comparing how far the values deviate from the mean for each of the X, Y, Z elements of each joint, and I found that the variation in Python is much larger than the variation in .Net.

Thanks for the response, we'll check what's wrong. I don't think our SidePacket implementation is to blame here, considering our test cases haven't shown this deficiency.

Speykious commented 2 years ago

It appears that the Python bindings directly use the pose_landmark graph as opposed to the pose_tracking graph that we use on the C# side. It might very well be the core of the implementation difference. However, up until now we have never needed to specify side packets, so it might take a while to fix depending on what we have to modify.

sr229 commented 2 years ago

Going to put this on Backlog for now as we don't have the capacity to fix this properly until Q4.

mohelwazer commented 2 years ago

Thanks for your response. Looking forward to it. In the meantime, if I find a solution I will post it here too.

ProchyMicrochip commented 2 years ago

I think I have replicated the issue, but if I run my input at a low enough resolution, it looks like the landmarks are just awfully slow to follow my movements.

mohelwazer commented 2 years ago

I came back to this issue and tested more. The PoseCpuSolution calculator was being initialized with a modelComplexity of 2, new PoseCpuSolution(modelComplexity: 2, smoothLandmarks: false);, which was extremely slow. I used the existing code of the Mediapipe.Net.Examples.Pose project, changed the initialization of the calculator to use a modelComplexity of 1, and everything worked well.

Thanks, I'm going to close this issue.