Open scottxp opened 1 year ago
@scottxp,
Could you please elaborate on your query with complete details and, if possible, share a capture of the jitter so we can understand the issue better?
I am encountering a similar issue in Python. There seems to be no landmark smoothing in mediapipe v0.10.1. The following two videos show the effect: in the video that uses mediapipe 0.10.1, the landmarks jitter much more than in the second video, which uses mediapipe 0.8.11.
The videos are basically a single image that is fed to the mediapipe solution and task in a loop. The same effect also happens when using a webcam.
https://github.com/google/mediapipe/assets/16905449/02381e1a-0514-41a9-81e0-20f6a6eeeced
https://github.com/google/mediapipe/assets/16905449/2310a792-dfd0-450d-a9ec-689ba28f7682
My assumption is that the difference lies in the graph that is used in pose estimation. In version 0.8.11 it uses the following graph that has a smoothing calculator: https://github.com/google/mediapipe/blob/release/mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt#L216
In 0.10.1, by contrast, it builds a different graph that basically contains a PoseLandmarkerGraph calculator and a FlowLimiterCalculator. The PoseLandmarkerGraph consists of two subgraphs, which I assume contain no smoothing calculator. https://github.com/google/mediapipe/blob/91a3c54d558af8c4a0807d2bdd47e875a3c1e87a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L219
Maybe it would be possible to enhance the graph that is created in mediapipe 0.10.1, and add the smoothing calculator. I will try doing that, but I'm not sure if the input and output streams will be compatible.
https://github.com/google/mediapipe/assets/879510/cd51f61e-5f96-48a5-9685-f6e04bdcf435
LEFT: mediapipe/tasks-vision@0.10.1 RIGHT: mediapipe/pose@0.5.1675469404/pose.js
As described by @igorbasko01, there does not appear to be any landmark smoothing in mediapipe 0.10.1. You can see the jittering landmarks in the video on the left while the video on the right does not jitter. These were captured and processed simultaneously using the same webcam stream but different mediapipe libraries.
Here is the code for the video on the LEFT:
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
window.poseLandmarker = await PoseLandmarker.createFromOptions(
  vision,
  {
    baseOptions: {
      modelAssetPath: "/models/pose_landmarker_full.task",
      delegate: "GPU",
    },
    runningMode: "VIDEO"
  }
);
And here is the code for the video on the RIGHT:
window.poseDetector = await poseDetection.createDetector("BlazePose", {
  runtime: "tfjs",
  enableSmoothing: true,
  modelType: "full",
  solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/pose'
});
I have attempted to implement a Python version of the OneEuroFilter, closely modeled after the C++ version found in this mediapipe implementation: https://github.com/google/mediapipe/blob/bed624f3b6f7ad5d25b5474c516561c537f10199/mediapipe/util/filtering/one_euro_filter.cc#L14
I've also replicated the same parameters for this Python OneEuroFilter, including setting the frequency to 30, which corresponds to the number of frames per second (FPS). I used the parameters that can be seen here: https://github.com/google/mediapipe/blob/c8c5f3d062f441eb37738c789a3550e7280ebefe/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt#L115
During the callback, I apply this filter to the normalized landmarks of the PoseLandmarkerResult. Notably, I created a distinct filter for each axis of each landmark.
Unfortunately, the jittering issue seems to persist, and I'm unable to observe any significant improvements.
For a closer look, you can find my Python filter implementation and usage in this gist: https://gist.github.com/igorbasko01/c51980df0ce9a516c8bcc4ff8e039eb7
I would greatly appreciate any help in addressing this issue or any advice on potential workarounds.
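For readers following along, here is a minimal Python sketch of the per-axis approach described above. It follows the published One Euro algorithm (a low-pass filter whose cutoff rises with the signal's speed). It is not the code from the gist and not MediaPipe's implementation: value scaling is left out, the constructor defaults are placeholders, and the class names OneEuroFilter and LandmarkSmoother are just illustrative.

```python
import math


class OneEuroFilter:
    """Scalar One Euro filter: low-pass with a speed-dependent cutoff.

    Illustrative sketch only; parameter defaults are placeholders.
    """

    def __init__(self, frequency=30.0, min_cutoff=1.0, beta=0.0, derivate_cutoff=1.0):
        self.frequency = frequency
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.derivate_cutoff = derivate_cutoff
        self._raw_prev = None   # last unfiltered sample
        self._x_hat = None      # last filtered sample
        self._dx_hat = 0.0      # last filtered derivative
        self._t_prev = None     # time of the last sample, in seconds

    def _alpha(self, cutoff):
        # Weight of the new sample for a given cutoff at the current rate.
        te = 1.0 / self.frequency
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / te)

    def apply(self, t, x):
        """Filter sample x taken at time t (seconds)."""
        if self._raw_prev is None:
            self._raw_prev, self._x_hat, self._t_prev = x, x, t
            return x
        dt = t - self._t_prev
        if dt > 0:
            self.frequency = 1.0 / dt  # follow the real sample rate
        dx = (x - self._raw_prev) * self.frequency
        a_d = self._alpha(self.derivate_cutoff)
        self._dx_hat = a_d * dx + (1.0 - a_d) * self._dx_hat
        # Faster motion raises the cutoff, which reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(self._dx_hat)
        a = self._alpha(cutoff)
        self._x_hat = a * x + (1.0 - a) * self._x_hat
        self._raw_prev, self._t_prev = x, t
        return self._x_hat


class LandmarkSmoother:
    """Keeps one filter per axis (x, y, z) of each landmark."""

    def __init__(self, num_landmarks, **filter_kwargs):
        self._filters = [
            [OneEuroFilter(**filter_kwargs) for _ in range(3)]
            for _ in range(num_landmarks)
        ]

    def apply(self, t, landmarks):
        """landmarks: sequence of (x, y, z) tuples; returns smoothed tuples."""
        return [
            tuple(f.apply(t, v) for f, v in zip(axis_filters, lm))
            for axis_filters, lm in zip(self._filters, landmarks)
        ]
```

With beta at 0 the cutoff never rises, so the filter behaves like a fixed low-pass; a low min_cutoff then trades jitter for lag.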
Pose landmark smoothing is not implemented yet according to the C++ Source Code: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L312
That being said, there is a new MultiWorldLandmarksSmoothingCalculator in the code; it's just not used anywhere yet: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/calculators/util/multi_world_landmarks_smoothing_calculator.h#L57
Assuming it's functional, it might be possible to plug in the landmark outputs from the pose landmarker graph into it to get the smoothed landmarks, at least in C++.
// Edit: After some more testing I can confirm that using the calculator for smoothing works with the "one euro" filter!
Hey @Silverlan, thanks for the suggestion! Could you please elaborate a bit on what exactly you did and how you added the MultiWorldLandmarksSmoothingCalculator?
I use the C++ API. I can describe my steps, but I don't know the approach for the other APIs.
1) Added //mediapipe/calculators/util:multi_world_landmarks_smoothing_calculator as a dependency to my project so I can use the MultiWorldLandmarksSmoothingCalculator calculator.
2) Added the MultiWorldLandmarksSmoothingCalculator calculator to my graph with the one_euro filter (the velocity filter did not work for me):
{
    auto& smoothCalculator = graph.AddNode(
        "MultiWorldLandmarksSmoothingCalculator");
    auto* options = &smoothCalculator.GetOptions<mediapipe::LandmarksSmoothingCalculatorOptions>();
    auto* filter = options->mutable_one_euro_filter();
    filter->set_beta(smoothingFilterSettings.beta);
    filter->set_disable_value_scaling(smoothingFilterSettings.disableValueScaling);
    filter->set_frequency(smoothingFilterSettings.frequency);
    filter->set_min_cutoff(smoothingFilterSettings.minCutoff);
    filter->set_derivate_cutoff(smoothingFilterSettings.derivateCutoff);
    filter->set_min_allowed_object_scale(smoothingFilterSettings.minAllowedObjectScale);
    worldLandmarks >> smoothCalculator.In("LANDMARKS");
    trackingIdsInput >> smoothCalculator.In("TRACKING_IDS");
    smoothCalculator.Out("FILTERED_LANDMARKS").SetName(outputName) >>
        graph[::mediapipe::api2::Output<std::vector<mediapipe::LandmarkList>>(graphOutputName)];
}
Make sure to add this node after the PoseLandmarkerGraph (or HandLandmarkerGraph) node. Then use the WORLD_LANDMARKS output of the PoseLandmarkerGraph for the LANDMARKS input of the MultiWorldLandmarksSmoothingCalculator node.
3) For the TRACKING_IDS input you have to create a std::vector<int64_t> with exactly one entry per pose. I only have one pose, so I initialized it with std::vector<int64_t> trackingIds {0}, which you can then use as input:
std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);
4) The one_euro filter properties are critical: with the default settings I didn't notice any reduction in jitter at all. The values below worked for me:
smoothingFilterSettings.beta = 10.0
smoothingFilterSettings.minCutoff = 0.05
smoothingFilterSettings.derivateCutoff = 1
smoothingFilterSettings.disableValueScaling = false
smoothingFilterSettings.frequency = 30.0
smoothingFilterSettings.minAllowedObjectScale = 1e-06
You'll probably have to tweak them and play around with them though.
5) It won't work without this step: You have to set a timestamp for all input packets:
auto msTime = cap.get(cv::CAP_PROP_POS_MSEC); // Time in milliseconds
auto mcTime = msTime * 1000.f; // Time in microseconds
auto packetImg = mediapipe::MakePacket<mediapipe::Image>(*image);
packetImg = packetImg.At(mediapipe::Timestamp(mcTime));
auto packetArea = mediapipe::MakePacket<mediapipe::NormalizedRect>(MakeNormRect(0.5, 0.5, 1.0, 1.0, 0));
packetArea = packetArea.At(mediapipe::Timestamp(mcTime));
std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);
packetTrackingIds = packetTrackingIds.At(mediapipe::Timestamp(mcTime));
auto outputPackets = taskRunner.Process({
    {"image", packetImg},
    {"norm_rect", packetArea},
    {"tracking_ids", packetTrackingIds}
});
6) The FILTERED_LANDMARKS output of the MultiWorldLandmarksSmoothingCalculator node contains your smoothed world landmarks.
Hope that helps!
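To illustrate why the values in step 4 matter so much, the sketch below computes the weight a One Euro filter gives each new sample, using the standard formulas alpha = 1 / (1 + tau / te) and cutoff = min_cutoff + beta * |speed|. The function names are made up for this example, and the speed of 0.5 units per second is a hypothetical value chosen only to show the trend.

```python
import math


def smoothing_factor(cutoff_hz, frequency_hz):
    """Weight of the newest sample in the low-pass update."""
    te = 1.0 / frequency_hz                      # sampling period
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)      # filter time constant
    return 1.0 / (1.0 + tau / te)


def effective_cutoff(min_cutoff, beta, speed):
    """Cutoff grows linearly with the (filtered) speed of the signal."""
    return min_cutoff + beta * abs(speed)


# Step 4 values at 30 Hz: min_cutoff = 0.05, beta = 10.0.
still = smoothing_factor(effective_cutoff(0.05, 10.0, 0.0), 30.0)    # about 0.010
moving = smoothing_factor(effective_cutoff(0.05, 10.0, 0.5), 30.0)   # about 0.51

# A cutoff of 1.0 Hz, used here purely as a comparison point.
baseline = smoothing_factor(1.0, 30.0)                               # about 0.17

print(f"still={still:.3f} moving={moving:.3f} baseline={baseline:.3f}")
```

Under these assumptions, a landmark at rest takes only about 1% of each new sample, which suppresses jitter heavily, while a moving landmark takes about half, which keeps lag low. A much higher min_cutoff lets far more of the per-frame noise through.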
Thanks a lot @Silverlan I will try and use your example and see if I can also use it in Python.
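For anyone porting step 5 to Python, a small helper along the following lines may be useful. It converts a millisecond position into a microsecond timestamp and keeps the sequence strictly increasing, which MediaPipe graphs expect on their input streams. The class name TimestampSource is hypothetical, and bumping a repeated or reset position by one microsecond is an assumption made here for looped footage, not something taken from the thread.

```python
class TimestampSource:
    """Turns millisecond positions into strictly increasing microsecond stamps."""

    def __init__(self):
        self._last_us = None

    def next_us(self, position_ms):
        ts = int(round(position_ms * 1000.0))  # milliseconds to microseconds
        # A looped or stalled source can repeat or rewind its position;
        # nudge the stamp forward so it never goes backwards.
        if self._last_us is not None and ts <= self._last_us:
            ts = self._last_us + 1
        self._last_us = ts
        return ts
```

Every packet sent for the same frame (image, rect, tracking ids) should carry the same value returned by one call to next_us.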
@scottxp,
Could you please confirm whether this is still an issue or whether it has been resolved on your end? Thank you!
@igor-basko I would be very interested in your python fix for this issue.
@kuaashish this is still very much an issue for a python implementation.
@kuaashish This is still an issue for me using the javascript library.
still an issue for me (android&python)
this is still an issue, confirmed on the javascript library
still an issue Ubuntu Python
Has anyone made any progress on a python solution that removed the jitter?
@scottxp,
We are pleased to announce the release of the latest version of MediaPipe, version 0.10.7, which addresses the jittering issue observed in the Pose Landmarker.
This issue has been documented in the release notes under "Fixed Pose Landmarker jittering issue." We kindly request you to build using this updated version and inform us of any persisting issues from your perspective. Thank you
@kuaashish this is not doing anything different in JavaScript with version 0.10.7:
await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: 'https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_lite.task',
    delegate: "GPU"
  },
  runningMode: "VIDEO",
  smoothLandmarks: true,
  numPoses: 1
});
I can confirm the jittering still exists. Maybe the models at https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_*.task haven't been updated yet?
In Python with mediapipe 0.10.7 it seems to work when I use running mode VIDEO and the detector.detect_for_video function, but JavaScript with runningMode VIDEO and poseLandmarker.detectForVideo still jitters.
Checked and confirmed: the jitter in JavaScript persists in both runningMode VIDEO and LIVE_STREAM.
I confirm the problem is still there
Until the problem is fixed I am using smoothing on my side, proposed by chatgpt: https://gist.github.com/mupakoz/c7b3183914b52a08eebbc61599af7e1b
@npinochet @yiucheung0512 @WiCanIsCool @scottxp maybe it helps you
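As a general illustration of client-side smoothing (this is not the code from the linked gist), an exponential moving average over the landmark coordinates is about the simplest option. The class name EmaLandmarkSmoother and the default alpha of 0.3 are arbitrary choices for this sketch; a lower alpha smooths more but adds lag.

```python
class EmaLandmarkSmoother:
    """Exponential moving average over a list of (x, y, z) landmarks."""

    def __init__(self, alpha=0.3):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state = None  # last smoothed landmarks

    def apply(self, landmarks):
        # Start over when there is no history or the landmark count changes.
        if self._state is None or len(self._state) != len(landmarks):
            self._state = [tuple(lm) for lm in landmarks]
        else:
            a = self.alpha
            self._state = [
                tuple(a * new + (1.0 - a) * old for new, old in zip(lm, prev))
                for lm, prev in zip(landmarks, self._state)
            ]
        return list(self._state)

    def reset(self):
        """Call when tracking is lost so stale positions are not blended in."""
        self._state = None
```

Usage would be one smoother per tracked pose, calling apply with that pose's landmarks on every frame.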
The jittering issue has long been a problem and I hope they can fix this so we can use it in motion capture. I just ran across a possible solution for the javascript version, but I have not tried it yet. https://github.com/yousufkalim/mediapipe-pose-smooth
Just found a video I did of a program I wrote using mediapipe and iclone 2 years ago, same jitter with the hands https://www.youtube.com/watch?v=j6JboJIlpfM
It's not the landmarker models; it is the single-shot detector in the pipeline. @igor-basko, @Silverlan, you can prove this by feeding looped still-image video frames directly through the face_mesh landmarker models. The BlazeFace detectors do not return detection boxes reliably: on a looped image they return a slightly different box each frame. BlazeFace is "blazingly" fast, taking 2 ms on edge devices, and it returns several landmarks as well. The issue must be fixed at the detector or landmarker model level; no amount of messing with Kalman filters after recognition will help.
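One way to put numbers on a looped-image experiment like this is to record the landmarks for every repeated frame and measure how far each landmark wanders from its mean position. The sketch below does that with the standard library only. The function name per_landmark_jitter and the input layout (a list of frames, each a list of (x, y, z) tuples) are assumptions for illustration.

```python
import statistics


def per_landmark_jitter(frames):
    """RMS distance of each landmark from its mean position across frames.

    frames: list of frames; each frame is a list of (x, y, z) tuples taken
    from the same still image, so any spread is jitter and not real motion.
    """
    num_landmarks = len(frames[0])
    jitter = []
    for i in range(num_landmarks):
        total_variance = 0.0
        for axis in range(3):
            values = [frame[i][axis] for frame in frames]
            total_variance += statistics.pvariance(values)
        jitter.append(total_variance ** 0.5)
    return jitter
```

Running this once on the full pipeline output and once on a landmarker fed a fixed crop would show how much of the spread each stage contributes.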
This seems solved with version 0.10.9
I have the same problem on android. I am working with NextJS 14 and this is my current version: "@mediapipe/tasks-vision": "^0.10.9", This is how I create the PoseLandmarker:
export const loadPoseLandmarkerModel = async (): Promise<Uint8Array> => {
  const response = await fetch(`/static/models/pose_landmarker_lite.task`);
  if (!response.ok) {
    throw new Error(`Failed to load pose landmarker model file: ${response.statusText}`);
  }
  const buffer = await response.arrayBuffer();
  return new Uint8Array(buffer);
};
export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0/wasm"
  );
  const model = await loadPoseLandmarkerModel();
  return PoseLandmarker.createFromOptions(vision, {
    baseOptions: {
      modelAssetBuffer: model,
      delegate: "GPU",
    },
    runningMode: runningMode,
    numPoses: 1,
  });
};
Here is the result on Android:
https://github.com/google/mediapipe/assets/91951421/552b4f84-0893-4af2-91c0-e5e5d5d60eda
Am I doing something wrong?
Hello, try 0.10.9 like below
https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm
Also check the MediaPipe version number in your package.json.
The following code works for me.
import { FilesetResolver, PoseLandmarker } from "@mediapipe/tasks-vision";
export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm"
  );
  const poseLandmarker = await PoseLandmarker.createFromModelPath(
    vision,
    "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task"
  );
  await poseLandmarker.setOptions({
    runningMode: runningMode,
    numPoses: 1,
  });
  return poseLandmarker;
};
delegate: "GPU"
On the last point, I ran some tests: when the delegate prop is set to "GPU", the tracking is not optimal and jitters a lot, and when I remove the prop it works fine.
I also receive this exception on initialization:
In the end it is working fine, a bit slower than on my iOS device, but it gets the job done! Thanks a lot for the help!
Glad it helped. GPU inference works fine on my side, but I'm on macOS Chrome. Maybe try clearing the cache. I also see a mysterious error around the delegate option; it appears and disappears depending on the day.
I tested it with mediapipe==0.10.9 on Python (Windows). It jitters a lot less than version 0.10.1.
But I think it still jitters a bit more than the 0.8.11 version. You can compare with the videos in my previous comment:
https://github.com/google/mediapipe/issues/4507#issuecomment-1600617343
https://github.com/google/mediapipe/assets/16905449/d87fb441-b60e-4bd2-ac09-da5473b4c74d
@igorbasko01, try a movie. The jitter with the Python versions still exists, from both the landmarker and the detector. It seems they updated the detector, so the larger jitter amplitudes are gone on a static image. However, both still jitter on moving images, and the static image you show above still jitters, which shouldn't happen in a streaming mode. Stability and reliability are very important for any useful production solution and greatly improve the end-user experience in most domains. Maybe someone can test the models used in the graph standalone and see whether the jitter can be addressed separately for streaming applications, if the mediapipe team is not interested in bringing those fixes into the open-source solutions?
I recorded the difference between using "@mediapipe/pose": "^0.5.1675469404" and "@mediapipe/tasks-vision": "^0.10.12".
You can really see it when the video ends and it's processing the still image. I'd like to use the latest version. I could figure out how to apply my own filter, but if it's not going to work I will wait.
Respective settings are:
// old version
poseSolution = new Pose({
  locateFile: (file: string) => {
    return `https://cdn.jsdelivr.net/npm/@mediapipe/pose/${file}`;
  },
});
poseSolution.setOptions({
  modelComplexity: 1,
  smoothLandmarks: true,
});
// new version
poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: `https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/latest/pose_landmarker_full.task`,
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  numPoses: 1
});
https://github.com/google/mediapipe/assets/17362459/7431fcff-7f3b-46bd-a969-427e4f05e15f
https://github.com/google/mediapipe/assets/17362459/468e9a04-c7ca-49ae-be9a-034620a78198
Have I written custom code (as opposed to using a stock example script provided in MediaPipe)
No
OS Platform and Distribution
Mac OS X 13.0.1
MediaPipe Tasks SDK version
0.10.0
Task name (e.g. Image classification, Gesture recognition etc.)
Pose Landmark Detection
Programming Language and version (e.g. C++, Python, Java)
Javascript