google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0

Pose Landmarker Jittering #4507

Open scottxp opened 1 year ago

scottxp commented 1 year ago

Have I written custom code (as opposed to using a stock example script provided in MediaPipe)

No

OS Platform and Distribution

Mac OS X 13.0.1

MediaPipe Tasks SDK version

0.10.0

Task name (e.g. Image classification, Gesture recognition etc.)

Pose Landmark Detection

Programming Language and version (e.g. C++, Python, Java)

Javascript

Describe the actual behavior

I have switched over from the legacy mediapipe library to the new mediapipe solutions. The landmarks are jittering more than expected when I use the VIDEO runningMode on GPU or CPU with any of the pose_landmarker tasks.

Describe the expected behaviour

The legacy mediapipe pose estimation detection offered a smoothing parameter (smoothLandmarks) to reduce the jittering, which worked quite well. I have not been able to find this option in the new mediapipe solutions library.

Standalone code/steps you may have used to try to get what you need

The jittering can be observed on the official mediapipe solutions demo page:

https://mediapipe-studio.webapps.google.com/demo/pose_landmarker

Other info / Complete Logs

Here is my sample code to instantiate the PoseLandmarker:

const vision = await FilesetResolver.forVisionTasks(
  // path/to/wasm/root
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
window.poseLandmarker = await PoseLandmarker.createFromOptions(
  vision,
  {
    baseOptions: {
      modelAssetPath: `/models/pose_landmarker_${model_type}.task`,
      delegate: "GPU",
    },
    runningMode: "VIDEO"
  }
);

kuaashish commented 1 year ago

@scottxp,

Could you please describe the problem in more detail and, if possible, share a captured image or video of the jitter so we can understand the issue better?

github-actions[bot] commented 1 year ago

This issue has been marked stale because it has no recent activity since 7 days. It will be closed if no further activity occurs. Thank you.

igorbasko01 commented 1 year ago

I am encountering a similar issue in Python. It seems that there is no landmark smoothing in mediapipe v0.10.1. The following two videos show the effect: in the first video, which uses mediapipe 0.10.1, the landmarks jitter a lot more than in the second video, which uses mediapipe 0.8.11.

Each video is basically a single image fed in a loop to the mediapipe task (0.10.1) or the legacy solution (0.8.11). The same effect also happens when using a webcam.

https://github.com/google/mediapipe/assets/16905449/02381e1a-0514-41a9-81e0-20f6a6eeeced

https://github.com/google/mediapipe/assets/16905449/2310a792-dfd0-450d-a9ec-689ba28f7682

My assumption is that the difference lies in the graph that is used in pose estimation. In version 0.8.11 it uses the following graph that has a smoothing calculator: https://github.com/google/mediapipe/blob/release/mediapipe/modules/pose_landmark/pose_landmark_cpu.pbtxt#L216

In 0.10.1, by contrast, it builds a different graph that basically contains a PoseLandmarkerGraph calculator and a FlowLimiterCalculator. The PoseLandmarkerGraph calculator consists of two subgraphs that, I assume, don't have any smoothing calculator in them. https://github.com/google/mediapipe/blob/91a3c54d558af8c4a0807d2bdd47e875a3c1e87a/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L219

Maybe it would be possible to enhance the graph that is created in mediapipe 0.10.1, and add the smoothing calculator. I will try doing that, but I'm not sure if the input and output streams will be compatible.

scottxp commented 1 year ago

https://github.com/google/mediapipe/assets/879510/cd51f61e-5f96-48a5-9685-f6e04bdcf435

LEFT: mediapipe/tasks-vision@0.10.1 RIGHT: mediapipe/pose@0.5.1675469404/pose.js

As described by @igorbasko01, there does not appear to be any landmark smoothing in mediapipe 0.10.1. You can see the jittering landmarks in the video on the left while the video on the right does not jitter. These were captured and processed simultaneously using the same webcam stream but different mediapipe libraries.

Here is the code for the video on the LEFT:

const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
window.poseLandmarker = await PoseLandmarker.createFromOptions(
  vision,
  {
    baseOptions: {
      modelAssetPath: "/models/pose_landmarker_full.task",
      delegate: "GPU",
    },
    runningMode: "VIDEO"
  }
);

And here is the code for the video on the RIGHT:

window.poseDetector = await poseDetection.createDetector("BlazePose", {
      runtime: "tfjs",
      enableSmoothing: true,
      modelType: "full",
      solutionPath: 'https://cdn.jsdelivr.net/npm/@mediapipe/pose'
});

igorbasko01 commented 1 year ago

I have attempted to implement a Python version of the OneEuroFilter, closely modeled after the C++ version found at this mediapipe implementation: https://github.com/google/mediapipe/blob/bed624f3b6f7ad5d25b5474c516561c537f10199/mediapipe/util/filtering/one_euro_filter.cc#L14

I've also replicated the same parameters for this Python OneEuroFilter, including setting the frequency to 30, which corresponds to the number of frames per second (FPS). I used the parameters that can be seen here: https://github.com/google/mediapipe/blob/c8c5f3d062f441eb37738c789a3550e7280ebefe/mediapipe/modules/pose_landmark/pose_landmark_filtering.pbtxt#L115

During the callback, I apply this filter to the Normalized Landmarks of the PoseLandmarkerResult. Notably, I created a distinct filter for each axis of each landmark.

Unfortunately, the jittering issue seems to persist, and I'm unable to observe any significant improvements.

For a closer look, you can find my Python filter implementation and usage in this gist: https://gist.github.com/igorbasko01/c51980df0ce9a516c8bcc4ff8e039eb7

I would greatly appreciate any help in addressing this issue or any advice on potential workarounds.
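For readers comparing notes, here is a minimal Python sketch of a One Euro filter that follows the published algorithm. It is not a port of the MediaPipe C++ class and not the code from the gist above. The names `OneEuroFilter`, `smoothing_factor` and `filter_series`, and the default parameter values, are assumptions for illustration. The object-scale value scaling that the MediaPipe options mention is left out. One filter instance handles one scalar, so each axis of each landmark needs its own instance, and timestamps are in seconds.

```python
import math


def smoothing_factor(cutoff_hz: float, dt: float) -> float:
    """Blend factor alpha of a first-order low-pass for a cutoff and time step."""
    r = 2.0 * math.pi * cutoff_hz * dt
    return r / (r + 1.0)


class OneEuroFilter:
    """Illustrative One Euro filter for a single scalar signal (hypothetical helper)."""

    def __init__(self, min_cutoff: float = 1.0, beta: float = 0.0,
                 derivate_cutoff: float = 1.0) -> None:
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.derivate_cutoff = derivate_cutoff
        self._raw_prev = None  # last raw input value
        self._x_prev = None    # last filtered value
        self._dx_prev = 0.0    # last filtered derivative
        self._t_prev = None    # last timestamp, in seconds

    def __call__(self, t: float, x: float) -> float:
        if self._x_prev is None:
            # First sample: nothing to smooth against yet.
            self._raw_prev, self._x_prev, self._t_prev = x, x, t
            return x
        dt = t - self._t_prev
        if dt <= 0.0:
            # Ignore samples that do not carry a newer timestamp.
            return self._x_prev
        # Estimate the speed from raw samples and low-pass it.
        dx = (x - self._raw_prev) / dt
        a_d = smoothing_factor(self.derivate_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self._dx_prev
        # Faster motion raises the cutoff, which reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = smoothing_factor(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self._x_prev
        self._raw_prev, self._x_prev = x, x_hat
        self._dx_prev, self._t_prev = dx_hat, t
        return x_hat


def filter_series(values, fps, **params):
    """Filter a list of samples taken at a fixed frame rate."""
    f = OneEuroFilter(**params)
    return [f(i / fps, v) for i, v in enumerate(values)]
```

With `beta` at 0 the filter is a plain low-pass at `min_cutoff`, so a low `min_cutoff` suppresses jitter on a still subject at the cost of lag. Raising `beta` lets the cutoff grow with speed so fast motion is tracked more closely.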

Silverlan commented 12 months ago

Pose landmark smoothing is not implemented yet according to the C++ Source Code: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/tasks/cc/vision/pose_landmarker/pose_landmarker_graph.cc#L312

That being said, there is a new MultiWorldLandmarksSmoothingCalculator in the code, it's just not used anywhere yet: https://github.com/google/mediapipe/blob/df3f4167aed857c891395b4bab851a8a4f8024f8/mediapipe/calculators/util/multi_world_landmarks_smoothing_calculator.h#L57

Assuming it's functional, it might be possible to plug in the landmark outputs from the pose landmarker graph into it to get the smoothed landmarks, at least in C++.

// Edit: After some more testing I can confirm that using the calculator for smoothing works with the "one euro" filter!

igor-basko commented 12 months ago

Hey @Silverlan, thanks for the suggestion! Could you please elaborate a bit more on what exactly you did and how you added the MultiWorldLandmarksSmoothingCalculator?

Silverlan commented 12 months ago

I use the C++ API. I can describe my steps, but I don't know the approach for the other APIs.

1) Added //mediapipe/calculators/util:multi_world_landmarks_smoothing_calculator as dependency to my project so I can use the MultiWorldLandmarksSmoothingCalculator calculator.

2) Added the MultiWorldLandmarksSmoothingCalculator calculator to my graph with the one_euro filter (velocity filter did not work for me):

{
  // Add the smoothing node and configure its one_euro filter.
  auto& smoothCalculator = graph.AddNode(
      "MultiWorldLandmarksSmoothingCalculator");
  auto* options = &smoothCalculator.GetOptions<mediapipe::LandmarksSmoothingCalculatorOptions>();

  auto* filter = options->mutable_one_euro_filter();
  filter->set_beta(smoothingFilterSettings.beta);
  filter->set_disable_value_scaling(smoothingFilterSettings.disableValueScaling);
  filter->set_frequency(smoothingFilterSettings.frequency);
  filter->set_min_cutoff(smoothingFilterSettings.minCutoff);
  filter->set_derivate_cutoff(smoothingFilterSettings.derivateCutoff);
  filter->set_min_allowed_object_scale(smoothingFilterSettings.minAllowedObjectScale);

  // Feed in the world landmarks and tracking IDs, then expose the filtered output.
  worldLandmarks >> smoothCalculator.In("LANDMARKS");
  trackingIdsInput >> smoothCalculator.In("TRACKING_IDS");
  smoothCalculator.Out("FILTERED_LANDMARKS").SetName(outputName) >>
      graph[::mediapipe::api2::Output<std::vector<mediapipe::LandmarkList>>(graphOutputName)];
}

Make sure to add this node after the PoseLandmarkerGraph (or HandLandmarkerGraph) node. Then use the WORLD_LANDMARKS output of the PoseLandmarkerGraph for the LANDMARKS input of the MultiWorldLandmarksSmoothingCalculator node.

3) For the TRACKING_IDS input you have to create a std::vector<int64_t> with the exact same size as the number of poses. I just have one pose, so I just initialized it with std::vector<int64_t> trackingIds {0}, then you can use that as input:

std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);

4) The one_euro filter properties are critical: with the default settings I didn't notice any reduction in jitter at all. The values below worked for me:

smoothingFilterSettings.beta = 10.0
smoothingFilterSettings.minCutoff = 0.05
smoothingFilterSettings.derivateCutoff = 1
smoothingFilterSettings.disableValueScaling = false
smoothingFilterSettings.frequency = 30.0
smoothingFilterSettings.minAllowedObjectScale = 1e-06

You'll probably have to tweak them and play around with them though.

5) It won't work without this step: You have to set a timestamp for all input packets:

auto msTime = cap.get(cv::CAP_PROP_POS_MSEC);  // Time in milliseconds
auto mcTime = static_cast<int64_t>(msTime * 1000.0);  // Time in microseconds

auto packetImg = mediapipe::MakePacket<mediapipe::Image>(*image);
packetImg = packetImg.At(mediapipe::Timestamp(mcTime));

auto packetArea = mediapipe::MakePacket<mediapipe::NormalizedRect>(MakeNormRect(0.5, 0.5, 1.0, 1.0, 0));
packetArea = packetArea.At(mediapipe::Timestamp(mcTime));

std::vector<int64_t> trackingIds = { 0 };
auto packetTrackingIds = mediapipe::MakePacket<std::vector<int64_t>>(trackingIds);
packetTrackingIds = packetTrackingIds.At(mediapipe::Timestamp(mcTime));

// Every input packet carries the same timestamp.
auto outputPackets = taskRunner.Process(
    { {"image", packetImg},
      {"norm_rect", packetArea},
      {"tracking_ids", packetTrackingIds} });

6) The FILTERED_LANDMARKS output of the MultiWorldLandmarksSmoothingCalculator node contains your smoothed world landmarks.
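Why the parameter values in step 4 matter so much can be seen from the One Euro filter's own formulas. The cutoff is min_cutoff + beta * |speed|, and each new sample is blended in with weight alpha = r / (r + 1), where r = 2π * cutoff / frequency. The Python sketch below only evaluates those formulas. The function names and the speed values are assumptions for illustration, and MediaPipe's object-scale value scaling is ignored.

```python
import math


def alpha_for(cutoff_hz: float, frequency_hz: float) -> float:
    """Weight given to the newest sample by a low-pass with this cutoff."""
    r = 2.0 * math.pi * cutoff_hz / frequency_hz
    return r / (r + 1.0)


def effective_alpha(min_cutoff: float, beta: float, speed: float,
                    frequency_hz: float = 30.0) -> float:
    """Blend weight once the speed-dependent cutoff has been applied."""
    cutoff = min_cutoff + beta * abs(speed)
    return alpha_for(cutoff, frequency_hz)


if __name__ == "__main__":
    # Values from step 4: minCutoff 0.05, beta 10.0, frequency 30.0.
    for speed in (0.0, 0.1, 0.5):  # hypothetical speeds in units per second
        print(speed, effective_alpha(0.05, 10.0, speed))
```

With a min_cutoff of 0.05 at 30 Hz, a motionless landmark takes only about 1% of each new sample, which is what suppresses the jitter. At a speed of 0.5 units per second, a beta of 10.0 raises that weight to roughly one half, so the filter stops lagging once the subject moves.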

Source Code: https://github.com/Silverlan/mediapipe_pragma_wrapper/blob/5d75a9cb7b6647522d33a8e2d8a30d82ad2b5dff/mediapipe/examples/desktop/mediapipe_pragma_wrapper/mediapipe_pragma_wrapper.cc#L645

Hope that helps!

igor-basko commented 12 months ago

Thanks a lot @Silverlan I will try and use your example and see if I can also use it in Python.

kuaashish commented 11 months ago

@scottxp,

Could you please confirm whether this is still an issue or whether it has been resolved on your end? Thank you!

mcdonasd1212 commented 10 months ago

@igor-basko I would be very interested in your python fix for this issue.

@kuaashish this is still very much an issue for a python implementation.

scottxp commented 10 months ago

@kuaashish This is still an issue for me using the javascript library.

124bit commented 10 months ago

still an issue for me (android&python)

mihaiEDW commented 10 months ago

this is still an issue, confirmed on the javascript library

mcdonasd1212 commented 10 months ago

still an issue Ubuntu Python

mcdonasd1212 commented 9 months ago

Has anyone made any progress on a python solution that removed the jitter?

kuaashish commented 9 months ago

@scottxp,

We are pleased to announce the release of the latest version of MediaPipe, version 0.10.7, which addresses the jittering issue observed in the Pose Landmarker.

This issue has been documented in the release notes under "Fixed Pose Landmarker jittering issue." We kindly request you to build using this updated version and inform us of any persisting issues from your perspective. Thank you

WiCanIsCool commented 9 months ago

@kuaashish this is not doing anything different in JavaScript with version 0.10.7:

await PoseLandmarker.createFromOptions(vision, {
    baseOptions: {
        modelAssetPath: 'https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_lite.task',
        delegate: "GPU"
    },
    runningMode: "VIDEO",
    smoothLandmarks: true,
    numPoses: 1
});

npinochet commented 9 months ago

I can confirm the jittering still exists. Maybe the models on https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/latest/pose_landmarker_*.task haven't been updated yet?

WiCanIsCool commented 9 months ago

In Python with mediapipe 0.10.7, it seems to work when I use runningMode VIDEO and the detector.detect_for_video function, but JavaScript with runningMode VIDEO and poseLandmarker.detectForVideo is still jittering.
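For reference, the Python path described here looks roughly like the sketch below. It assumes the `mediapipe` and `opencv-python` packages are installed and that a `.task` model file has been downloaded. `run_pose_on_video`, `frame_timestamp_ms` and both path arguments are hypothetical names. `detect_for_video` takes a timestamp in milliseconds that has to increase from frame to frame, so the sketch derives it from the frame index.

```python
def frame_timestamp_ms(frame_index: int, fps: float) -> int:
    """Timestamp of a frame in whole milliseconds, derived from its index."""
    return int(round(frame_index * 1000.0 / fps))


def run_pose_on_video(video_path: str, model_path: str) -> list:
    """Run the Pose Landmarker in VIDEO mode and collect landmarks per frame."""
    # Imported here so the timestamp helper works without these packages.
    import cv2
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.PoseLandmarkerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path),
        running_mode=vision.RunningMode.VIDEO,
        num_poses=1,
    )
    results = []
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # assumed fallback if FPS is unknown
    try:
        with vision.PoseLandmarker.create_from_options(options) as landmarker:
            index = 0
            while True:
                ok, frame_bgr = cap.read()
                if not ok:
                    break
                # OpenCV delivers BGR frames; MediaPipe expects RGB here.
                frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
                image = mp.Image(image_format=mp.ImageFormat.SRGB, data=frame_rgb)
                result = landmarker.detect_for_video(
                    image, frame_timestamp_ms(index, fps))
                results.append(result.pose_landmarks)
                index += 1
    finally:
        cap.release()
    return results
```

Deriving the timestamp from the frame index keeps it strictly increasing at ordinary frame rates, which avoids depending on what the capture backend reports for the current position.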

yiucheung0512 commented 8 months ago

Confirmed: jitter in JavaScript still persists, regardless of whether runningMode is VIDEO or LIVE_STREAM.

mupakoz commented 7 months ago

I confirm the problem is still there

Until the problem is fixed, I am using smoothing on my side, as proposed by ChatGPT: https://gist.github.com/mupakoz/c7b3183914b52a08eebbc61599af7e1b

@npinochet @yiucheung0512 @WiCanIsCool @scottxp maybe it helps you
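A client-side workaround of this kind can be as simple as an exponential moving average over each landmark coordinate. The Python sketch below is not the code from the gist above. `LandmarkSmoother`, its `alpha` parameter and the `(x, y, z)` tuple representation are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class LandmarkSmoother:
    """Exponential moving average over a fixed-size list of landmarks."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state: Optional[List[Point]] = None

    def update(self, landmarks: List[Point]) -> List[Point]:
        """Blend the new landmarks into the running average and return it."""
        if self._state is None or len(self._state) != len(landmarks):
            # First frame, or the landmark count changed: start over.
            self._state = [tuple(p) for p in landmarks]
        else:
            a = self.alpha
            self._state = [
                tuple(a * new + (1.0 - a) * old for new, old in zip(p, q))
                for p, q in zip(landmarks, self._state)
            ]
        return list(self._state)

    def reset(self) -> None:
        """Forget the history, e.g. when the tracked person is lost."""
        self._state = None
```

A smaller `alpha` gives steadier landmarks but more lag behind fast motion. A fixed `alpha` cannot adapt to speed the way the One Euro filter discussed earlier in this thread does.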

delebash commented 7 months ago

The jittering issue has long been a problem and I hope they can fix this so we can use it in motion capture. I just ran across a possible solution for the javascript version, but I have not tried it yet. https://github.com/yousufkalim/mediapipe-pose-smooth

Just found a video I did of a program I wrote using mediapipe and iclone 2 years ago, same jitter with the hands https://www.youtube.com/watch?v=j6JboJIlpfM

bedbad commented 5 months ago

It's not the landmarker models - it is the single-shot detector of the pipeline. @igor-basko, @Silverlan, you can prove this by feeding looped still-image video frames directly through the face_mesh landmarker models. The BlazeFace detectors return the detection box unreliably: on a looped image they return a deviating box each frame. The issue must be fixed at the model level. The BlazeFace detector is "blazingly" fast, taking 2ms on edge devices, and it returns several landmarks as well. The issue must be fixed at that level or at the landmarker level - no amount of messing with Kalman filters after recognition will help.
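The looped-image experiment suggested here can be quantified rather than judged by eye: if the same frame is fed in repeatedly, any spread in a landmark's position is pure jitter. The Python sketch below computes, for each landmark, the root-mean-square distance from its mean position across frames. `jitter_per_landmark` and the `(x, y)` tuple input format are assumptions, and the sample numbers are made up for illustration.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def jitter_per_landmark(
        frames: Sequence[Sequence[Tuple[float, float]]]) -> List[float]:
    """RMS distance of each landmark from its mean position across frames."""
    if not frames:
        return []
    count = len(frames[0])
    if any(len(frame) != count for frame in frames):
        raise ValueError("every frame must contain the same number of landmarks")
    jitter = []
    for i in range(count):
        xs = [frame[i][0] for frame in frames]
        ys = [frame[i][1] for frame in frames]
        # Population variance per axis; their sum is the mean squared distance.
        jitter.append(math.sqrt(statistics.pvariance(xs) + statistics.pvariance(ys)))
    return jitter


if __name__ == "__main__":
    # Two landmarks over three "frames" of the same still image (made-up values).
    frames = [
        [(0.50, 0.50), (0.20, 0.30)],
        [(0.50, 0.50), (0.21, 0.29)],
        [(0.50, 0.50), (0.19, 0.31)],
    ]
    print(jitter_per_landmark(frames))
```

Running the same measurement on the detector's box corners and on the landmarker's output would show which stage contributes more of the spread.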

hiroMTB commented 5 months ago

This seems solved with version 0.10.9

cstamati commented 5 months ago

I have the same problem on Android. I am working with NextJS 14 and this is my current version: "@mediapipe/tasks-vision": "^0.10.9". This is how I create the PoseLandmarker:

export const loadPoseLandmarkerModel = async (): Promise<Uint8Array> => {
    const response = await fetch(`/static/models/pose_landmarker_lite.task`);
    if (!response.ok) {
        throw new Error(`Failed to load pose landmarker model file: ${response.statusText}`);
    }
    const buffer = await response.arrayBuffer();
    return new Uint8Array(buffer);
};

export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
    const vision = await FilesetResolver.forVisionTasks(
        "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.0/wasm"
    );

    const model = await loadPoseLandmarkerModel();

    return PoseLandmarker.createFromOptions(vision, {
        baseOptions: {
            modelAssetBuffer: model,
            delegate: "GPU",
        },
        runningMode: runningMode,
        numPoses: 1,
    });
};

Here is the result on Android:

https://github.com/google/mediapipe/assets/91951421/552b4f84-0893-4af2-91c0-e5e5d5d60eda

Am I doing something wrong?

hiroMTB commented 5 months ago

Hello, try 0.10.9 like below

https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm

hiroMTB commented 5 months ago

Also check out MediaPipe version number on your package.json

cstamati commented 5 months ago

The following code works for me.

import { FilesetResolver, PoseLandmarker } from "@mediapipe/tasks-vision";

export const createPoseLandmarker = async (runningMode: "VIDEO" | "IMAGE"): Promise<PoseLandmarker | null> => {
    const vision = await FilesetResolver.forVisionTasks(
        "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@0.10.9/wasm"
    );

    const poseLandmarker = await PoseLandmarker.createFromModelPath(
        vision,
        "https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_lite/float16/1/pose_landmarker_lite.task"
    );

    await poseLandmarker.setOptions({
        runningMode: runningMode,
        numPoses: 1,
    });

    return poseLandmarker;
};

On the last point, I ran some tests: when the delegate prop is set to "GPU", the tracking is not optimal and jitters a lot, and when I remove the prop it works fine.

I also receive this exception on initialization:

[image: screenshot of the exception, not preserved]

In the end it is working fine, a bit slower than on my iOS device, but it's getting the job done! Thanks a lot for the help!

hiroMTB commented 5 months ago

Glad it helped. GPU inference works fine on my side, but I'm on macOS Chrome. Maybe try clearing the cache. I also see a mysterious error around the delegate option; it appears and disappears depending on the day.

igorbasko01 commented 5 months ago

I tested it with mediapipe==0.10.9 on Python (Windows). It jitters a lot less than version 0.10.1.

But I think that it still jitters a bit more than in the 0.8.11 version. You can see the comparison in the previous comment: https://github.com/google/mediapipe/issues/4507#issuecomment-1600617343

https://github.com/google/mediapipe/assets/16905449/d87fb441-b60e-4bd2-ac09-da5473b4c74d

bedbad commented 5 months ago

@igorbasko01, try a movie. The jitter in the Python versions still exists, coming from both the landmarker and the detector. It seems they updated the detector so that the larger jitter amplitudes are gone on a static image. However, there is still some jitter from both on moving images, and there is still jitter on the static image you show above, which shouldn't happen in a streaming mode. Stability and reliability are very important for any useful production solution and improve the end user's experience exponentially in most domains. Maybe someone can test the models used in the graph standalone and see whether the jitter can be addressed separately for streaming applications, if the MediaPipe team is not interested in bringing those fixes into the open-source solutions?

atomassoni commented 3 months ago

I recorded the difference between using "@mediapipe/pose": "^0.5.1675469404" and "@mediapipe/tasks-vision": "^0.10.12". You can really see it when the video ends and it's processing the still image. I'd like to use the latest version. I could figure out how to apply my own filter, but if that's not going to work I will wait.

Respective settings are:

// old version
poseSolution = new Pose({
  locateFile: (file: string) => {
    return `https://cdn.jsdelivr.net/npm/@mediapipe/pose/${file}`;
  },
});
poseSolution.setOptions({
  modelComplexity: 1,
  smoothLandmarks: true,
});

// new version
poseLandmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: `https://storage.googleapis.com/mediapipe-models/pose_landmarker/pose_landmarker_full/float16/latest/pose_landmarker_full.task`,
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  numPoses: 1
});

https://github.com/google/mediapipe/assets/17362459/7431fcff-7f3b-46bd-a969-427e4f05e15f

https://github.com/google/mediapipe/assets/17362459/468e9a04-c7ca-49ae-be9a-034620a78198