cosyneco / MediaPipe.NET

Pure .NET bindings for Google's MediaPipe.
MIT License

What is the most stable version of MediaPipe I can use to run hand tracking on Windows? #65

Open TheWorldEndsWithUs opened 4 months ago

TheWorldEndsWithUs commented 4 months ago

Hi, I have been jumping around a few of the GitHub issues trying to get hand tracking working on Windows. I tried downgrading, and the application immediately shuts down when it runs. Can you give me any tips on which version to install to get hand tracking working on Windows?

sr229 commented 4 months ago

We've only tried face pose but anything on 0.8.x should just work.

TheWorldEndsWithUs commented 4 months ago

Okay, I've been bouncing through different issues all day. I will try to list all of my solutions here.

First, download both the CPU runtime (GPU doesn't work on Windows) and the Mediapipe.NET packages, changing the versions in NuGet. The Mediapipe.NET package should be 0.8.10. The CPU runtime should be 0.8.9.
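
In project-file terms, pinning those versions might look roughly like the fragment below. This is only a sketch: the package IDs `Mediapipe.Net` and `Mediapipe.Net.Runtime.CPU` are assumed here, so check them against what NuGet actually lists.

```xml
<ItemGroup>
  <!-- Package IDs are assumed; the versions are the ones mentioned above. -->
  <PackageReference Include="Mediapipe.Net" Version="0.8.10" />
  <PackageReference Include="Mediapipe.Net.Runtime.CPU" Version="0.8.9" />
</ItemGroup>
```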

Then you need to get the mediapipe folder from here (download the repo as a zip and extract the mediapipe folder to the same folder as your .exe).

Include the Mediapipe libraries and Emgu:

    using Mediapipe.Net.Calculators;
    using Mediapipe.Net.External;
    using Mediapipe.Net.Framework.Format;
    using Mediapipe.Net.Framework.Protobuf;
    using Emgu.CV;

Next, SeeShark didn't work for me. I ended up using OpenCV (Emgu.CV) instead. I created a timer in Windows Forms and set its interval to 1000 / FPS. (Create a variable called FPS and assign it 30 to start.)

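    int FPS = 30;
    VideoCapture capture;
    FaceMeshCpuCalculator calculator;

    private void Form1_Load(object sender, EventArgs e)
    {
        // Tick roughly once per frame (1000 / 30 = 33 ms with integer division).
        timer1.Interval = 1000 / FPS;
        capture = new VideoCapture(); // opens the default camera

        calculator = new FaceMeshCpuCalculator();
        calculator.OnResult += handleLandmarks; // handler taken from the example
        calculator.Run();

        timer1.Start(); // harmless if the timer is already enabled in the designer
    }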

Then you need to get the frames displaying on the screen (use NuGet to get the Emgu Windows runtime and the Emgu.CV.Bitmap package):

    private void timer1_Tick(object sender, EventArgs e)
    {
        Mat frame = capture.QueryFrame();
        if (frame == null) return; // no frame available yet

        pictureBox1.Image = frame.ToBitmap();

        // OpenCV delivers BGR pixels, while ImageFormat.Srgb expects RGB order.
        using Mat rgb = new Mat();
        CvInvoke.CvtColor(frame, rgb, Emgu.CV.CvEnum.ColorConversion.Bgr2Rgb);

        ReadOnlySpan<byte> byteSpan = new ReadOnlySpan<byte>(rgb.GetRawData());

        ImageFrame imgframe = new ImageFrame(ImageFormat.Srgb,
               rgb.Width, rgb.Height, rgb.Step, byteSpan);

        using ImageFrame img = calculator.Send(imgframe);
    }

Once you have the code above and the handleLandmarks method from the example, download the tflite models you need from this url and add them to whichever folders the error messages ask for. Face landmark detection will then work.

Note: The face landmarks will not be drawn to the screen, you need to do that yourself, but they will be detected.
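
For drawing them yourself, the main step is converting the normalized landmark coordinates into pixel positions. Here is a minimal sketch, assuming each landmark exposes X and Y as fractions of the frame width and height (which is how MediaPipe's normalized landmarks are defined). `LandmarkOverlay` and `ToPixel` are hypothetical names for illustration, not part of Mediapipe.NET.

```csharp
using System;

public static class LandmarkOverlay
{
    // Maps a normalized coordinate pair (nominally in [0, 1]) to a pixel
    // position inside a width x height frame, clamping values that fall
    // slightly outside the frame.
    public static (int X, int Y) ToPixel(float normX, float normY, int width, int height)
    {
        int px = (int)Math.Floor(normX * width);
        int py = (int)Math.Floor(normY * height);

        px = Math.Max(0, Math.Min(px, width - 1));
        py = Math.Max(0, Math.Min(py, height - 1));

        return (px, py);
    }
}
```

Inside the landmark handler you could call `LandmarkOverlay.ToPixel(landmark.X, landmark.Y, bitmap.Width, bitmap.Height)` for each landmark and draw a small circle at the result, for example with `Graphics.FillEllipse` on the bitmap shown in the picture box.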

So far, hands have been a lot more difficult to get up and running. The first part is easy: replace FaceMeshCpuCalculator with HandCpuCalculator in the code. You will also have to change your handleLandmarks function to accept a single NormalizedLandmarkList instead of a list of NormalizedLandmarkList. Then the real work begins: you must edit and rename the graph files found in mediapipe\graphs\hand_tracking (inside the folder you copied to the .exe directory earlier).

I don't remember everything I did, but I will try to list the things that got me to my current point (it's still not working). First, you should know that the pbtxt files are the graph files. They are code-like files that define nodes whose inputs and outputs connect to each other. The first error should say something like hand_tracking_desktop_live_cpu doesn't exist. It technically does; it just doesn't have the _cpu at the end of its name. So find the hand_tracking_desktop_live file and rename it to hand_tracking_desktop_live_cpu.

Then you need to edit the renamed _cpu file using information I found in this comment. Edit the node under the # Detects/tracks hand landmarks comment: change output_stream: "LANDMARKS:landmarks" to output_stream: "LANDMARKS:hand_landmarks". Do the same for the input stream of the node below it in the same file: change input_stream: "LANDMARKS:landmarks" to input_stream: "LANDMARKS:hand_landmarks".
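
As a rough illustration, the two edits amount to something like the fragment below. It shows only the lines being changed; the rest of each node is elided and may differ between MediaPipe versions.

```pbtxt
# Detects/tracks hand landmarks.
node {
  # ... calculator and other streams left unchanged ...
  output_stream: "LANDMARKS:hand_landmarks"  # was "LANDMARKS:landmarks"
}

# Node further down in the same file that consumes the landmarks.
node {
  # ... calculator and other streams left unchanged ...
  input_stream: "LANDMARKS:hand_landmarks"  # was "LANDMARKS:landmarks"
}
```

The point is that the stream name after the LANDMARKS: tag has to match on both the producing and the consuming node, otherwise the graph will not connect.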

Then go into the hand_renderer_cpu file in the subgraphs folder and change all occurrences of landmarks to hand_landmarks. After that, it prompted me to download more model files from the link above. That is where I am currently. After downloading those files, it is crashing with no errors again, so I need to keep experimenting with the graph to get it all working. I'll report back with any findings.

sr229 commented 4 months ago

I'll be pinning this for other people; it is a good solution for those who want to get it running. I've been planning to rewrite this for a while, but for now the workaround you documented is a good one.

TheWorldEndsWithUs commented 4 months ago

Thanks for the pin! Unfortunately, I wasn't able to get it working. When you change "hand_landmarks" to "landmarks" in all of the files and you have all of the models downloaded and in their correct places, the application crashes immediately. I used the Event Viewer to get an error code linked to ucrtbase.dll. I put the code into ChatGPT, and it seems to think it's a stack overflow error.

GeorgeS2019 commented 4 months ago

@TheWorldEndsWithUs @sr229 We now have C# wrappers for MediaPipe v0.10.09 in Godot for most examples.

We are aiming to upgrade to v0.10.11.

TheWorldEndsWithUs commented 4 months ago

@GeorgeS2019 That's fantastic! My Google and GitHub searches led me to the discussion you started about a year ago regarding C# support, and to this wrapper generator for the extension. However, I'm not too sure what I am looking at. Are you able to run the landmark detection without using Godot? Does it work within a standalone C# project?

GeorgeS2019 commented 4 months ago

Godot could run as a library, but that is a cutting-edge area and there is still a lot to check. We are now pushing towards the latest version of MediaPipe.

GeorgeS2019 commented 4 months ago

C# wrapper code: https://github.com/Delsin-Yu/CSharp-Wrapper-Generator-for-GDExtension/pull/26/files

sr229 commented 4 months ago

I think this is a little off topic, since the entire point of this project is to use MediaPipe regardless of what framework you're using. Unless you're willing to help bring this over with vendor neutrality in mind, it does not really help here.

GeorgeS2019 commented 4 months ago

@sr229 The challenge of working with MediaPipe in .NET is first to build the DLL, and then to generate the .NET wrapper automatically. Neither is trivial.

sr229 commented 4 months ago

Hence it takes us time to port from the MediaPipe Unity plugin we derive from - those ports take tremendous amounts of time due to the number of changes, as much as we would love to automate them with ClangSharp.

To reiterate: if you are interested in using this and getting it to work, please help us.

GeorgeS2019 commented 4 months ago

There is now a GitHub CI Windows build for v0.10.13 using VS2019, with a PR submitted to Google.

Hopefully, Windows support will no longer be experimental.