google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://ai.google.dev/edge/mediapipe
Apache License 2.0

@mediapipe/tasks-vision background replacement quality #4630

Open vlysytsia opened 1 year ago

vlysytsia commented 1 year ago

Hello, I rewrote my @mediapipe/selfie_segmentation background replacement with @mediapipe/tasks-vision, and the quality of the background replacement is noticeably worse. I used the code from the official documentation as an example and only changed the part responsible for segmentation: https://codepen.io/volodymyrl/pen/mdQKMdR?editors=0010. Is something wrong with my code, or is this how the model works (and if so, why is it worse than the deprecated @mediapipe/selfie_segmentation)?

kuaashish commented 1 year ago

@volodymyrl,

Could you please share a reference example comparing the quality against the new Tasks Vision API? That would help us raise this internally. Thank you.

vlysytsia commented 1 year ago

Hey @kuaashish, thanks for your answer. Here is an example of background replacement with @mediapipe/selfie_segmentation: https://codepen.io/Guimauve01/pen/wvEaVrN

khanhlvg commented 1 year ago

You should use confidenceMask instead of categoryMask to make the edge smoother. Check out this example: https://codepen.io/khanhlvg/full/WNYaqNW
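
A minimal sketch of requesting confidence masks from the ImageSegmenter, assuming the standard @mediapipe/tasks-vision setup; the model path, the `videoElement`, and which mask index holds the person are illustrative assumptions, not taken from the CodePen above:

```js
import { FilesetResolver, ImageSegmenter } from "@mediapipe/tasks-vision";

// Resolve the WASM assets and create a segmenter that outputs soft
// confidence masks (floats in [0, 1]) instead of a hard category mask.
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
);
const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath: "selfie_segmenter.tflite", // illustrative model path
    delegate: "GPU",
  },
  runningMode: "VIDEO",
  outputConfidenceMasks: true,
  outputCategoryMask: false,
});

// videoElement is assumed to be a playing <video>; which index of
// confidenceMasks corresponds to the person depends on the model.
segmenter.segmentForVideo(videoElement, performance.now(), (result) => {
  const personMask = result.confidenceMasks[0];
  // Blend frame and background with this soft mask instead of a
  // binary category mask to avoid hard, torn edges.
});
```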

satoren commented 1 year ago

Also, I had to write a GLSL shader to prevent GPU-to-CPU transfers in order to get performance on par with selfie_segmentation.

kuaashish commented 1 year ago

Hello @volodymyrl,

Could you please go through the above comment? Thank you.

vlysytsia commented 1 year ago

@kuaashish @khanhlvg, thanks for the example. I used it to prepare another example that combines both versions: https://codepen.io/volodymyrl/pen/VwVVjxd?editors=1111. With @mediapipe/selfie_segmentation on the left and @mediapipe/tasks-vision on the right, you can see that @mediapipe/tasks-vision is still worse (the image has torn edges).

@satoren, sorry, I am not familiar with GLSL shaders. Can you please explain what I need to do to improve MediaPipe performance?

khanhlvg commented 1 year ago

The difference you're seeing is mostly caused by the visualization logic. I tweaked the edge-smoothing logic a bit to reduce the visibility of the edge. If you set minConfidence = 0 and maxConfidence = 1, you'll get the same result as the legacy selfie segmentation SDK.
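
For reference, a hedged sketch of what such a remapping might look like in JavaScript; the exact logic and constants in the CodePen may differ:

```js
// Map a raw confidence value to an alpha with a clamped linear ramp:
// pixels below minConfidence become fully background, pixels above
// maxConfidence become fully foreground, and values in between fade.
// With minConfidence = 0 and maxConfidence = 1 the raw confidence passes
// through unchanged, i.e. the legacy selfie_segmentation behaviour.
function confidenceToAlpha(confidence, minConfidence = 0.2, maxConfidence = 0.8) {
  const t = (confidence - minConfidence) / (maxConfidence - minConfidence);
  return Math.min(Math.max(t, 0), 1);
}
```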

Besides, in the legacy SDK you're using the square input model (modelSelection: 0). If your input is a landscape image, you should switch that to 1.

satoren commented 1 year ago

@volodymyrl

> Can you please explain what I need to do to improve MediaPipe performance?

I'm sorry, I can't post the code, but I hope this gives you a hint.

For best performance, I needed to get the mask as an image, like selfie segmentation does. See the MPMask comments.

Here is the approach I took.

  1. Check which type the MPMask holds.

    • If hasUint8Array()
      • Create a new ImageData and copy the data from mask.getAsUint8Array().
      • Convert it to an ImageBitmap using createImageBitmap for use with CanvasRenderingContext2D drawImage.
    • If hasFloat32Array()
      • Almost the same as hasUint8Array(), except that getAsFloat32Array() is used (a hedged sketch of this path follows below).
    • If hasWebGLTexture()
      • Use getAsWebGLTexture() to get the texture and render it to a canvas using WebGL. The part of MPImage (a class similar to MPMask) that converts to an ImageBitmap may be helpful.
      • Convert that canvas to an ImageBitmap using createImageBitmap for use with CanvasRenderingContext2D drawImage.
  2. Use the resulting ImageBitmap like segmentationMask in selfieSegmentation.
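
A hedged sketch of the hasFloat32Array() branch above; the width/height properties on MPMask and packing the confidence into the alpha channel are assumptions, so adapt it to however you composite:

```js
// Convert an MPMask confidence mask into an ImageBitmap that can be used
// with CanvasRenderingContext2D.drawImage(), packing the confidence into
// the alpha channel of an ImageData.
async function maskToImageBitmap(mask) {
  const { width, height } = mask;
  const confidences = mask.getAsFloat32Array(); // one float per pixel, 0..1
  const pixels = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < confidences.length; i++) {
    pixels[i * 4 + 3] = Math.round(confidences[i] * 255); // alpha only
  }
  const imageData = new ImageData(pixels, width, height);
  return createImageBitmap(imageData);
}
```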

Since the above steps are a bit cumbersome, I think one option is to continue using selfie segmentation until an easy-to-use environment is available.

This may be best discussed on #4491.


vlysytsia commented 1 year ago

@satoren @khanhlvg, thanks for your answers! I updated the background replacement function, and it looks acceptable, but it is still worse than @selfie_segmentation, not to mention how background replacement works in, for example, Zoom. @satoren, I can't use @selfie_segmentation because it throws errors when you try to stop it, and the only suggested fix was to upgrade to the tasks version (https://github.com/google/mediapipe/issues/3373).

kuaashish commented 1 year ago

@khanhlvg,

Could you please look into this issue? Thank you

satoren commented 1 year ago

@volodymyrl How about this? https://codepen.io/satoren/pen/rNQXRqp This runs on the CPU and is not optimized for performance, unlike the approach I described above.

vlysytsia commented 1 year ago

Hey @satoren, thanks for your answer.

> https://codepen.io/satoren/pen/rNQXRqp

In this example the quality looks the same as with @selfie_segmentation.

> How about this? This is running on the CPU and not optimized for performance as https://github.com/google/mediapipe/issues/4630#issuecomment-1657373951.

I use the second approach with getAsFloat32Array, but it is slow. To improve performance I tried using a Web Worker, but there are some limitations on the data you can post from a Web Worker: https://github.com/google/mediapipe/issues/4694.

satoren commented 1 year ago

You can transfer it by converting it to an ImageBitmap.
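
Since ImageBitmap is a transferable object, a minimal sketch of that hand-off; the maskToImageBitmap helper sketched earlier and the surrounding worker/canvas objects are hypothetical:

```js
// In the worker (inside an async message handler): convert the mask, then
// transfer the bitmap instead of copying its pixels.
const bitmap = await maskToImageBitmap(mask);
self.postMessage({ maskBitmap: bitmap }, [bitmap]);

// On the main thread: draw the received bitmap like the old segmentationMask.
worker.onmessage = (event) => {
  const { maskBitmap } = event.data;
  ctx.drawImage(maskBitmap, 0, 0, canvas.width, canvas.height);
  maskBitmap.close(); // release the bitmap when done
};
```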

satoren commented 1 year ago

Also, using a Web Worker won't make it any faster. The bottleneck here is the transfer from GPU to CPU. Try to find an efficient way to convert the WebGLTexture to an ImageBitmap.

tyrmullen commented 1 year ago

+1 to everything @satoren said :). In particular, generally try to keep things on the GPU for best performance. This is especially true with segmentation, since it can return an image and can therefore run 100% on the GPU, allowing for nice pipelining. So here's a quick summary for best performance for segmentation:

vlysytsia commented 1 year ago

@satoren @tyrmullen thank you for the detailed explanation. If getAsWebGLTexture() works better than getAs*Array() it would be great to have an example of its usage.

torinmb commented 1 year ago

While it's straightforward to set up a shader to render a full-screen texture, the main problem is that if you receive a WebGLTexture, you need to use the same GL context that MediaPipe is using to generate the texture, without interfering with any of its existing operations. I've played around with using result.confidenceMask.getGL() and attempted to cache the GL parameters with gl.getParameter(); after attempting to draw the WebGLTexture, I re-apply the same parameters. I'm able to successfully draw a full-screen texture on its own, but when I try to use the same GL context I get "gl.INVALID_OPERATION: The specified command is not allowed for the current state" errors. Without knowing which GL parameters to cache, it's really tough to get a working example.

satoren commented 1 year ago

@torinmb Perhaps a different canvas was created for the second segmenter, so the GL context is also different. You can pass a canvas in the task creation options so that the GL context is fixed.
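
A hedged sketch of that option, assuming the task options accept a `canvas` field and reusing the `vision` fileset and import from the earlier sketch; the selector and model path are illustrative:

```js
import { ImageSegmenter } from "@mediapipe/tasks-vision";

// Hand the canvas you render with to the task, so the segmenter's WebGL
// context and yours are the same one.
const sharedCanvas = document.querySelector("#output"); // illustrative canvas

const segmenter = await ImageSegmenter.createFromOptions(vision, {
  baseOptions: { modelAssetPath: "selfie_segmenter.tflite", delegate: "GPU" },
  canvas: sharedCanvas, // pin the task to this canvas's GL context
  runningMode: "VIDEO",
  outputConfidenceMasks: true,
});

// Textures returned by getAsWebGLTexture() then belong to sharedCanvas's
// context and can be drawn without a GPU -> CPU round trip.
```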

I think everyone agrees that the migration from selfie_segmentation is difficult, so I think it would be a good idea to include a utility function in @mediapipe/tasks-vision to simplify the conversion to ImageBitmap.

vlysytsia commented 1 year ago

> I think everyone agrees that the migration from selfie_segmentation is difficult, so I think it would be a good idea to include a utility function in @mediapipe/tasks-vision to simplify the conversion to ImageBitmap.

@satoren @tyrmullen, are you planning to include conversion to ImageBitmap in the next release? Or can you provide a working code example with getAsWebGLTexture? Thanks!

satoren commented 1 year ago

@volodymyrl Unfortunately, I am not at Google, but I could write an example. @torinmb I hope this will be helpful to you.

torinmb commented 1 year ago

Thanks @satoren this is so helpful!

khanhlvg commented 1 year ago

@satoren Your example is great! Would you be able to port your logic and update the background replacement sample here? I think a lot of developers will benefit from it. https://github.com/googlesamples/mediapipe/tree/main/tutorials/background_segmenter

If you can, please send a pull request. Thanks!

danrossi commented 1 year ago

How do I mix the video and background with the mask using WebGL with this update? This is what I was doing with selfie segmentation previously.

It's still doing a CPU draw of the video in that code. Can I use the returned texture in a separate context for processing with the video?

My shader mix looked like this before, with a smoothing function, since there is no smoothing post-processing:

```glsl
outColor = mix(bgTex, vec4(frameColor, 1.0), maskTex.r);
```

Rendering the video with the mask in the same context almost works, but I can't get the video to render yet, either mixed or alone.

```js
// Bind the mask texture to unit 1 and the video frame to unit 0,
// then upload the current video frame.
gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, texture);
gl.uniform1i(maskTextureLocation, 1);

gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, videoTexture);
gl.uniform1i(frameTextureLocation, 0);

gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGB, gl.RGB, gl.UNSIGNED_BYTE, video);
```

```glsl
float a = maskTex.r;
gl_FragColor = vec4(frameTex.rgb, a);
```

The DeepLab model seems very inaccurate, with blocky edges, and smoothing is not applied.

satoren commented 1 year ago

@danrossi CanvasRenderingContext2D is executed on the GPU, so drawImage is also executed on the GPU. It is more efficient to blend once like your method does, but it is not that important.

Your question is a general question about sharing WebGL textures between canvases; I suggest you search for that on the web.

danrossi commented 1 year ago

I have this modification of yours. It mixes all the elements in WebGL on that same offscreen context, but unlike the segmentation API it's not blending yet; the mask is cutting through the background. It's a similar shader to what I was doing before. My example is using low CPU and GPU.

So the canvas drawImage of the video is not on the CPU?

There is this message about closing resources to avoid memory leaks:

> "You seem to be creating MPMask instances without invoking .close(). This leaks resources."

https://codepen.io/danrossi/pen/yLGLmdv

Using the selfie segmentation model instead of DeepLab, the mix works as before and uses even fewer resources. I'm not sure what the difference with the mask is and how to apply it differently. It doesn't do softmax smoothing post-processing like the DeepLab model, which has those calculators compiled in; I have a shader method for that.

https://codepen.io/danrossi/pen/MWZYgKB

satoren commented 1 year ago

@danrossi

"You seem to be creating MPMask instances without invoking .close(). This leaks resources."

Oh, Thank you. We needed to explicitly close if I passed the canvas. My example is fixed
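
For reference, a minimal sketch of the explicit cleanup, assuming the non-callback form of segmentForVideo where (as I understand it) the caller owns the returned masks:

```js
// Without a callback, segmentForVideo returns the result to the caller,
// who is responsible for releasing the masks it holds.
const result = segmenter.segmentForVideo(video, performance.now());
const mask = result.confidenceMasks[0];
try {
  // ... read getAsFloat32Array() / getAsWebGLTexture() and render ...
} finally {
  mask.close(); // avoids the "without invoking .close()" warning above
}
```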

> My example is using low CPU and GPU.

In my environment, the difference in CPU utilization between my sample and yours was within the margin of error. What environment did you measure in?

My environment: Windows 11 Ryzen 7 4800U with Radeon graphics

[Screenshots (2023-08-20) comparing CPU usage between my example and yours.]

danrossi commented 1 year ago

Windows 11 with an RTX 2060, and just a Ryzen 5 3600. I'd rather mix the elements in WebGL than in the canvas, as I'm already doing. But I had filed a ticket: there is a model calculator that mixes the video, mask, and background directly in WASM, with smoothing! It's just not compiled in for these models. It would save all of this work afterwards.

But with DeepLab the mask isn't mixing correctly; it's cutting through to the background in the first example, whereas the selfie segmentation model masks the video correctly in the second example.

With DeepLab I'm not sure if smoothing is needed; it logs that softmax is active, while the selfie model needs it because of blocky edges. But I've noticed the DeepLab model shows parts of the video that are not the tracked body, which selfie segmentation doesn't; it detects objects in the background such as fabrics.

Update: I just noticed the canvas render output is square for the texture input when using the offscreen canvas. I may sadly have to grab the mask as a bitmap into a secondary WebGL renderer so that the viewport is correct.

danrossi commented 1 year ago

I figured out what is going on with the DeepLab model. In the produced mask, the red channel marks the area outside the mask. It's different from the selfie segmentation model, where the mask itself is red, hence it was not displaying properly.

Correction: if the source is 720p, the resulting image from the offscreen render is 720p as long as the viewport is changed to the video dimensions.

Inverting the red channel seems to work, but it shows that this model is less accurate at the edges than the selfie model.

However, the resulting mask from DeepLab has too much edge compared to the selfie model, which has little edge but needs softmax smoothing in the shader.

https://codepen.io/danrossi/pen/yLGLmdv

[Two screenshots (2023-08-20) of the resulting masks.]

tyrmullen commented 1 year ago

@satoren Nice example; thanks for writing and sharing this! Two quick things I noticed, in case they can be helpful:

  1. The callback-based version is more efficient (saves an extra copy), and doesn't require you to call .close(). But the trade-off is that the WebGL texture will only be available during the callback.
  2. If possible, I'd recommend switching to transferToImageBitmap instead of createImageBitmap, since the latter is async and hence can sometimes introduce some small delays. Especially if you're already rendering to the canvas's main (null) framebuffer anyways (usually that's the most annoying part).

@danrossi For sharing textures between WebGL contexts, usually the best way is to use an ImageBitmap (you can use texImage2D() to convert an ImageBitmap back into a WebGLTexture). So @satoren's example above can be helpful for your use case as well.
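
A hedged sketch of that upload path on the receiving context, using plain WebGL; the filtering and wrap choices are illustrative:

```js
// Upload an ImageBitmap into a texture on a different WebGL context.
// ImageBitmap is a valid TexImageSource, so it can be passed to texImage2D.
function imageBitmapToTexture(gl, bitmap) {
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, bitmap);
  return texture;
}
```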

satoren commented 1 year ago

@tyrmullen

> The callback-based version is more efficient (saves an extra copy), and doesn't require you to call .close(). But the trade-off is that the WebGL texture will only be available during the callback.

I see, so there is such a difference; thank you.

> If possible, I'd recommend switching to transferToImageBitmap instead of createImageBitmap, since the latter is async and hence can sometimes introduce some small delays. Especially if you're already rendering to the canvas's main (null) framebuffer anyways (usually that's the most annoying part).

I will hold off on using transferToImageBitmap for now because the public release of Safari does not yet support WebGL in OffscreenCanvas.
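
A minimal sketch of that fallback, assuming the mask has already been rendered to `glCanvas`'s default framebuffer:

```js
// transferToImageBitmap() is synchronous but only available on
// OffscreenCanvas; fall back to the async createImageBitmap() where
// WebGL-in-OffscreenCanvas is unavailable (e.g. current Safari, as noted above).
async function grabMaskBitmap(glCanvas) {
  if (typeof glCanvas.transferToImageBitmap === "function") {
    return glCanvas.transferToImageBitmap(); // detaches the canvas's backing buffer
  }
  return createImageBitmap(glCanvas);
}
```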

danrossi commented 1 year ago

> transferToImageBitmap

My updated example using the offscreen canvas context is working; I then grab the mixed background, video, and mask image. I guess for browsers other than Safari I can use transferToImageBitmap, and for Safari use createImageBitmap, then draw it on the canvas without needing another WebGL context to draw it. I can experiment with rendering the returned bitmap as a texture in WebGL to see if it performs better.

However, when trying to use the DeepLab model the mask is inverted, so the alpha needs to be inverted in the shader. Its edges are not accurate either; it may need extra shader work to trim the mask edges somehow. I'll try some things.

I converted things to transferToImageBitmap and it reduces CPU even further, but the returned bitmap has no dimensions for me. It's a simpler draw. I'm not sure if the Safari check is still needed to fall back to the other method.

DeepLab: https://codepen.io/danrossi/pen/yLGLmdv
Selfie segmentation model: https://codepen.io/danrossi/pen/MWZYgKB

vlysytsia commented 11 months ago

> I think everyone agrees that the migration from selfie_segmentation is difficult, so I think it would be a good idea to include a utility function in @mediapipe/tasks-vision to simplify the conversion to ImageBitmap.

@khanhlvg @tyrmullen, are there any updates on including an ImageBitmap result in the export? I haven't found any updates in the code example or the documentation.

coolcorexix commented 4 months ago

> @volodymyrl Unfortunately, I am not at Google, but I could write an example. @torinmb I hope this will be helpful to you.

Huge thanks for making this. It is almost perfect, but there is a small problem if I want to implement a background-blurring feature with this example: the mask created by the shader is too thin, and it is not enough to blur the subject or hide the background.

The solution is to change the fragment shader (fs) to this:

```js
const fs = `
precision highp float;
varying vec2 texCoords;
uniform sampler2D textureSampler;

void main() {
    float a = texture2D(textureSampler, texCoords).r;

    // Apply step function to thicken the texture
    a = step(0.3, a);

    gl_FragColor = vec4(a, a, a, a);
}
`;
```