Volcomix / virtual-background

Demo on adding virtual background to a live video stream in the browser
https://volcomix.github.io/virtual-background
Apache License 2.0

New landscape selfie segmentation model #21

Closed by benbro 2 years ago

benbro commented 3 years ago

Google released a new landscape model for selfie segmentation:
https://google.github.io/mediapipe/solutions/selfie_segmentation
https://github.com/google/mediapipe/tree/master/mediapipe/modules/selfie_segmentation

How does this compare to the ML Kit model in this project? Does selfie_segmentation.js use WebAssembly+SIMD or WebGL2? Should we use @mediapipe/selfie_segmentation/selfie_segmentation.js, or is the model in this project more performant?

jpodwys commented 3 years ago

Thank you for the excellent repo! Your post-processing pipeline is spectacular!

I've also been interested in using the MediaPipe selfie segmentation model. I forked this project and tried the following:

  1. Force this repo to use the MediaPipe landscape model
  2. Update SegmentationConfig.inputResolution as well as InputResolution (I tried both 144x256, as noted in the docs, and 256x144 just to be sure)

In all cases, the output video feed is blurry.

(Screenshot, 2021-06-28: fully blurred output video)

When I saw that the output image was completely blurred, I wanted to check that the generated segmentation mask contained non-zero values. After inspecting tflite.HEAPF32 just before it gets passed to gl.texSubImage2D in loadSegmentationStage::render, I was able to confirm that the provided segmentation mask does include non-zero values.
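Roughly, the check looked like this (a minimal sketch, assuming the wrapper exposes HEAPF32 and _getOutputMemoryOffset as in this repo and a 256x144 segmentation output; not my exact debug code):

```ts
// Count non-zero values in the segmentation output on the Emscripten
// heap, right before it is uploaded to the WebGL texture.
declare const tflite: {
  HEAPF32: Float32Array
  _getOutputMemoryOffset(): number
}

const [segmentationWidth, segmentationHeight] = [256, 144]
const outputOffset = tflite._getOutputMemoryOffset() / 4 // byte offset -> float index
const mask = tflite.HEAPF32.subarray(
  outputOffset,
  outputOffset + segmentationWidth * segmentationHeight
)
const nonZeroCount = mask.reduce((count, v) => (v !== 0 ? count + 1 : count), 0)
console.log(`segmentation mask: ${nonZeroCount}/${mask.length} non-zero values`)
```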

With this in mind, I'm uncertain what else needs to be done to get this new model working correctly within this repo. Any input is appreciated.

Thank you!

benbro commented 3 years ago

Maybe we shouldn't resize the input when using this model? The selfie_segmentation docs say:

> Segmentation automatically resizes the input image to the desired tensor dimension before feeding it into the ML models.

Can you please share your code?

jpodwys commented 3 years ago

Unfortunately, using the camera's native resolution as inputResolution (in my case 640x480) is not working.

Here's the code (note it's in the media-pipe branch of my fork). My fork strips out React and all non-background-blur code.

The files you'll be interested in are

jpodwys commented 3 years ago

I also tried replacing part of the pipeline with part of MediaPipe's sample segmentation app just so I could get the segmentation mask directly from their code. I then fed the generated mask into the pipeline in this repo, but I didn't have any luck there either.

Volcomix commented 3 years ago

Hi @jpodwys, thank you for experimenting with this model. Just an intuition, without having investigated the model file: have you tried replacing this line so it calls buildLoadSegmentationStage instead of buildSoftmaxStage? I'm wondering if the softmax is already part of the model.
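To make the intuition concrete (just a sketch of the idea, not the shader code): the model this pipeline currently runs outputs two scores per pixel that still need a softmax to become a person probability, while the MediaPipe selfie segmentation model seems to output a single confidence value per pixel that can be loaded into the mask as-is.

```ts
// Per-pixel softmax over a two-channel output (what the softmax stage does
// conceptually); with a single-channel model this step would be redundant.
function softmaxPersonProbability(backgroundScore: number, personScore: number): number {
  const shift = Math.max(backgroundScore, personScore) // numerical stability
  const expBackground = Math.exp(backgroundScore - shift)
  const expPerson = Math.exp(personScore - shift)
  return expPerson / (expBackground + expPerson)
}

// Two-channel output: softmax needed.
softmaxPersonProbability(-1.2, 2.3) // ≈ 0.97

// Single-channel output (MediaPipe selfie segmentation): already a
// confidence in [0, 1], so buildLoadSegmentationStage can use it directly.
```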

jpodwys commented 3 years ago

Thank you for the suggestion, you were right!

So the following works:

  1. Swap to MediaPipe's model (I'm using the landscape model)
  2. Change inputResolution to 256x144 (not 144x256 as noted in the docs)
  3. Replace webgl2Pipeline's call to buildSoftmaxStage with buildLoadSegmentationStage
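In config form, the working setup looks roughly like this (a sketch only; the identifiers and model path are approximations of what's in my fork, not copy-pasted from it):

```ts
// Approximate shape of the segmentation config used for the MediaPipe
// landscape model (names and path are illustrative, not exact).
type InputResolution = '256x144'

interface SegmentationConfig {
  modelUrl: string
  inputResolution: InputResolution
}

const mediaPipeLandscapeConfig: SegmentationConfig = {
  // landscape variant of MediaPipe selfie segmentation
  modelUrl: 'models/selfie_segmentation_landscape.tflite',
  // width x height as the pipeline expects it (256x144), even though the
  // MediaPipe docs describe the input tensor as 144x256
  inputResolution: '256x144',
}

// Plus, in webgl2Pipeline, call buildLoadSegmentationStage instead of
// buildSoftmaxStage, since the model already includes the softmax.
```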

(Screenshot, 2021-06-29: background blur working correctly)

The face of a happy dev :)

vavilovm commented 3 years ago

Hi @jpodwys, did you check how the landscape model's performance differs from the ML Kit model's?

jpodwys commented 3 years ago

My fork uses the landscape MediaPipe model so you can compare this repo's live demo to my fork's live demo.

I also made a live demo that skips the post-processing pipeline and blurs directly via canvas. It has better performance (fps) but less impressive post-processing (there's a halo effect around humans). That said, the canvas-only blur implementation looks surprisingly good in this demo because, as Volcomix pointed out, the MediaPipe team built softmax directly into their model.
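The canvas-only demo boils down to something like this (a simplified sketch of the approach, not the exact code in my demo; the mask comes from the @mediapipe/selfie_segmentation results):

```ts
// Composite the person over a blurred copy of the frame using only the
// 2D canvas: keep person pixels via the segmentation mask, then fill
// everything behind them with a blurred frame.
function renderBlurredBackground(
  ctx: CanvasRenderingContext2D,
  frame: HTMLVideoElement,
  mask: CanvasImageSource, // e.g. results.segmentationMask from MediaPipe
) {
  const { width, height } = ctx.canvas

  ctx.save()
  ctx.clearRect(0, 0, width, height)

  // 1. Draw the mask, then keep only the frame pixels where the mask is opaque.
  ctx.drawImage(mask, 0, 0, width, height)
  ctx.globalCompositeOperation = 'source-in'
  ctx.drawImage(frame, 0, 0, width, height)

  // 2. Paint a blurred copy of the frame behind the person.
  ctx.globalCompositeOperation = 'destination-over'
  ctx.filter = 'blur(8px)'
  ctx.drawImage(frame, 0, 0, width, height)

  ctx.restore()
}

// The hard mask edge is what causes the halo effect mentioned above.
```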