floe / backscrub

Virtual Video Device for Background Replacement with Deep Semantic Segmentation
Apache License 2.0

Other background replacement codebases - a round-up #58

Open phlash opened 3 years ago

phlash commented 3 years ago

Having recently discovered that the open-source Jitsi video conferencing solution offers ML-driven background replacement, I thought it would be interesting to round up who else is doing this here on GitHub and what tech is used.

MartinKlevs commented 3 years ago

Volcomix virtual background seems to give great results even with my bad laptop camera, while also running at 60 fps. Would it be possible to do something similar here?

phlash commented 3 years ago

@MartinKlevs glad to hear it! On the surface, both Volcomix and Deepbacksub operate in a similar fashion, using the same Google models to detect a person, but there are a few differences. In particular, the Volcomix solution uses async rendering (which will arrive here with https://github.com/floe/deepbacksub/pull/59), and it uses the browser's 2D or WebGL canvas in place of OpenCV for the other image processing. The canvas will use a GPU where available and likely reduce CPU load a little, quite possibly making up for the use of WASM and Emscripten-compiled C++ TensorFlow :)

MartinKlevs commented 3 years ago

Volcomix seems to do a much better job overall. It also segments the whole image instead of a cropped area.

floe commented 3 years ago

Found this just now: https://developers.google.com/ml-kit/vision/selfie-segmentation - it seems to be Apache-licensed (for now); the problem is actually getting the model file, which is buried inside the MLKit runtime.

floe commented 3 years ago

Update: you can get the AAR file (which is just a Zip) via https://mvnrepository.com/artifact/com.google.mlkit/segmentation-selfie/16.0.0-beta1 - and there is indeed a .tflite file in there, which should be worth a try (it also has a square input shape, so it should fit a landscape camera image somewhat better than the portrait-shaped Meet model).
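Since the AAR is just a Zip, pulling the model out can be scripted with Python's stdlib; a minimal sketch (the entry path inside the AAR is illustrative - the actual location of the .tflite file may differ):

```python
import zipfile

def list_tflite(aar_path):
    """Return the names of all .tflite entries inside an AAR (a Zip archive)."""
    with zipfile.ZipFile(aar_path) as z:
        return [n for n in z.namelist() if n.endswith(".tflite")]

def extract_tflite(aar_path, out_dir="."):
    """Extract every .tflite entry from the AAR into out_dir."""
    with zipfile.ZipFile(aar_path) as z:
        for name in list_tflite(aar_path):
            z.extract(name, out_dir)
```

The same thing can of course be done by hand with `unzip`, since no AAR-specific tooling is needed.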

MartinKlevs commented 3 years ago

Nice! I was able to get it working with minimal adjustments. The new model outputs a [0, 1] float32 mask. It seems to do a good job.

floe commented 3 years ago

Yes, quick-and-dirty implementation in https://github.com/floe/deepbacksub/commit/24dc33fcf1bc562754ce79e0bc61e8343ffbd47b - seems like a candidate for the new default model?

MartinKlevs commented 3 years ago

I agree. Personally, I get better results with the threshold set to 0.75.
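For illustration, binarising the model's [0, 1] float32 mask at a threshold like 0.75 and compositing could look like the NumPy sketch below (function names and shapes are assumptions for this example, not backscrub's actual code, which does its image processing via OpenCV):

```python
import numpy as np

def binarize_mask(mask: np.ndarray, threshold: float = 0.75) -> np.ndarray:
    """Turn a [0, 1] float32 person-probability mask into a 0/255 uint8 mask."""
    return (mask > threshold).astype(np.uint8) * 255

def composite(frame: np.ndarray, background: np.ndarray, mask255: np.ndarray) -> np.ndarray:
    """Keep the camera frame where the mask is set, replacement background elsewhere."""
    keep = mask255[..., None] > 0  # broadcast the 2D mask over the colour channels
    return np.where(keep, frame, background)
```

Raising the threshold trades missing bits of the person (false negatives) for fewer patches of leaked background (false positives), which matches the observation above that 0.75 looks better than 0.5 with this model.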

insad commented 3 years ago

Some of my bookmarked links:

https://github.com/murari023/awesome-background-subtraction
https://github.com/SwatiModi/virtual-background-app
https://github.com/fangfufu/Linux-Fake-Background-Webcam
https://github.com/PapaEcureuil/pyfakebg

BenBE commented 3 years ago

> Yes, quick-and-dirty implementation in 24dc33f - seems to be a candidate for new default model?

Can you provide example screencaps?

floe commented 3 years ago

Three really quick-and-dirty (again) screenshots, in order: new selfie model, Meet model, DeepLabv3+.

(Three screenshots attached, taken 2021-04-14 at 08-31-53, 08-32-06, and 08-32-28.)

MartinKlevs commented 3 years ago

This repo contains several models:

https://github.com/anilsathyan7/Portrait-Segmentation

phlash commented 3 years ago

@insad thanks for those - the list of papers is excellent! I came across the @SwatiModi (Android-targeted, using MediaPipe) and @fangfufu (Python+Node.js, derived from Ben Elder's original work) projects in my search; I hadn't found @PapaEcureuil's, which uses Streamlit to put a nice GUI on the full-fat Python+TensorFlow engine.

insad commented 3 years ago

Did you see this one: https://github.com/ZHKKKe/MODNet ?

A lot of active development seems to be going on; sadly, a lot of the communication is in Chinese only...

insad commented 3 years ago

https://github.com/PeterL1n/BackgroundMattingV2

https://github.com/YexingWan/Fast-Portrait-Segmentation

https://github.com/mrgloom/awesome-semantic-segmentation

https://github.com/wpf535236337/real-time-network

https://github.com/josephch405/jit-masker

https://github.com/clovaai/ext_portrait_segmentation

BenBE commented 3 years ago

The new selfie model sure seems promising. What I think is still missing is overlapping the masks from multiple NN runs to fill the whole image area.
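One way to sketch that idea - combining the masks produced by several NN runs over different crops into a single full-frame mask - is shown below. This is a hypothetical helper in NumPy, not anything in backscrub; taking the per-pixel maximum means a pixel counts as "person" if any run says so:

```python
import numpy as np

def merge_crop_masks(frame_shape, crops):
    """Combine per-crop masks into one full-frame float mask.

    frame_shape: (height, width) of the full frame.
    crops: list of ((y, x), mask) pairs, where each mask is the float output
           of one NN run on the crop whose top-left corner is at (y, x).
    Overlapping regions take the element-wise maximum.
    """
    full = np.zeros(frame_shape, dtype=np.float32)
    for (y, x), m in crops:
        h, w = m.shape
        region = full[y:y + h, x:x + w]  # view into the full-frame mask
        np.maximum(region, m, out=region)
    return full
```

Other merge strategies (averaging in the overlap, or feathering the crop borders) would reduce visible seams at the cost of a little more work per frame.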

progandy commented 3 years ago

> Did you see this one: https://github.com/ZHKKKe/MODNet ?

It is used in this plugin for OBS: https://github.com/royshil/obs-backgroundremoval

phlash commented 3 years ago

@progandy - thanks. I note from Roy's README.md and a quick look at the code that his filter uses the Microsoft ONNX Runtime C++ wrapper for multiple possible ML frameworks (https://github.com/microsoft/onnxruntime), then borrows the pretrained ONNX model from https://github.com/ZHKKKe/MODNet (actually their Google Drive), but not their Python :wink:

elkhalafy commented 3 years ago

@phlash please build backscrub as a free plugin for OBS Studio

phlash commented 3 years ago

@elkhalafy I have an experimental OBS plugin that uses backscrub here: https://github.com/phlash/obs-backscrub

This builds against the experimental branch of backscrub, where the core functionality is separated out into a library and deepseg is a wrapper around it (as is the OBS plugin).

elkhalafy commented 3 years ago

I hope you complete the project and release it as an actual plugin - we need it so much. @phlash

ghost commented 3 years ago

@floe the new model looks great. I think there's a place for larger models as well, like the one from https://github.com/PeterL1n/BackgroundMattingV2, although I'm not sure what the status of GPU acceleration is in backscrub, since I haven't personally used XNNPACK.

phlash commented 3 years ago

@dsingal0 No GPU acceleration in backscrub as yet; XNNPACK provides CPU-optimised kernels for TFLite. That said, GPU acceleration works[citation needed] via the TFLite GPU delegate and OpenCL in my hacked-up branch here: https://github.com/phlash/backscrub/tree/xnnpack-test - according to one tester :smile:

I would be interested to try the larger models from Peter Lin's paper. It looks like the ONNX ones are where we should start, and those then need converting to TFLite through TF (apparently): https://stackoverflow.com/questions/53182177/how-do-you-convert-a-onnx-to-tflite

ghost commented 3 years ago

@phlash if going for GPU acceleration, TensorRT would be great for NVIDIA GPUs, since they have ONNX->TensorRT converters. I tried out the current models in the repo, and all except DeepLabv3 and MLKit Segmentation were quite unusable. https://github.com/ZHKKKe/MODNet looks very promising based on their Colab. It's heavier than the tflite models, but much lighter than BackgroundMattingV2, so it can feasibly run on Intel non-U-series CPUs or a dGPU.

rdreyer-godaddy commented 2 years ago

Just wanted to mention that Zoom now has some kind of ML segmentation in their Linux client (Version 5.7.6 - 31792.0820) too, and it's quite performant. Curious if someone is up for reverse-engineering it.