floe / backscrub

Virtual Video Device for Background Replacement with Deep Semantic Segmentation

Apache License 2.0

734 stars, 85 forks

The MLKit selfie segmentation model should be finetuned for office chairs #76

Closed: mikaelhg closed this 1 year ago

mikaelhg commented 3 years ago

The MLKit selfie segmentation model (models/selfiesegmentation_mlkit-256x256-2021_01_19-v1215.f16.tflite) is pretty good, but it could still be fine-tuned for office chairs.

The model card describes the dataset used to train the model as "1700 images, 100 images from each of the 17 geographical subregions ..."

phlash commented 3 years ago

Tagging as an enhancement and help-wanted as I don't know if anyone currently involved can do this!

BenBE commented 3 years ago

Another idea for "automatic" tracking, suggested by @martok the other day, is to record some video of your background and then automatically tag everything that differs from it by more than some noise threshold as foreground, in order to gain training material. That way you could create the necessary amount of tagged images with as little effort as possible. You might even get a network that is sensitive to only one person, which could have the advantage that other people in the background would be filtered out by such a net.
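A minimal sketch of the thresholding idea above, assuming grayscale frames stored as nested lists and a simple per-pixel mean as the background model. The names `mean_background` and `label_foreground` and the threshold value are made up for illustration. They are not part of backscrub, and a real pipeline would more likely use an image library.

```python
from typing import List

Frame = List[List[int]]  # grayscale image: rows of 0-255 intensities


def mean_background(frames: List[Frame]) -> List[List[float]]:
    """Average several frames of the empty scene into a background model."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(f[y][x] for f in frames) / len(frames) for x in range(width)]
        for y in range(height)
    ]


def label_foreground(
    frame: Frame, background: List[List[float]], threshold: float
) -> List[List[int]]:
    """Tag a pixel as foreground (1) if it differs from the background by more than threshold."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]


# Two frames of the empty room with slight sensor noise (made-up values).
empty_frames = [
    [[10, 11, 10, 12], [10, 10, 11, 10]],
    [[11, 10, 12, 10], [10, 12, 10, 11]],
]
background = mean_background(empty_frames)

# A frame in which a bright object covers the two middle columns.
frame_with_person = [[10, 200, 210, 11], [11, 190, 205, 10]]
mask = label_foreground(frame_with_person, background, threshold=20)
```

Each (frame, mask) pair produced this way could serve as one tagged training example. The threshold has to sit above the sensor noise seen in the empty frames, otherwise noise gets tagged as foreground.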

mikaelhg commented 3 years ago

Another idea I've been toying with is approaching the problem from a genetic algorithm perspective: a crawler starts at the bottom-left corner of the screen, moves along the bottom edge until it figures out where the human figure is, and then algorithmically looks at the probable border pixels to draw an outline.
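The crawler part of this could be sketched roughly as follows. This is a plain scan-and-flood-fill illustration, not a genetic algorithm, and it assumes a binary mask (1 = probably person) is already available, for example from thresholding the model output. The names `find_seed` and `outline` are hypothetical.

```python
from collections import deque
from typing import List, Set, Tuple

Mask = List[List[int]]  # assumed input: 1 = probably person, 0 = background


def find_seed(mask: Mask) -> Tuple[int, int]:
    """Crawl the bottom row from the left until a foreground pixel is found."""
    y = len(mask) - 1
    for x, value in enumerate(mask[y]):
        if value:
            return (y, x)
    raise ValueError("no foreground pixel on the bottom row")


def outline(mask: Mask) -> Set[Tuple[int, int]]:
    """Return pixels of the seed's region that touch background or the image edge."""
    height, width = len(mask), len(mask[0])
    seed = find_seed(mask)
    seen = {seed}
    queue = deque([seed])
    border: Set[Tuple[int, int]] = set()
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if not inside or not mask[ny][nx]:
                border.add((y, x))  # a 4-neighbour lies outside the figure
            elif (ny, nx) not in seen:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return border


# A 3x3 "figure" standing on the bottom edge of a 4x5 frame (made-up data).
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
edge = outline(example_mask)
```

The sketch only follows the region connected to the first hit on the bottom row, so anything that does not reach the bottom edge of the frame is ignored.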

Pandinosaurus commented 3 years ago

> The model card describes the dataset used to train the model as "1700 images, 100 images from each of the 17 geographical subregions ..."

Actually, Google's model cards do not detail the training data, only some of the validation data. Their goal is to make us aware of potential bias with regard to a population without disclosing the training data (AI should not discriminate).

floe commented 3 years ago

@BenBE @phlash side note: should we use the selfiesegmentation model as default for the next release?