floe / backscrub

Virtual Video Device for Background Replacement with Deep Semantic Segmentation
Apache License 2.0
734 stars, 85 forks

Idea: Pre-Filtering on camera noise #42

Open BenBE opened 3 years ago

BenBE commented 3 years ago

While discussing some things about ANNs with @martok, it was noted that such networks seem to be sensitive to noise in their inputs. As cameras are physical sensors[citation needed], they create noise. This noise is especially noticeable in dark environments, e.g. at night. Filtering out this noise (it doesn't need to be perfect) should help with getting better detection rates with the various ANNs. Having this pre-processing step might even help to adapt the white balance of the input image to that of the ANN's training data (as noted in #29 et al.).

martok commented 3 years ago

> While discussing some things about ANN with @martok it has been noted, that when experimenting with such networks they seem to be sensitive to noise in their inputs.

To provide an example: That Commercial Product Which Is A Synonym For Magnify seems to have a lot of problems with the color noise that shows up when the camera switches to high ISO equivalents. The exact same setup (person, background) segments fine in bright daylight, but is noticeably splotchy in cozy evening living room lighting. I have no further insight into what is going on, this is just a user observation...

floe commented 3 years ago

This is pretty easy to test, I just pushed 5660433 which uses the standard OpenCV bilateral filter before handing the image to the CNN. I don't see a lot of difference, but it's rather bright right now, so maybe I'll test again tonight :-)
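For readers unfamiliar with the filter: a bilateral filter averages each pixel with its neighbours, but down-weights neighbours whose intensity differs strongly, so noise is smoothed while edges survive. The following is a minimal NumPy sketch of that idea for a grayscale image. It is not the code from 5660433 and not OpenCV's implementation; the function name `bilateral_filter` and the parameters `radius`, `sigma_color` and `sigma_space` are chosen here for illustration only.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_color=25.0, sigma_space=2.0):
    """Naive bilateral filter for a 2-D grayscale image (illustrative only)."""
    img = img.astype(np.float64)
    h, w = img.shape
    # Replicate border pixels so every pixel has a full neighbourhood.
    padded = np.pad(img, radius, mode="edge")
    acc = np.zeros_like(img)
    weight_sum = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour image shifted by (dy, dx).
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            # Spatial weight: nearby pixels count more.
            w_space = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_space ** 2))
            # Range weight: pixels with similar intensity count more.
            w_color = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_color ** 2))
            weight = w_space * w_color
            acc += weight * shifted
            weight_sum += weight
    # The centre pixel always contributes weight 1, so weight_sum > 0.
    return acc / weight_sum
```

In a pipeline like the one described, the filtered frame would be what gets handed to the network in place of the raw camera frame.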

floe commented 3 years ago

@BenBE OK to close this issue?

BenBE commented 3 years ago

Sorry for the late reply.

The segmentation result doesn't seem to differ much. The only nets that gave me any results in low-light conditions (monitor as the only illumination source) were the deeplabs (unstable, partially blocked out the middle of the face), selfie (full temporal dropouts) and body-pix (very blocky™ and only the face itself, and even there many things like hair and shoulders were missing).

So overall the suggested patch in 5660433 seems to work, but it's not quite what I meant by this issue: I thought of something like using multiple images of the video stream to estimate the noise and, based on that noise statistic, reducing the noise in the next frame of the stream. Also of interest might be https://en.wikipedia.org/wiki/Deep_Image_Prior (which I just stumbled upon).
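One possible reading of the multi-frame idea, as a rough NumPy sketch: keep a short history of frames, estimate the noise level from the per-pixel temporal standard deviation, and pull the next frame towards the temporal mean wherever it deviates by no more than the estimated noise. Pixels that changed a lot (likely motion) are left as they are. The function names, the median-based estimate and the factor `k` are assumptions made for this example, not anything implemented in backscrub.

```python
import numpy as np

def estimate_noise_sigma(history):
    """Estimate a global noise sigma from a list of equally sized frames."""
    stack = np.stack(history).astype(np.float64)
    # Per-pixel standard deviation over time; the median across pixels is
    # less affected by regions that actually moved than the mean would be.
    return float(np.median(stack.std(axis=0, ddof=1)))

def denoise_next_frame(history, frame, k=3.0):
    """Blend `frame` with the temporal mean where it differs only by noise."""
    stack = np.stack(history).astype(np.float64)
    mean = stack.mean(axis=0)
    sigma = max(estimate_noise_sigma(history), 1e-6)  # avoid division by zero
    frame = frame.astype(np.float64)
    diff = frame - mean
    # Weight close to 1 for small deviations (treated as noise),
    # close to 0 for large deviations (treated as real change).
    w = np.exp(-(diff ** 2) / (2.0 * (k * sigma) ** 2))
    return w * mean + (1.0 - w) * frame
```

The obvious trade-off is that a longer history gives a better noise estimate but reacts more slowly to lighting changes; a real implementation would likely also need to handle camera auto-exposure shifts.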

ghost commented 3 years ago

Usually, removing camera noise does not improve segmentation by much. If anything, you might want to remove noise at the end, after the model run. That's because you likely want the model's input to be representative of the training data, and most of these models are not trained on images that had camera noise removed. Furthermore, segmentation models don't look at images as we do, so the noise can often carry information that we would be removing if we got rid of the noise before the model sees it.
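As a sketch of what cleaning up after the model run could look like, the snippet below applies a 3x3 majority vote to a binary segmentation mask, which removes isolated speckles and fills single-pixel holes. This is only one simple option, assumed here for illustration; the function name `majority_filter` is hypothetical and nothing like it is confirmed to exist in the project.

```python
import numpy as np

def majority_filter(mask):
    """3x3 majority vote on a binary mask (2-D array of 0/1 or bool)."""
    m = mask.astype(np.int32)
    h, w = m.shape
    # Replicate the border so edge pixels also see nine votes.
    padded = np.pad(m, 1, mode="edge")
    votes = np.zeros_like(m)
    for dy in range(3):
        for dx in range(3):
            votes += padded[dy:dy + h, dx:dx + w]
    # A pixel is foreground if at least 5 of the 9 pixels around it are.
    return votes >= 5
```

Note that this also rounds off sharp corners of the mask, which is usually acceptable for a person silhouette.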