cioppaanthony / rt-sbs

This repository contains the code for the paper: "Real-Time Semantic Background Subtraction", published at the ICIP 2020 conference.

Question about noise in code #5

Closed BogdanovKirill closed 2 years ago

BogdanovKirill commented 3 years ago

Hello!

It seems that the noise is directly tied to the matching threshold in the original ViBe algorithm. In the original C code, matchingThreshold is set to 20, and it correlates with the noise threshold. In your version, matchingThreshold is much lower (10).

https://github.com/cioppaanthony/rt-sbs/blob/0d4f96c6d4bf034d181dad58ab4d8b77f1078163/algorithms/ViBeGPU.py#L66 Does this mean that the noise values (-10, +10) should also be adjusted? A lower matchingThreshold also means more noise in the resulting mask. Does it mean that the higher-level algorithm compensates for it?

Thank you in advance!

cioppaanthony commented 3 years ago

Hi @BogdanovKirill!

Thanks for your interest in our work!

I asked my supervisor (Professor Marc Van Droogenbroeck), who developed ViBe, to give me his intuition on that. Here is his answer:

There are different ways to initialize the ViBe model. The one described in the paper by Barnich and Van Droogenbroeck, published in IEEE Transactions on Image Processing in 2011, fills the model with values found in the neighborhood. An alternative, which is the one you describe, consists of adding some noise to the current value. However, as you mentioned, the noise level should stay below the matching threshold to avoid false negatives (foreground pixels wrongly classified as background). That would be a concern if the noise were applied to all background values, which is not the case here, since we keep at least two frames intact.
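To make the noise-based initialization concrete, here is a minimal per-pixel sketch in plain Python. It is not the code from `ViBeGPU.py`: the function name `init_samples`, the default of 20 samples, the noise amplitude of 10, and the uniform integer noise are all assumptions chosen for illustration. Only the idea of keeping at least two samples intact comes from the answer above.

```python
import random


def init_samples(pixel_value, num_samples=20, noise_amplitude=10,
                 num_intact=2, rng=None):
    """Build a background model for one grayscale pixel (hypothetical helper).

    The first `num_intact` samples are exact copies of the observed value.
    The remaining samples are the observed value plus uniform integer noise
    in [-noise_amplitude, +noise_amplitude], clipped to [0, 255].
    """
    rng = rng or random.Random()
    samples = [pixel_value] * num_intact
    for _ in range(num_samples - num_intact):
        noisy = pixel_value + rng.randint(-noise_amplitude, noise_amplitude)
        samples.append(min(255, max(0, noisy)))
    return samples


# Example: model for a pixel whose first observed intensity is 100.
model = init_samples(100, rng=random.Random(0))
```

Because the intact copies are never perturbed, a pixel that keeps its initial value always finds at least two exact matches in the model, whatever the noise amplitude is.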

Please also note that there are two phases with ViBe: (1) the initialization and (2) the normal regime. The parameter set does not have to be the same for both phases. It all depends on the time (number of frames) before you switch from (1) to (2). During phase (1), as explained in the paper, the most critical effect is the presence of ghosts. If the number of frames for phase (1) is very small, there is a need to speed up the suppression of ghosts. For longer initialization sequences, it generally makes no difference how you fill the background model.

The matching threshold determines how close a pixel value must be to the samples of the background model, so in a sense how strict we are when deciding that a pixel is background. So indeed, a lower matching threshold might increase the number of false positives (and thus the noise in the prediction mask). Since the semantic model is quite good at predicting background pixels, it may remove much of the background noise caused by a lower matching threshold. These particular parameter values were obtained with a hyperparameter optimization procedure. This differs from what was done for ViBe back in 2011, which was tuned on only two sequences (simply because there were no annotated videos at that time).
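The effect of the matching threshold can be illustrated with a small per-pixel sketch of the ViBe decision rule: a pixel is classified as background when enough model samples lie close to its current value. The function name `is_background`, the strict `<` comparison, and the sample values are assumptions made for this example. They are not taken from the repository.

```python
def is_background(pixel_value, samples, matching_threshold=10, min_matches=2):
    """Return True if at least `min_matches` samples lie within
    `matching_threshold` of `pixel_value` (hypothetical helper)."""
    matches = 0
    for sample in samples:
        if abs(pixel_value - sample) < matching_threshold:
            matches += 1
            if matches >= min_matches:
                return True
    return False


# A background pixel whose intensity drifted from 100 to 112,
# for instance because of a small illumination change.
model = [100, 100, 95, 108]

loose = is_background(112, model, matching_threshold=20)   # True
strict = is_background(112, model, matching_threshold=10)  # False
```

With the threshold at 20, all four samples match and the pixel stays in the background. With the threshold at 10, only the sample 108 matches, so the pixel is flagged as foreground. This is the kind of false positive that the semantic model may then remove.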

I hope this helps!

BogdanovKirill commented 2 years ago

I see, thank you very much for the explanation