Use the "momentum" - reuse the information from the previous few images

OndraZizka commented 3 years ago

From the results it seems that Deface detects faces in each frame individually.

This causes the blurred regions to jump wildly, and sometimes a false positive just blinks in for 1 frame. What's worse: sometimes the face slips through unblurred for 1 or 2 frames when it is obscured by even something small. Then the whole point of blurring is gone.

Deface could work in two passes:

1. Detect the faces in individual images, saving score, size and position.
2. Distribute the values between images, using
   a) a simple Gaussian distribution of a "detected face's existence (score + position)" to the surrounding images, or
   b) even better, matching the "same face" by clustering its positions in surrounding images, computing a vector of its movement, and assuming a face at the computed position where e.g. nothing is detected but the surrounding images have a high score.
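
To make option a) above concrete, here is a minimal sketch in plain Python. It only smooths the per-frame detection score of one face region over time; the position would need the same treatment (e.g. a score-weighted average of the boxes). The function names, the kernel radius, `sigma` and the 0.5 threshold are all made up for illustration and are not part of deface.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1-D Gaussian weights for frame offsets -radius..radius."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_scores(scores, radius=2, sigma=1.0):
    """Spread each frame's detection score to its temporal neighbours.

    `scores` holds one score per frame for a single face region
    (0.0 where nothing was detected). Hypothetical helper, not deface API.
    """
    kernel = gaussian_kernel(radius, sigma)
    n = len(scores)
    smoothed = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:  # ignore offsets that fall outside the video
                acc += w * scores[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


# Frame 2 is a dropout inside a real face track, frame 8 is a lone blink.
raw = [0.9, 0.9, 0.0, 0.9, 0.9, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0]
smoothed = smooth_scores(raw)
blur = [s >= 0.5 for s in smoothed]  # assumed threshold
```

With these made-up numbers, the dropout at frame 2 ends up above the threshold because its neighbours vouch for it, while the isolated detection at frame 8 falls below it.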

This would have benefits:

1) The false negatives could be dramatically reduced.
2) In combination with the fixed number of faces parameter, the false positives could be reduced.
3) The computed vectors could go beyond the edges of the video, so a face which moves out of or into the video could be blurred even when partly cropped. This solves another issue: if the face moves e.g. left to right, the unblurred halves visible at the opposite video edges can be combined, and whoever wants to can reconstruct the whole face.
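
Benefits 1) and 3) can be sketched as follows, assuming the "same face" matching has already been done and produced one track per face (a dict from frame index to box). Gaps inside the track are filled by linear interpolation, and frames before or after the track are extrapolated along the same movement vector, deliberately without clipping to the video size. `fill_track` and the box layout `(x1, y1, x2, y2)` are assumptions for illustration, not existing deface code.

```python
def interpolate_box(box_a, box_b, t):
    """Linear blend of two boxes; t < 0 or t > 1 extrapolates."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))


def fill_track(track, first_frame, last_frame):
    """Return a box for every frame in [first_frame, last_frame].

    `track` maps frame index -> (x1, y1, x2, y2) for the frames in which
    this one face was detected. Missing frames are estimated from the two
    nearest detections, assuming constant velocity between them.
    """
    known = sorted(track)
    if len(known) < 2:
        raise ValueError("need at least two detections to estimate motion")
    filled = {}
    for f in range(first_frame, last_frame + 1):
        if f in track:
            filled[f] = track[f]
            continue
        before = [k for k in known if k < f]
        after = [k for k in known if k > f]
        if before and after:      # gap inside the track: interpolate
            a, b = before[-1], after[0]
        elif before:              # past the last detection: extrapolate
            a, b = known[-2], known[-1]
        else:                     # before the first detection: extrapolate
            a, b = known[0], known[1]
        t = (f - a) / (b - a)
        filled[f] = interpolate_box(track[a], track[b], t)
    return filled


# A face moving right by 10 px per frame; frame 2 was missed by the detector.
track = {0: (100, 50, 140, 90), 1: (110, 50, 150, 90), 3: (130, 50, 170, 90)}
filled = fill_track(track, 0, 5)
```

In a hypothetical 180 px wide video, the box estimated for frame 5 reaches past the right edge, so the visible part of the face would still be covered once the box is clipped at draw time.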

Sounds good?

mdraw commented 3 years ago

This definitely sounds good :) I had actually planned to include such a smoothing feature (2.a) from the start but dropped it because I couldn't get it to work properly and had other priorities when implementing the project as a whole. Now that most other things work, this is IMO the most important TODO again. #11 could partially solve this, but I'm not yet sure how effective it is and how it would perform against 2.a and 2.b. Contributions in this domain are very welcome.

OndraZizka commented 3 years ago

I might contribute, but I'm not sure if it would be in Python. I could take some intermediate data and process it with a Kotlin utility, giving back more intermediate data. That can later be translated to Python if needed. I could do that after the summer holiday, unless you come up with something better in the meantime (which is likely :) )

ORB-HD / deface

Use the "momentum" - reuse the information from the previous few images #8