Overview
Despite disabling denoising and using a low CRF, women's faces are usually unnaturally blurred while men's faces usually retain more detail.
Branch
In which branch does the issue appear to be occurring?
SVT-AV1-PSY v2.3.0
Reproduction
Steps to reproduce the behavior:
1. Encode a live-action video that depicts both men and women, with denoising disabled and a low CRF.
2. Zoom in on their faces and look for areas where the encoder applied too much smoothing, producing an unrealistic, blurred look.
Expected behavior
Fine details on women’s faces — such as pores, vellus hairs, minor flaws in makeup, and small fine lines — should remain visible.
Additional context / Relevant Files
I presume that this is because women are more likely to be wearing full-coverage foundation and concealer, which confuses the encoder by making it perceive those parts of a frame as flat and uniform. Men's facial hair, lighter or absent makeup, and generally more pronounced pores, lines, and wrinkles, on the other hand, seem to make it easier for the encoder to detect that it is not safe to smooth away details in those parts of a frame.
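To illustrate the hypothesis above (not how SVT-AV1-PSY actually makes its decisions, which I have not verified), here is a minimal sketch that measures per-block luma variance on a synthetic frame. The function name `block_variances`, the 8x8 block size, and the pixel values are all made up for illustration. The point is only that evenly covered skin can look nearly flat to any variance-style measure, while textured skin does not.

```python
from statistics import pvariance


def block_variances(luma, block=8):
    """Return {(block_row, block_col): variance} for each full block x block tile.

    `luma` is a 2-D list of luma samples. This is a hypothetical helper,
    not an SVT-AV1-PSY function.
    """
    height, width = len(luma), len(luma[0])
    result = {}
    for by in range(0, height - block + 1, block):
        for bx in range(0, width - block + 1, block):
            pixels = [
                luma[y][x]
                for y in range(by, by + block)
                for x in range(bx, bx + block)
            ]
            result[(by // block, bx // block)] = pvariance(pixels)
    return result


# Synthetic 8x16 frame (assumed values, not real footage):
# left tile mimics even foundation (samples differ by at most 1),
# right tile mimics textured skin (samples alternate by 12).
smooth = [[128 + (x % 2) for x in range(8)] for _ in range(8)]
textured = [[128 + 12 * ((x + y) % 2) for x in range(8)] for y in range(8)]
frame = [s + t for s, t in zip(smooth, textured)]

variances = block_variances(frame)
for position, value in sorted(variances.items()):
    print(position, value)
```

If the encoder relies on anything resembling this kind of activity measure, the left tile would be a natural candidate for heavier smoothing even though it still contains real detail.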
I think the only way to solve this would be to implement face detection that instructs the encoder to retain more detail wherever faces are detected. It seems unlikely that this could be solved with another heuristic.
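A rough sketch of what I mean, assuming face bounding boxes are already available from some external detector (not shown here). It converts the boxes into a grid of per-block quantizer offsets, where a negative offset means "spend more bits here". The function name `face_qp_offsets`, the 64x64 block size, and the offset of -8 are assumptions for illustration only. Whether and how SVT-AV1-PSY could consume such a map is not addressed here.

```python
def face_qp_offsets(width, height, face_boxes, block_size=64, face_offset=-8):
    """Build a rows x cols grid of quantizer offsets from face boxes.

    `face_boxes` is a list of (x, y, w, h) tuples in pixels, as a
    hypothetical face detector might return them. Blocks touched by any
    face get `face_offset`; all other blocks get 0.
    """
    cols = -(-width // block_size)   # ceiling division
    rows = -(-height // block_size)
    grid = [[0] * cols for _ in range(rows)]

    for x, y, w, h in face_boxes:
        # Clamp the box to the frame so partially visible faces still count.
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(width, x + w), min(height, y + h)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the frame

        for row in range(y0 // block_size, (y1 - 1) // block_size + 1):
            for col in range(x0 // block_size, (x1 - 1) // block_size + 1):
                grid[row][col] = face_offset
    return grid


# Example: a 256x128 frame with one face box straddling four blocks.
offsets = face_qp_offsets(256, 128, [(70, 10, 60, 60)])
for row in offsets:
    print(row)
```

Marking whole blocks keeps the map coarse but cheap. A real implementation would probably want to pad the boxes a little and smooth the offsets over time to avoid visible quality jumps at the edges of faces.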
Here are two examples:
(I’ve seen it frequently, but I only have these two on hand at the moment.)