BSI-OFIQ / OFIQ-Project

Open Source Facial Image Quality

Faces at the boundary, negative co-ordinates #24

Closed rtyson-veridiumid closed 3 months ago

rtyson-veridiumid commented 4 months ago

https://github.com/BSI-OFIQ/OFIQ-Project/blob/ae44e41d6796e29d3071d9e4f3321fec72f8abf6/OFIQlib/modules/detectors/src/opencv_ssd_face_detector.cpp#L137

Hi, the intent of the filtering of the face detection boxes is a bit unclear. Currently the output from SSDFaceDetector::UpdateFaces() could include negative values for left or top, and similarly left+width and top+height could fall outside the original image's bounds.

This is basically because the image has been padded. If the face detector returns a bounding box location that includes part of the padded region, it will be outside the original image bounds (e.g. when l = 0).

ISO/IEC 29794-5 states that when "The face box protrudes outside the image boundary" it should be deleted. However, it is unclear whether "image" refers to the padded image or the original image. I could be used for the original image and Ip for the padded image to make this clear.

Is the intent to pick up faces that lie partially outside the bounds of image I? If so, this has implications elsewhere, as accurate values for face unicity can't be computed (the eyes or chin could be cropped off and not visible). But we would assume we still want to count those faces, as they could still impact correct detection.

Maybe out of bounds co-ordinates are dealt with elsewhere?

I would also note that the image appears to be resized to a square (300x300) without regard to the aspect ratio of the input image. Doing this distorts faces in the input and lowers the detector's performance. I suggest padding the image to a square first to maintain the correct aspect ratio, then adding any further padding needed for detection.

(What is the rationale for padding in this case?)

Thanks Richard

JoMe2704 commented 4 months ago

Yes, the intent of the padding is to improve detection of faces that lie partially outside the image. In some use cases, the face of interest may be incompletely covered by the image. We see such cases frequently, for instance, in eGates, when passengers don't realize where the camera is. In my experiments, padding by 20% improved detection accuracy for such cropped faces.

I assume you refer to DIS 1 of ISO/IEC 29794-5? There, the face detection algorithm in Clause 6.4 is not (yet) correct, because the padded pixels need to be subtracted from the face boxes. We will suggest the necessary corrections in the next revision. Our code does it correctly, though. We will also suggest including a note in Clause 6.4 explaining that the coordinates may lie outside the image region.

If the chin or the eyes are not visible in the image, computation of face unicity (using the T metric) might become inaccurate due to extrapolated landmark coordinates (these typically lie at the image boundary). However, neglecting incompletely covered faces in the face unicity measure would be even more inaccurate, as it could result in a perfect quality component value of 100 even though several faces are visible.

Regarding the distortions caused by resizing the image to a square, I did some experiments with a variant that applies padding to bring the image to square format before resizing. However, I didn't observe any improvement in accuracy from this additional step, except in extreme cases (very small or very large aspect ratios) which are not realistic.

Johannes

rtyson-veridiumid commented 4 months ago

Thanks for the quick response.

OK, this makes sense. We should treat faces that are partially outside the bounds almost as if they were occluded.

We are trying to implement these measures ourselves, using OFIQ as a benchmark. Faces partially outside the image bounds show a lot of disparity between our code and OFIQ, hence this rabbit hole, but I now think this is mainly down to the models hallucinating the positions of cropped face features differently.

(FYI, we have found that mirror padding is quite useful for helping face detectors pick up such faces.)

One last question: for the purpose of computing other measures, can we treat pixels outside the image as just black? I think this is what happens in OFIQ?

JoMe2704 commented 4 months ago

In ISO/IEC 29794-5 (DIS) and in OFIQ, we have these quality components "Leftward crop of face in image", "Rightward crop of face in image", "Upward crop of face in image", "Downward crop of face in image", which penalize such cases. If you want to give actionable feedback, it is better not to treat cropped faces as occlusions. But this may not be relevant in your use cases.

To be fair, I should also mention that we had better results in FATE Quality SIDD (for Total Number of Faces) before we implemented the padding (submission secunet_002). It seems that, for NIST's sequestered data, padding doesn't help. The reason could be that, due to the padding, the faces in the resized image (300x300) become smaller. However, with OFIQ we don't (primarily) target applications where we need to detect very small faces.

In the face detection algorithm, we apply padding with black colour before resizing to 300x300. However, I think this was not your question, as you ask about "other measures". The algorithms using face boxes are landmark estimation and pose estimation. There, the image is padded as necessary to be able to crop to the extended face box. We always use black colour for padding. This detail is indeed missing in the algorithm descriptions in the DIS of ISO/IEC 29794-5. Thanks for pointing that out.