Image size degrades emotion classification accuracy when holding face pixel size constant

markallenthornton commented 2 years ago

Using v0.4.0 or the current m1_testing branch, with the default detectors specified in the documentation, I'm finding that large images degrade emotion classification performance even when the pixel size of the actual faces is held constant. Simply cropping out the face-free parts of the image improves performance considerably.

I suspect this happens because the image is downsampled for face detection, and the faces are then extracted with the resulting bounding boxes from the downsampled image rather than the original. In large images that are mostly free of faces, this would downsample the faces unnecessarily and degrade performance.

If this is the problem, I would suggest rescaling the bounding boxes back to the original image resolution and extracting the faces from the original. They could always be downsampled from that point if the emotion model requires it, but at least the face resolution wouldn't depend on something arbitrary like the overall image size.
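To make the suggestion concrete, here is a minimal sketch of the idea in plain NumPy. It is not py-feat's code: the function names `rescale_boxes` and `crop_faces`, the `(x1, y1, x2, y2)` box layout, and the single uniform `scale` factor are all assumptions made for illustration.

```python
import numpy as np


def rescale_boxes(boxes, scale):
    """Map (x1, y1, x2, y2) boxes found on a downsampled image back to
    original-image pixel coordinates.

    `scale` is downsampled_size / original_size (hypothetical convention),
    e.g. 0.25 if the image was shrunk to a quarter of its width and height.
    """
    return np.asarray(boxes, dtype=float) / scale


def crop_faces(original, boxes_small, scale):
    """Crop faces from the ORIGINAL image using boxes detected on the
    downsampled copy, so face resolution does not depend on image size."""
    height, width = original.shape[:2]
    crops = []
    for x1, y1, x2, y2 in rescale_boxes(boxes_small, scale):
        # Round outward and clamp to the image bounds.
        left = int(max(0, np.floor(x1)))
        top = int(max(0, np.floor(y1)))
        right = int(min(width, np.ceil(x2)))
        bottom = int(min(height, np.ceil(y2)))
        crops.append(original[top:bottom, left:right])
    return crops


# Example: a 3000x2000 image that was downsampled by 4x for detection.
original = np.zeros((2000, 3000, 3), dtype=np.uint8)
faces = crop_faces(original, [(100, 50, 150, 110)], scale=0.25)
print(faces[0].shape)  # (240, 200, 3): the crop keeps full-resolution pixels
```

Any resizing needed by the emotion model would then be applied to these crops, so it depends only on the face size and not on how much background the image contains.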

ejolly commented 1 year ago

Thanks for the suggestion!

To partially address this issue (among others), since 0.5.0 we support passing kwargs to the underlying pre-trained detectors during initialization and prediction. We will continue pursuing more robust solutions.

ejolly commented 1 year ago

Related: #73

cosanlab / py-feat #135