Open pboudolf opened 4 years ago
@pboudolf Hi, I think there is a slight misunderstanding here; I know the feeling.
First, even if you use the authors' pretrained model, the recognition accuracy should not drop significantly with 128x128 images, as long as you adjust the scale value accordingly. The adjustment formula is discussed in https://github.com/HRNet/HRNet-Facial-Landmark-Detection/issues/3.
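As a rough illustration of that adjustment (a minimal sketch, assuming the "200-pixel reference box" convention this codebase uses, where scale = 1.0 corresponds to a 200-pixel crop; the `padding` factor here is my own guess, so check issue #3 for the exact value the authors recommend):

```python
import numpy as np

def center_scale_from_bbox(x1, y1, x2, y2, padding=1.25):
    """Derive (center, scale) for a face bounding box.

    Assumes the "200-pixel reference box" convention: scale = 1.0 means
    the crop covers 200x200 pixels of the original image. The padding
    factor is an assumption -- see issue #3 for the exact factor.
    """
    center = np.array([(x1 + x2) / 2.0, (y1 + y2) / 2.0])
    scale = max(x2 - x1, y2 - y1) / 200.0 * padding
    return center, scale

# For a 128x128 image whose bounding box is the whole frame:
center, scale = center_scale_from_bbox(0, 0, 128, 128)
print(center, scale)  # [64. 64.] 0.8
```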
Also, the `[64, 64]` argument of `decode_preds` is only used as a reference value when the coordinates are computed; inside `decode_preds` it is treated purely as a number. `[64, 64]` is the heatmap size of the model, so the argument passed to `decode_preds` and the model's actual heatmap size should be kept in sync.
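To make that dependency concrete, here is a simplified sketch of how the heatmap-size argument enters the heatmap-to-image mapping (not the repo's exact code; it ignores rotation and the sub-pixel refinement step in `decode_preds`):

```python
import numpy as np

def heatmap_to_image(hm_xy, center, scale, res=(64, 64)):
    """Map a peak from heatmap space back to original-image space.

    The crop represented by the heatmap covers 200*scale pixels of the
    original image, so each heatmap pixel spans (200*scale)/res original
    pixels. Passing a `res` that differs from the real heatmap size
    scales every landmark by the wrong factor.
    """
    hm_xy = np.asarray(hm_xy, dtype=float)
    res = np.asarray(res, dtype=float)
    return (hm_xy / res - 0.5) * (200.0 * scale) + np.asarray(center, dtype=float)

# A peak at the center of a 64x64 heatmap maps back to the crop center:
print(heatmap_to_image([32, 32], center=[64, 64], scale=0.8))  # [64. 64.]
```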
My recommendation is not to change the authors' model config. Instead, following https://github.com/HRNet/HRNet-Facial-Landmark-Detection/issues/3, compute the scale appropriately for your images. I think this is the simplest solution; see the numerical check below.
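As a quick numerical check of why keeping `[64, 64]` and adjusting the scale is the simpler route (same simplified mapping as in the sketch above; the peak position is a made-up example):

```python
import numpy as np

def heatmap_to_image(hm_xy, center, scale, res):
    # same simplified mapping as in the sketch above
    hm_xy, res = np.asarray(hm_xy, float), np.asarray(res, float)
    return (hm_xy / res - 0.5) * (200.0 * scale) + np.asarray(center, float)

peak = [48, 40]                # hypothetical peak in a 64x64 heatmap
center, scale = [64, 64], 0.8  # 128x128 face, scale from the sketch above

print(heatmap_to_image(peak, center, scale, res=[64, 64]))  # [104.  84.] -- inside the image
print(heatmap_to_image(peak, center, scale, res=[32, 32]))  # [224. 184.] -- far outside 128x128
```

If the preprocessing still crops and resizes to the 256x256 input from the config, the network keeps producing 64x64 heatmaps, so `[64, 64]` remains the right argument.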
FYI!
Hi, I used the same learning conditions (crop, etc.) to make the model estimate keypoints on my own images (annotated with a bounding box). However, these images are 128x128 instead of 256x256. To account for this, I changed `[64, 64]` to `[32, 32]` in the following line: https://github.com/HRNet/HRNet-Facial-Landmark-Detection/blob/f776dbe8eb6fec831774a47209dae5547ae2cda5/lib/core/function.py#L194. This way my points are better centered on the face, but they are still estimated really badly. Could this be because of the 128x128 input (although I thought I accounted for this)? Or do others also have performance problems on their own images, and might the problem be caused by occlusions and lighting conditions in my images?