face-analysis / emonet

Official implementation of the paper "Estimation of continuous valence and arousal levels from faces in naturalistic conditions", Antoine Toisoul, Jean Kossaifi, Adrian Bulat, Georgios Tzimiropoulos and Maja Pantic, Nature Machine Intelligence, 2021
https://www.nature.com/articles/s42256-020-00280-0

Explanation of landmark's heatmap output #17

Open david-gimeno opened 1 year ago

david-gimeno commented 1 year ago

First of all, congratulations on your great work and thanks a lot for sharing it!

I was using EmoNet to extract some face embeddings. After inspecting the code and the output of the model, I would like to ask for more information on how to interpret the out["heatmap"] tensor that the model outputs. I saw that its shape is (68, 64, 64). Since you extract 68 facial landmarks, my intuition is that it is a kind of attention matrix over the landmarks or, in other words, that it shows which of those landmarks were most relevant when predicting the emotional class. But why 64x64? Maybe I am wrong.

Thanks in advance,

David.

Developer1881 commented 1 year ago

@david-gimeno Hi David. I'm stuck on the same problem. Any updates on your investigation?

david-gimeno commented 1 year ago

@Developer1881 My intuition is that at some point in the model's forward pass the cropped face is embedded in a 64x64 latent representation, and then a heatmap is predicted for each of the 68 facial landmarks, the objective being to concentrate the 'heat' at the position where the landmark should be. In other words, the model learns to locate each landmark via heatmaps over the face image. BUT I am not sure.
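
To make the idea of 'heat' concentrated at a landmark position concrete, here is a minimal NumPy sketch that builds such maps by hand. It is only an illustration of the intuition above, not code from the EmoNet repository: the function name `gaussian_heatmap`, the Gaussian shape, the `sigma` value and the three landmark positions are all assumptions made up for the example.

```python
import numpy as np

def gaussian_heatmap(x, y, size=64, sigma=1.5):
    """Build a size x size map whose 'heat' peaks at pixel (x, y).

    The Gaussian shape and sigma are illustrative assumptions.
    """
    # cols[i, j] == j (x coordinate), rows[i, j] == i (y coordinate)
    cols, rows = np.meshgrid(np.arange(size), np.arange(size))
    return np.exp(-((cols - x) ** 2 + (rows - y) ** 2) / (2 * sigma ** 2))

# One map per landmark: three made-up landmark positions given as (x, y).
points = [(20, 30), (32, 40), (45, 31)]
heatmaps = np.stack([gaussian_heatmap(x, y) for x, y in points])
print(heatmaps.shape)  # (3, 64, 64)
```

With 68 landmarks instead of three, the stacked array would have the same (68, 64, 64) shape as the tensor discussed in this thread, with one channel per landmark and rows indexed by y, columns by x.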

Developer1881 commented 1 year ago

@david-gimeno From what I tried, and the results look pretty normal, each 64x64 matrix can be read as a probability map for the location of one of the 68 points on a 64x64 pixel grid. So I then scale the result up to 256x256.
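
A rough sketch of that reading, assuming each of the 68 channels peaks at its landmark and that the face crop is 256x256: take the argmax of every 64x64 map and scale the coordinates by 256 / 64 = 4. The helper name `heatmaps_to_points`, the half-pixel (cell centre) offset and the synthetic input are assumptions for illustration, not something taken from the EmoNet code.

```python
import numpy as np

def heatmaps_to_points(heatmaps, image_size=256):
    """Return an (N, 2) array of (x, y) peak positions in image pixels."""
    heatmaps = np.asarray(heatmaps)
    n, h, w = heatmaps.shape
    # Index of the hottest cell in each flattened map.
    flat_idx = heatmaps.reshape(n, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    # Map cell centres from the heatmap grid to the image grid
    # (the +0.5 centre convention is an assumption).
    scale_x, scale_y = image_size / w, image_size / h
    return np.stack([(xs + 0.5) * scale_x, (ys + 0.5) * scale_y], axis=1)

# Synthetic stand-in for out["heatmap"]: low noise plus one known peak.
rng = np.random.default_rng(0)
demo = rng.random((68, 64, 64)) * 0.1
demo[0, 30, 20] = 1.0  # landmark 0 peaks at x=20, y=30
points = heatmaps_to_points(demo)
print(points.shape)  # (68, 2)
print(points[0])     # x=82.0, y=122.0
```

For a real PyTorch output, the tensor would first need converting, e.g. `out["heatmap"].detach().cpu().numpy()`, and any batch dimension would have to be indexed away so the input is (68, 64, 64). A plain argmax only gives whole-cell precision; whether the authors decode the maps differently is not confirmed in this thread.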