av-savchenko / face-emotion-recognition

Efficient face emotion recognition in photos and videos
Apache License 2.0

Can I get a multi-task learning model file? #30

Closed saeu5407 closed 1 year ago

saeu5407 commented 1 year ago

Thank you for the good paper.

I'm interested in your work and would like to look into the multi-task learning side, so I'm opening an issue. I have two questions.

  1. Multi-task learning training: 1) If I understand correctly, do you freeze the weights of the backbone and train only the weights of the head? 2) In that case, is it practically the same as training each task separately?

  2. Multi-task learning model: Could I get the multi-task learning model file that you trained? I'd like to check arousal, valence, etc.

av-savchenko commented 1 year ago

Hello! Thanks for your interest in my work.

Regarding your second question: the model trained via multi-task learning on AffectNet to predict facial expressions, valence and arousal is available in this repository: enet_b0_8_va_mtl.pt. The training script for the original version of AffectNet is also available here - see the section Multi-task: FER+Valence-Arousal. Finally, there is source code for the multi-task learning challenge from the ABAW competition, but I do not distribute that model because it does not work well on real videos and photos.

Regarding your first question: you understand it correctly. It is also possible to finetune the whole network after training the new head. Examples are available in the Jupyter notebooks mentioned above.

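For illustration, here is a minimal sketch of such a head-only training setup (this is not the exact code from the notebooks; the timm EfficientNet-B0 backbone and the single 10-unit head for 8 expressions plus valence and arousal are assumptions):

import timm
import torch
import torch.nn as nn

# Pretrained backbone with a new multi-task head: 8 expression logits + valence + arousal.
model = timm.create_model('efficientnet_b0', pretrained=True)
model.classifier = nn.Linear(model.classifier.in_features, 10)

# Freeze the backbone so that only the new head receives gradients.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('classifier')

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-3)

# To finetune the whole network afterwards, unfreeze everything and use a smaller learning rate:
# for param in model.parameters():
#     param.requires_grad = True
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
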
saeu5407 commented 1 year ago

Thank you so much

arvindsaraf commented 3 months ago

Hi - I'm trying to use the multi-task models - enet_b0_8_va_mtl.pt and mobilevit_va_mtl.pt - and I'm trying to understand the format of the output tensor with 10 values. Is the understanding below, based on the training code here - https://github.com/av-savchenko/face-emotion-recognition/blob/main/src/affectnet/train_emotions-pytorch.ipynb - correct?

  1. For the 8-class output, the emotion outputs are indices 0-7, with the index-wise values corresponding to the emotions ['anger', 'contempt', 'disgust', 'fear', 'happiness', 'neutral', 'sadness', 'surprise']
  2. Outputs 8 and 9 are for valence and arousal, respectively
  3. Sample code to interpret them is below
import torch
import torch.nn.functional as F

emotion_labels = ['anger', 'contempt', 'disgust', 'fear', 'happiness', 'neutral', 'sadness', 'surprise']
num_emotion_classes = len(emotion_labels)  # first 8 values of the model output `outputs` are expression logits
emotions = F.softmax(outputs[0][:num_emotion_classes], dim=0).cpu().numpy()  # expression probabilities
valence = torch.tanh(outputs[0][num_emotion_classes]).item()    # output index 8
arousal = torch.tanh(outputs[0][num_emotion_classes+1]).item()  # output index 9

Please correct me if this is wrong. Looking forward to your response, thanks.

cc: @rajatdhariwal @smruti-AT

av-savchenko commented 3 months ago

Hello! As far as I can see, you're correct. BTW, if you just want to use our models for inference, it is much simpler to use the hsemotion package (from pip or from its repository). The inference code is available there, so you could verify your code by comparing it with facial_emotions.py from that repository.

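For reference, a rough inference sketch with the hsemotion package (based on its documented usage; the exact model name string and the contents of the returned scores are worth double-checking against facial_emotions.py):

from hsemotion.facial_emotions import HSEmotionRecognizer

# face_img is assumed to be an aligned RGB face crop (numpy array).
fer = HSEmotionRecognizer(model_name='enet_b0_8_va_mtl', device='cpu')
emotion, scores = fer.predict_emotions(face_img, logits=False)
print(emotion, scores)  # for the *_va_mtl models the scores should also reflect valence/arousal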