ElenaRyumina / AVCER

Zero-Shot Audio-Visual Compound Expression Recognition Method based on Emotion Probability Fusion
https://elenaryumina.github.io/AVCER

Very cool how can it be implemented elsewhere? #9

Closed G-force78 closed 1 month ago

G-force78 commented 2 months ago

Other than cutting-edge implementations such as EMO and VASA (https://www.microsoft.com/en-us/research/project/vasa-1/), img2video and text2video models lack authentic expressions. So I wonder whether this could be used to identify emotional expressions in datasets, which could then be trained on and tokenised for use in text2video. The emotions could also be numerically weighted, as in Stable Diffusion, e.g.: photo of a sad man(0.5) in a raincoat browsing in shop windows. Eventually it could be used in robots too.

ElenaRyumina commented 1 month ago

Dear @G-force78,

To annotate corpora, several experts (trained humans) are typically employed; their agreement on the annotation allows a more reliable determination of emotions. For automatic annotation, our model can be used; however, I recommend running several AI models and assigning emotions based on their agreement on the prediction. Only then should models be trained on the annotated data, which can then be tokenized for use in text2video and other tasks.
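The agreement-based automatic annotation described above can be sketched as a simple majority vote across the outputs of several models (a minimal illustration; the labels, model count, and threshold are assumptions, not part of AVCER itself):

```python
from collections import Counter

def agree_label(predictions, min_agreement=2):
    """Return the majority emotion label if at least `min_agreement`
    of the model predictions coincide; otherwise return None so the
    sample is left unannotated rather than labeled unreliably."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= min_agreement else None

# Hypothetical per-clip predictions from three emotion recognizers.
print(agree_label(["sad", "sad", "neutral"]))   # sad
print(agree_label(["sad", "happy", "neutral"])) # None -> discard clip
```

Clips where the models disagree are discarded (or routed to human experts), mirroring how inter-annotator agreement is used with human labelers.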