Open-Debin / Emotion-FAN

ICIP 2019: Frame Attention Networks for Facial Expression Recognition in Videos
MIT License

about self-attention and relation-attention #5

Closed oukohou closed 4 years ago

oukohou commented 4 years ago

In your code Demo_AFEW_Attention.py, it seems self-attention and relation-attention cannot be used simultaneously?

at_type = ['self-attention', 'relation-attention'][args.at_type]
print('The attention is ' + at_type)

This seems different from your paper: [screenshot from the paper]

If so, why?

Open-Debin commented 4 years ago

Thanks for your comment. In the code, 'relation-attention' means using both 'self-attention' and 'relation-attention', because 'relation-attention' is based on a global feature that is the output of 'self-attention'.

The name 'relation-attention' here is ambiguous. Thank you for your question; I will modify it.
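
For readers who hit the same confusion: below is a minimal sketch of how the two stages compose, written to mirror the paper's description. The layer names (alpha_fc, beta_fc), the feature dimension, and the sigmoid scoring are illustrative assumptions, not the repository's exact code.

import torch
import torch.nn as nn

class TwoStageAttention(nn.Module):
    """Sketch: relation-attention consumes the global feature produced by self-attention."""
    def __init__(self, dim=512):
        super().__init__()
        self.alpha_fc = nn.Linear(dim, 1)     # self-attention score per frame (assumed layout)
        self.beta_fc = nn.Linear(dim * 2, 1)  # relation-attention score (assumed layout)

    def forward(self, frame_feats):  # frame_feats: [n_frames, dim]
        # Stage 1: self-attention yields per-frame weights and a global video feature.
        alphas = torch.sigmoid(self.alpha_fc(frame_feats))           # [n, 1]
        global_feat = (frame_feats * alphas).sum(0) / alphas.sum()   # [dim]
        # Stage 2: relation-attention scores each frame against the global
        # feature from stage 1, so it cannot run without self-attention.
        pairs = torch.cat([frame_feats, global_feat.expand_as(frame_feats)], dim=1)
        betas = torch.sigmoid(self.beta_fc(pairs))                   # [n, 1]
        weights = alphas * betas
        return (pairs * weights).sum(0) / weights.sum()              # [2 * dim]

In this reading, selecting 'self-attention' stops after stage 1, while 'relation-attention' runs both stages, which is why the two options are exclusive in the at_type flag.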

oukohou commented 4 years ago

Understood, thanks!

oukohou commented 4 years ago

Sorry to bother you again, but the code logic is rather complicated to me, so I think it's better to ask another question: if I understand correctly, in your Demo_AFEW_Attention.py, the function validate() runs inference on only a single image instead of all three images? This is different from the training flow.

for i, (input_var, target, index) in enumerate(val_loader):
    # compute output
    target = target.cuda(async=True)  # 'async' is the pre-0.4 PyTorch kwarg; newer versions use non_blocking=True
    input_var = torch.autograd.Variable(input_var)  # Variable is a no-op wrapper in modern PyTorch
    ''' model & full_model'''
    f, alphas = model(input_var, phrase='eval')

    pred_score = 0
    output_store_fc.append(f)
    output_alpha.append(alphas)
    target_store.append(target)
    index_vector.append(index)

    # measure elapsed time
    batch_time.update(time.time() - end)
    end = time.time()

If so, why is that? And what does the index_matrix actually do? Why is the eval procedure different from the train procedure?

Thanks in advance!

Open-Debin commented 4 years ago

Hello, the method is consistent with the paper: each prediction for a video comes from inference over all frames of that video. The index_matrix tells which frame belongs to which video; you can check its shape, [num_of_videos, num_of_frames_in_entire_database]. I hope my answer helps you. Thanks for your interest in my project; could you please give it a star? Thanks.
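
To make the role of index_matrix concrete, here is an illustrative sketch of how a [num_of_videos, num_of_frames_in_entire_database] binary matrix can pool per-frame outputs into per-video features in one batched step; the variable names and the weighted-average pooling are assumptions for illustration, not a line-for-line copy of Demo_AFEW_Attention.py.

import torch

num_videos, num_frames, dim = 3, 10, 512
frame_feats = torch.randn(num_frames, dim)  # one row per frame, whole eval set
alphas = torch.rand(num_frames, 1)          # self-attention weight per frame

# index_matrix[v, f] == 1 iff frame f belongs to video v (hypothetical assignment).
video_of_frame = torch.tensor([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
index_matrix = torch.zeros(num_videos, num_frames)
index_matrix[video_of_frame, torch.arange(num_frames)] = 1.0

# One matrix multiply sums each video's weighted frames; dividing by the
# per-video weight total gives a weighted average, i.e. one feature per video.
video_feats = (index_matrix @ (frame_feats * alphas)) / (index_matrix @ alphas)
print(video_feats.shape)  # torch.Size([3, 512])

This would also explain why validate() only accumulates f, alphas, and index per batch: the frames of one video may be split across batches, so pooling presumably has to wait until the whole loader has been consumed.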

Open-Debin commented 3 years ago

@oukohou Merry Christmas! I recently updated Emotion-FAN; new features include data processing, environment installation, CK+ code, baseline code, and more detailed instructions. You can also find the old-version directory of Emotion-FAN in the README.md. I hope my new updates help you greatly. Please see the Emotion-FAN repository for more details.