Specific Settings of the ToMe Model

GeWu-Lab / TSPM

Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.

I trained the model using the parameter settings specified in the code, and the results are as follows:

Audio Count Acc: 77.48 %
Audio Compt Acc: 60.44 %
Audio Averg Acc: 71.20 %

Visual Count Acc: 76.69 %
Visual Local Acc: 77.06 %
Visual Averg Acc: 76.88 %

Audio-Visual Exist Acc: 76.92 %
Audio-Visual Count Acc: 76.36 %
Audio-Visual Local Acc: 59.89 %
Audio-Visual Compt Acc: 63.67 %
Audio-Visual Templ Acc: 66.55 %
Audio-Visual Averg Acc: 69.17 %

---->Overall Accuracy: 71.57 %

Could you clarify where the problem might be? Is it related to the "audio_patch" feature?
