Hangz-nju-cuhk / Talking-Face-Generation-DAVS

Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)
MIT License

Table 3: Audio-Visual Speech Recognition and 1:25000 audio-video retrieval results with different supervisions. #15

Open zzzzhuque opened 5 years ago

zzzzhuque commented 5 years ago

Hi, after reading the paper, I am confused about Table 3. What do visual acc, audio acc, and combine acc mean? How did you calculate the results of 67.5%, 91.8%, and 95.2%?

Hangz-nju-cuhk commented 5 years ago

Hi @ZHUTAO142857, sorry that I didn't notice this issue before.

I performed the audio-visual recognition task (word classification on LRW) as described in the paper. The three numbers are the classification accuracies obtained using only video, only audio, or the combination of both.
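
For readers unfamiliar with the metric, here is a minimal sketch of how three such top-1 accuracies could be computed from per-class scores. It is not the repository's evaluation code. The toy scores, the helper names `top1_accuracy` and `fuse_by_sum`, and in particular the fusion rule (summing the visual and audio scores) are assumptions made for illustration. The thread does not say how the two modalities were combined.

```python
from typing import List, Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=lambda c: row[c])  # argmax over classes
        correct += int(predicted == label)
    return correct / len(labels)


def fuse_by_sum(visual: Sequence[Sequence[float]],
                audio: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical fusion rule: add the per-class scores of both modalities."""
    return [[v + a for v, a in zip(v_row, a_row)]
            for v_row, a_row in zip(visual, audio)]


# Toy data (made up): 4 samples, 3 word classes.
labels = [0, 1, 2, 1]
visual_scores = [
    [0.60, 0.30, 0.10],
    [0.50, 0.40, 0.10],
    [0.20, 0.20, 0.60],
    [0.40, 0.35, 0.25],
]
audio_scores = [
    [0.70, 0.20, 0.10],
    [0.20, 0.70, 0.10],
    [0.50, 0.10, 0.40],
    [0.10, 0.80, 0.10],
]

visual_acc = top1_accuracy(visual_scores, labels)    # video only
audio_acc = top1_accuracy(audio_scores, labels)      # audio only
combine_acc = top1_accuracy(fuse_by_sum(visual_scores, audio_scores), labels)

print(f"visual acc:  {visual_acc:.1%}")
print(f"audio acc:   {audio_acc:.1%}")
print(f"combine acc: {combine_acc:.1%}")
```

In this toy example the combined scores classify more samples correctly than either modality alone, which matches the ordering of the numbers quoted in the question (67.5%, 91.8%, 95.2%). The real values depend on the trained model and the LRW test set.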