georgesterpu / avsr-tf1

Audio-Visual Speech Recognition using Sequence to Sequence Models
GNU General Public License v3.0

How can I change this av_align model to apply it to audio2video? #17

Closed LeeYongHyeok closed 5 years ago

LeeYongHyeok commented 5 years ago

Hello, @georgesterpu. Thanks for the code release. I made your av_align model the basis of my research.

In your paper, video-to-audio cross-modal alignment works well. However, audio-to-video cross-modal alignment may not work as well.

So I want to see how cross-modal alignment behaves when both audio-to-video and video-to-audio are used simultaneously.

To that end, I checked the structure of the AttentiveEncoder class in avsr/encoder.py.

I found that the AttentiveEncoder takes the output of the normal video encoder together with the audio data, and builds the video-to-audio attentive encoder from them in a single step.

I would like to have video-to-audio and audio-to-video at the same time, but I think this is not possible with the current code structure.
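
To make the idea concrete, here is a minimal sketch of what I mean by attending in both directions at once. It is plain Python with simple dot-product attention, not your TensorFlow code. The helper names (`cross_attend`, `fuse`), the scoring function, and the toy inputs are all my own assumptions for illustration:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, memory):
    """For each query vector, return the attention-weighted average of the
    memory vectors (hypothetical helper, plain dot-product scoring)."""
    dim = len(memory[0])
    contexts = []
    for q in queries:
        scores = [sum(qi * mi for qi, mi in zip(q, m)) for m in memory]
        weights = softmax(scores)
        contexts.append(
            [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
        )
    return contexts

def fuse(states, contexts):
    """Concatenate each state with its context vector (hypothetical helper)."""
    return [s + c for s, c in zip(states, contexts)]

# Toy "encoder outputs": 4 audio frames and 2 video frames, feature size 3.
audio = [[0.1, 0.2, 0.3], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5], [1.0, 0.0, 0.0]]
video = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# Direction 1: audio states query the video memory (one context per audio frame).
audio_attends_video = fuse(audio, cross_attend(audio, video))
# Direction 2: video states query the audio memory (one context per video frame).
video_attends_audio = fuse(video, cross_attend(video, audio))

print(len(audio_attends_video), len(audio_attends_video[0]))  # 4 6
print(len(video_attends_audio), len(video_attends_audio[0]))  # 2 6
```

In this sketch both directions read from the same two plain encoder outputs, so neither attentive stream depends on the other. That is the kind of structure I am hoping to reproduce with your classes.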

Which part do I need to modify so that I can use your AttentiveEncoder in both directions at the same time?

I am very pleased that you are doing research in the same field, and I thank you for it.

Sincerely, YongHyeok Lee.