Code for paper "MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition"
It is unclear how to achieve temporal alignment between the video and audio streams #2
Open
LindgeW opened 4 months ago
Please give more details on how the video and audio streams are temporally aligned.