choijeongsoo / av2av

[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
MIT License

Vocoder training code #5

Open ilucasgoncalves opened 5 months ago

ilucasgoncalves commented 5 months ago

Great work! Could you provide the vocoder training code?

choijeongsoo commented 3 months ago

Hello, thank you for your interest in our work and sorry for the delayed response.

You can refer to the speech-resynthesis repository for the vocoder training code.

We conditioned the vocoder on a speaker embedding extracted from a pre-trained speaker encoder, and we didn't use pitch information.
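For readers adapting a unit-based vocoder to this setup, here is a minimal sketch of what "speaker embedding, no pitch" could look like at the input side. It is not taken from the AV2AV or speech-resynthesis code: the function name `build_vocoder_conditioning`, the toy embedding table, and the vector sizes are all assumptions made for illustration. The idea shown is only that each frame's unit embedding is concatenated with one utterance-level speaker vector, and no F0 channel is added.

```python
from typing import List, Sequence


def build_vocoder_conditioning(
    unit_ids: Sequence[int],
    unit_table: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
) -> List[List[float]]:
    """Return one conditioning vector per frame: [unit embedding | speaker embedding].

    Hypothetical helper for illustration only; no pitch (F0) channel is included.
    """
    frames = []
    for uid in unit_ids:
        if not 0 <= uid < len(unit_table):
            raise ValueError(f"unit id {uid} is outside the embedding table")
        # The utterance-level speaker vector is repeated on every frame.
        frames.append(list(unit_table[uid]) + list(speaker_embedding))
    return frames


# Toy values (assumed): 3 discrete units with 2-dim embeddings, 3-dim speaker vector.
unit_table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
speaker_embedding = [1.0, 0.0, -1.0]

conditioning = build_vocoder_conditioning([2, 0, 0, 1], unit_table, speaker_embedding)
print(len(conditioning), len(conditioning[0]))  # frames, channels per frame
```

In a real training run the speaker vector would come from the pre-trained speaker encoder and the unit embeddings would be learned with the generator. The sketch only fixes the shape of the conditioning input.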