sarutobiumon opened 1 year ago
- It would be amazing if you could:
- Turn the "talking head" images into animated gifs lip-synced to the wav audio generated by TTS using Bark (Bark is currently the best and most realistic/emotion-driven audio model that is free to use, even better than the best closed-source commercial model, Eleven Labs)
- Then generate an mp4 on the fly from the combination of the animated gif and the wav audio, replacing the starting-point animated gif on the screen (a rough sketch of this flow follows the list).
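As an illustration only, here is a minimal sketch of the requested flow, assuming Bark is installed (`pip install git+https://github.com/suno-ai/bark.git`) and ffmpeg is on the PATH; the file names (`talking_head.gif`, `speech.wav`, `talking_head.mp4`) are placeholders, not anything AudioGPT currently uses:

```python
# Minimal sketch: generate speech with Bark, then mux an existing
# talking-head gif with that audio into an mp4 using ffmpeg.
# File names are placeholders for illustration only.
import subprocess
from scipy.io.wavfile import write as write_wav
from bark import SAMPLE_RATE, generate_audio, preload_models

preload_models()  # downloads/loads the Bark models (large on first run)

text = "Hello! This line would come from the chat response."
audio_array = generate_audio(text)  # float32 numpy array at SAMPLE_RATE (24 kHz)
write_wav("speech.wav", SAMPLE_RATE, audio_array)

# Combine the (for now, not yet lip-synced) gif with the wav into an mp4.
# The gif is looped so it covers the audio; -shortest stops at the audio length.
subprocess.run([
    "ffmpeg", "-y",
    "-stream_loop", "-1", "-i", "talking_head.gif",
    "-i", "speech.wav",
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "-c:a", "aac", "-shortest",
    "talking_head.mp4",
], check=True)
```

The lip-sync itself would still come from one of the tools listed below; this sketch only covers the Bark TTS and gif+wav-to-mp4 muxing steps.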
This could be done by integrating code from one of the following options (a hypothetical wrapper call is sketched after the list):
- from this one-click install GUI: https://www.youtube.com/watch?v=f_NUZDBiaZg
- or using SadTalker: https://www.youtube.com/watch?v=aJIq_UoZv24
- or the Google Colab Python code linked here (supports 30+ languages): https://spltech.co.uk/using-wav2lip-and-google-cloud-wavenet-to-create-voice-overs-in-more-than-30-languages/
- or using VideoReTalking (Audio-based Lip Synchronization for Talking Head Video): https://colab.research.google.com/github/vinthony/video-retalking/blob/main/quick_demo.ipynb (demo: https://www.youtube.com/watch?v=CgZVKSkdtRo)
Bark oobabooga TTS extension: https://github.com/wsippel/bark_tts
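For the actual lip-sync step, any of the options above could be called from AudioGPT as an external script. As one hedged example, SadTalker ships an `inference.py` whose README-documented flags include `--source_image`, `--driven_audio` and `--result_dir` (flag names may differ between versions); a wrapper could look roughly like this, with paths as placeholders:

```python
# Hypothetical wrapper around SadTalker's inference script; assumes the
# SadTalker repo is cloned locally with its checkpoints downloaded.
# Flag names follow the SadTalker README and may change between versions.
import subprocess
from pathlib import Path

SADTALKER_DIR = Path("SadTalker")  # placeholder path to the cloned repo

def lip_sync(image_path: str, wav_path: str, out_dir: str = "results") -> None:
    """Drive a still portrait image with a wav file, producing a talking-head video."""
    subprocess.run(
        [
            "python", "inference.py",
            "--source_image", image_path,   # the "talking head" portrait
            "--driven_audio", wav_path,     # e.g. the wav produced by Bark above
            "--result_dir", out_dir,
        ],
        cwd=SADTALKER_DIR,
        check=True,
    )

# Example: lip_sync("portrait.png", "speech.wav")
```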
Hi, thanks for your suggestions. We will try to add these models into AudioGPT as soon as possible.