sarutobiumon opened 1 year ago
- It would be amazing if you could:
- Turn the "talking head" images into animated gifs lip-synced to the wav audio generated by TTS using Bark (Bark is currently the best and most realistic/emotion-driven audio model that is free to use, even better than the best closed-source commercial model, Eleven Labs)
- Then generate an mp4 on the fly from the combination of the animated gif and the wav audio, replacing the starting-point animated gif on the screen (a rough sketch of this flow follows the list).
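As an illustration only, here is a minimal sketch of the requested flow, assuming Bark is installed (`pip install git+https://github.com/suno-ai/bark.git`) and ffmpeg is on the PATH; the file names (`talking_head.gif`, `speech.wav`, `talking_head.mp4`) are placeholders, not anything AudioGPT currently uses:

```python
# Minimal sketch: generate speech with Bark, then mux an existing
# talking-head gif with that audio into an mp4 using ffmpeg.
# File names are placeholders for illustration only.
import subprocess
from scipy.io.wavfile import write as write_wav
from bark import SAMPLE_RATE, generate_audio, preload_models

preload_models()  # downloads/loads the Bark models (large on first run)

text = "Hello! This line would come from the chat response."
audio_array = generate_audio(text)  # float32 numpy array at SAMPLE_RATE (24 kHz)
write_wav("speech.wav", SAMPLE_RATE, audio_array)

# Combine the (for now, not yet lip-synced) gif with the wav into an mp4.
# The gif is looped so it covers the audio; -shortest stops at the audio length.
subprocess.run([
    "ffmpeg", "-y",
    "-stream_loop", "-1", "-i", "talking_head.gif",
    "-i", "speech.wav",
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    "-c:a", "aac", "-shortest",
    "talking_head.mp4",
], check=True)
```

The lip-sync itself would still come from one of the tools listed below; this sketch only covers the Bark TTS and gif+wav-to-mp4 muxing steps.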
This could be done by integrating code from one of the following options (a hypothetical wrapper call is sketched after the list):
- from this one-click install GUI: https://www.youtube.com/watch?v=f_NUZDBiaZg
- or using SadTalker: https://www.youtube.com/watch?v=aJIq_UoZv24
- or the Google Colab Python code linked here (supports 30+ languages): https://spltech.co.uk/using-wav2lip-and-google-cloud-wavenet-to-create-voice-overs-in-more-than-30-languages/
- or using VideoReTalking (Audio-based Lip Synchronization for Talking Head Video): https://colab.research.google.com/github/vinthony/video-retalking/blob/main/quick_demo.ipynb (demo: https://www.youtube.com/watch?v=CgZVKSkdtRo)
Bark oobabooga TTS extension: https://github.com/wsippel/bark_tts
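For the actual lip-sync step, any of the options above could be called from AudioGPT as an external script. As one hedged example, SadTalker ships an `inference.py` whose README-documented flags include `--source_image`, `--driven_audio` and `--result_dir` (flag names may differ between versions); a wrapper could look roughly like this, with paths as placeholders:

```python
# Hypothetical wrapper around SadTalker's inference script; assumes the
# SadTalker repo is cloned locally with its checkpoints downloaded.
# Flag names follow the SadTalker README and may change between versions.
import subprocess
from pathlib import Path

SADTALKER_DIR = Path("SadTalker")  # placeholder path to the cloned repo

def lip_sync(image_path: str, wav_path: str, out_dir: str = "results") -> None:
    """Drive a still portrait image with a wav file, producing a talking-head video."""
    subprocess.run(
        [
            "python", "inference.py",
            "--source_image", image_path,   # the "talking head" portrait
            "--driven_audio", wav_path,     # e.g. the wav produced by Bark above
            "--result_dir", out_dir,
        ],
        cwd=SADTALKER_DIR,
        check=True,
    )

# Example: lip_sync("portrait.png", "speech.wav")
```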
Hi, thanks for your suggestions. We will try to add these models into AudioGPT as soon as possible.