Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project that requires animating mouths based on existing recordings.
Hi, I'm working on an app where an LLM response is converted into speech using TTS and the audio is played alongside an animation of a character moving its mouth. Is there a way to use your library to do this, or could you point me to a better option? Another option I'm considering is creating an ASCII animation and moving the lips up and down based on the waveform. Do you know how I might approach that? I understand that simply looking at the crests and troughs and aligning the lips to them doesn't work.
Thanks
Rhubarb is optimized for use in production pipelines and doesn't have any real-time support. Regarding alternatives:
Opening the mouth based on the power of the audio signal works to a degree, but tends to look rather bad.
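If you want to try it anyway, here is a minimal sketch of the idea: compute a short-time RMS envelope of the audio and use it to drive mouth openness. It assumes a 16-bit mono WAV, and `set_mouth_openness()` is a hypothetical stand-in for whatever your renderer exposes:

```python
# Minimal sketch: drive mouth openness from short-time RMS energy.
# Assumes a 16-bit mono WAV; set_mouth_openness() is a placeholder
# for whatever call your renderer actually provides.
import wave
import numpy as np

def rms_envelope(path, frame_ms=30):
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        samples = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
    samples = samples.astype(np.float64) / 32768.0
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    # Normalize to 0..1 so the values can drive mouth openness directly.
    return rms / max(rms.max(), 1e-9)

# for openness in rms_envelope("speech.wav"):
#     set_mouth_openness(openness)  # hypothetical renderer call
```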
Ironically, running a simple VAD (voice activity detection) to distinguish speech segments from pauses, then filling the speech segments with random mouth movements, may even look better.
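As a sketch of that approach, the snippet below uses a crude energy threshold as the VAD (a real VAD such as py-webrtcvad will segment more reliably) and holds each random shape for a few frames so the animation doesn't flicker. The threshold and the mouth-shape names are illustrative placeholders:

```python
# Sketch: crude energy-threshold VAD, then random mouth shapes during speech.
# Threshold and shape names are placeholders; tune them for your art.
import random

def mouth_track(envelope, threshold=0.1, hold=3,
                shapes=("half", "open", "wide")):
    """One mouth shape per analysis frame; a new random shape is
    picked every `hold` frames so the mouth doesn't flicker."""
    track, current, age = [], "closed", hold
    for level in envelope:  # e.g. the output of rms_envelope() above
        if level < threshold:
            current, age = "closed", hold        # pause: keep the mouth shut
        elif age >= hold:
            current, age = random.choice(shapes), 0  # speech: random movement
        track.append(current)
        age += 1
    return track

# envelope = rms_envelope("speech.wav")
# print(mouth_track(envelope)[:20])
```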
Depending on your TTS system, you may be able to get precise phoneme timings without any extra work. Depending on your C++ skills, you may be able to hack Rhubarb to directly take these timings as input, skipping all the time-consuming speech recognition work.
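If your TTS engine does report phoneme timings, turning them into a mouth-shape timeline is mostly a lookup table. Here is a sketch; the phoneme-to-viseme mapping is a simplified illustration loosely following Rhubarb's A-X mouth-shape names, not Rhubarb's actual internal mapping, and the input format is just what many TTS engines emit:

```python
# Sketch: turn (phoneme, start_sec, end_sec) triples from a TTS engine
# into a viseme timeline. The table below is a simplified illustration,
# loosely inspired by Rhubarb's A-X shape names; adjust it for your art.
PHONEME_TO_VISEME = {
    "P": "A", "B": "A", "M": "A",     # lips pressed together
    "F": "G", "V": "G",               # teeth on lower lip
    "UW": "F", "OW": "F", "W": "F",   # rounded/puckered
    "AA": "D", "AE": "C", "EH": "C",  # open vowels
    "L": "H",
}

def visemes_from_phonemes(phoneme_events, default="B", rest="X"):
    """phoneme_events: iterable of (phoneme, start_sec, end_sec)."""
    timeline = []
    for phoneme, start, end in phoneme_events:
        shape = PHONEME_TO_VISEME.get(phoneme, default)
        timeline.append((start, end, shape))
    if timeline:
        end = timeline[-1][1]
        timeline.append((end, end, rest))  # close the mouth at the end
    return timeline

# Example with made-up timings:
# print(visemes_from_phonemes([("HH", 0.00, 0.08), ("AA", 0.08, 0.22),
#                              ("L", 0.22, 0.30), ("OW", 0.30, 0.45)]))
```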