Rhubarb Lip Sync to NodeJs allows you to quickly create 2D mouth animation from voice recordings. It analyzes your audio files, recognizes what is being said, then automatically generates lip sync information. You can use it for animating speech in computer games, animated cartoons, or any similar project.
Rhubarb Lip Sync integrates with the following applications:
In addition, you can use Rhubarb Lip Sync's command line interface (CLI) to generate files in various output formats (TSV/XML/JSON).
Rhubarb Lip Sync to Nodejs
rhubarb
, passing it an audio file as argument and telling it where to create the output file. In its simplest form, this might look like this: import { runCommands } from 'rhubarb-lip-sync';
const lipSyncJson = await runCommands(bufferAudio)
There are additional options you can specify in order to get better results.
stderr
and exit with a non-zero exit code.The first step in processing an audio file is determining what is being said. More specifically, Rhubarb Lip Sync uses speech recognition to figure out what sound is being said at what point in time. You can choose between two recognizers:
PocketSphinx is an open-source speech recognition library that generally gives good results. This is the default recognizer. The downside is that PocketSphinx only recognizes English dialog. So if your recordings are in a language other than English, this is not a good choice.
Rhubarb Lip Sync also comes with a phonetic recognizer. Phonetic means that this recognizer won't try to understand entire (English) words and phrases. Instead, it will recognize individual sounds and syllables. The results are usually less precise than those from the PocketSphinx recognizer. The advantage is that this recognizer is language-independent. Use it if your recordings are not in English.
The output of Rhubarb Lip Sync is a file that tells you which mouth shape to display at what time within the recording. You can choose between three file formats -- TSV, XML, and JSON. The following paragraphs show you what each of these formats looks like.
TSV is the simplest and most compact export format supported by Rhubarb Lip Sync. Each line starts with a timestamp (in seconds), followed by a tab, followed by the name of the mouth shape. The following is the output for a recording of a person saying 'Hi.'
Here's how to read it:
HH
" sound, anticipating the "AY
" sound that will follow.AY
" diphtong (0.31s into the recording) requires clenched teeth (shape {B}). Before that, shape {C} is inserted as an in-between at 0.27s. This allows for a smoother animation from {D} to {B}.extendedShapes
>> settings).