despossivel / rhubarb-lip-sync

Rhubarb Lip Sync is a powerful tool for creating 2D mouth animations from voice recordings. It analyzes audio files, recognizes speech, and automatically generates lip sync data. Ideal for animating speech in games, cartoons, and similar projects.



Rhubarb Lip Sync for Node.js lets you quickly create 2D mouth animation from voice recordings. It analyzes your audio files, recognizes what is being said, and then automatically generates lip sync information. You can use it for animating speech in computer games, animated cartoons, or any similar project.

Rhubarb Lip Sync integrates with the following applications:

In addition, you can use Rhubarb Lip Sync's command line interface (CLI) to generate files in various output formats (TSV/XML/JSON).

How to run Rhubarb Lip Sync

General usage

Rhubarb Lip Sync for Node.js

// bufferAudio is assumed to be a Buffer holding the voice recording's audio data.
import { runCommands } from 'rhubarb-lip-sync';
const lipSyncJson = await runCommands(bufferAudio);
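Once lip sync data is available, an animation loop typically needs to know which mouth shape applies at a given playback time. The following is a minimal sketch under stated assumptions, not this package's documented API: it assumes the result follows the upstream tool's JSON export layout (a `mouthCues` array of `start`/`end`/`value` entries) and arrives either as a JSON string or as a parsed object. The helper name `mouthShapeAt` and the sample data are hypothetical; the sample mirrors the "Hi." recording used later in this README.

```javascript
// Hypothetical sample in the assumed JSON layout (metadata + mouthCues).
const sampleResult = {
  metadata: { duration: 0.47 },
  mouthCues: [
    { start: 0.0, end: 0.05, value: 'X' },
    { start: 0.05, end: 0.27, value: 'D' },
    { start: 0.27, end: 0.31, value: 'C' },
    { start: 0.31, end: 0.43, value: 'B' },
    { start: 0.43, end: 0.47, value: 'X' },
  ],
};

// Hypothetical helper: returns the mouth shape active at `seconds`,
// or `fallback` when no cue covers that moment.
function mouthShapeAt(result, seconds, fallback = 'X') {
  // Accept either a JSON string or an already-parsed object.
  const data = typeof result === 'string' ? JSON.parse(result) : result;
  const cue = data.mouthCues.find((c) => seconds >= c.start && seconds < c.end);
  return cue ? cue.value : fallback;
}

console.log(mouthShapeAt(sampleResult, 0.1)); // D
```

The fallback defaults to the closed-mouth shape X, so times outside the recording render a resting mouth. If the actual return value of `runCommands` differs from the assumed layout, only the lookup inside the helper needs to change.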

There are additional options you can specify in order to get better results.

Recognizers

The first step in processing an audio file is determining what is being said. More specifically, Rhubarb Lip Sync uses speech recognition to figure out what sound is being said at what point in time. You can choose between two recognizers:

PocketSphinx

PocketSphinx is an open-source speech recognition library that generally gives good results. This is the default recognizer. The downside is that PocketSphinx only recognizes English dialog. So if your recordings are in a language other than English, this is not a good choice.

Phonetic

Rhubarb Lip Sync also comes with a phonetic recognizer. Phonetic means that this recognizer won't try to understand entire (English) words and phrases. Instead, it will recognize individual sounds and syllables. The results are usually less precise than those from the PocketSphinx recognizer. The advantage is that this recognizer is language-independent. Use it if your recordings are not in English.
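The choice between the two recognizers comes down to the language of the recording, which can be sketched as a small helper. This is an illustration only: `chooseRecognizer` is a hypothetical name, and the strings `pocketSphinx` and `phonetic` are assumed to match the values accepted by the upstream command-line tool's recognizer option. How this Node.js package accepts a recognizer setting is not covered in this section.

```javascript
// Hypothetical helper: picks a recognizer name from a language tag such as
// 'en', 'en-US', or 'de'. The returned strings are assumed option values.
function chooseRecognizer(languageTag) {
  // Keep only the primary language subtag ('en-US' -> 'en').
  const primary = String(languageTag || '').trim().toLowerCase().split(/[-_]/)[0];
  // PocketSphinx only handles English; everything else goes to the
  // language-independent phonetic recognizer.
  return primary === 'en' ? 'pocketSphinx' : 'phonetic';
}

console.log(chooseRecognizer('en-GB')); // pocketSphinx
console.log(chooseRecognizer('pt-BR')); // phonetic
```

When the language is missing or unknown, the sketch falls back to the phonetic recognizer, since it does not depend on the language being English.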

Output formats

The output of Rhubarb Lip Sync is a file that tells you which mouth shape to display at what time within the recording. You can choose between three file formats: TSV, XML, and JSON. The following paragraphs show you what each of these formats looks like.

Tab-separated values (TSV)

TSV is the simplest and most compact export format supported by Rhubarb Lip Sync. Each line starts with a timestamp (in seconds), followed by a tab, followed by the name of the mouth shape. The following is the output for a recording of a person saying 'Hi.'


0.00	X
0.05	D
0.27	C
0.31	B
0.43	X
0.47	X

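Here's how to read it. Each timestamp marks the moment the mouth shape next to it appears, and that shape stays on screen until the following timestamp. For example, shape D is shown from 0.05 s to 0.27 s.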
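Reading the TSV output programmatically follows the same rule: pair each timestamp with the one after it to get the interval during which a shape is displayed. Below is a minimal sketch; `parseTsv` is a hypothetical helper, and it assumes the final row only marks the end of the recording rather than starting a new shape.

```javascript
// Sample TSV output for the "Hi." recording: timestamp, tab, mouth shape.
const tsv = ['0.00\tX', '0.05\tD', '0.27\tC', '0.31\tB', '0.43\tX', '0.47\tX'].join('\n');

// Hypothetical helper: turns TSV text into cues with explicit start/end times.
function parseTsv(text) {
  const rows = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [time, shape] = line.split('\t');
      return { time: Number(time), shape };
    });
  // Each shape lasts until the next row's timestamp, so the final row only
  // supplies the end time of the cue before it.
  return rows.slice(0, -1).map((row, i) => ({
    start: row.time,
    end: rows[i + 1].time,
    value: row.shape,
  }));
}

console.log(parseTsv(tsv).length); // 5
```

Converting to explicit start and end times up front keeps the playback code simple, since it no longer has to look ahead to the next row.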
