DanielSWolf / rhubarb-lip-sync

Rhubarb Lip Sync is a command-line tool that automatically creates 2D mouth animation from voice recordings. You can use it for characters in computer games, in animated cartoons, or in any other project that requires animating mouths based on existing recordings.

"Animated" dialogue text? #23

Closed elizagamedev closed 6 years ago

elizagamedev commented 6 years ago

Would it be practical to add functionality to export the times at which words are spoken in the optionally provided dialogue file? This would be a really useful feature for printing text at the precise moment that a character speaks it.

DanielSWolf commented 6 years ago

That certainly sounds like a useful suggestion. However, it will be rather tricky to get it right.

The problem is that the specified dialog is only used as a rough guide in the speech recognition phase. The recognizer slightly favors words and phrases from the dialog, but the recognition result is rarely an exact match with the dialog file. So in order to apply the timing information we have to the original dialog file, we'd have to perform a global alignment of the likely pronunciations of the dialog words with the timed words. That's feasible, but far from trivial.
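To make the alignment problem concrete, here is a minimal sketch that is much simpler than what is described above: it aligns the dialog words with the recognized words by exact, case-insensitive word match using Python's `difflib`, not by likely pronunciations. The function name `align_dialog`, the `(word, start, end)` tuple format, and all timing values are assumptions for illustration; they are not Rhubarb's actual data structures or output.

```python
from difflib import SequenceMatcher


def align_dialog(dialog_words, timed_words):
    """Return one (start, end) tuple or None per dialog word.

    dialog_words: words from the dialog file, punctuation already stripped.
    timed_words:  (word, start, end) tuples as a recognizer might report them
                  (hypothetical format, not Rhubarb's).
    """
    wanted = [w.lower() for w in dialog_words]
    recognized = [w.lower() for w, _, _ in timed_words]
    timings = [None] * len(dialog_words)

    # Word-level matching only; a real solution would compare pronunciations.
    matcher = SequenceMatcher(a=wanted, b=recognized, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = timed_words[block.b + k]
            timings[block.a + k] = (start, end)
    return timings


# Illustrative data: the recognizer heard "our" instead of "are".
dialog = ["How", "are", "you"]
recognized = [("how", 0.13, 0.30), ("our", 0.30, 0.45), ("you", 0.45, 1.05)]
print(align_dialog(dialog, recognized))
```

The misrecognized word ends up with no timing at all, which is exactly the gap that an alignment over pronunciations would have to close.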

Apart from the technical difficulty, I wonder whether per-word timing information really is the best solution to your problem. My guess is that people rarely want to display dialog on a word-by-word basis like a teleprinter. A more typical use case might be presenting text in larger chunks, such as sentences or sentence fragments.

One idea I had some time ago is allowing arbitrary XML elements in the dialog file. After processing, the original text+XML could be output, but each XML element could be augmented with timing information.

Take this input dialog file:

How are you? <split/> I'm Daniel.

The output could then be:

How are you? <split start="1.05" end="1.05"/> I'm Daniel.

Alternatively:

Input:

<line>How are you?</line><line>I'm Daniel.</line>

Output:

<line start="0.13" end="1.05">How are you?</line><line start="1.05" end="1.6">I'm Daniel.</line>
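As a rough sketch of how such augmentation might work, the following Python snippet walks the dialog markup and stamps each element with `start` and `end` attributes. It assumes the hard part is already solved, namely that there is exactly one `(start, end)` pair per whitespace-separated dialog word. The function name `annotate`, the wrapping `<dialog>` root, and the rule that an empty element takes the end time of the preceding word are assumptions for illustration, not an existing Rhubarb feature.

```python
import xml.etree.ElementTree as ET


def annotate(fragment, word_times):
    """Add start/end attributes to every element in a dialog fragment.

    fragment:   dialog text with embedded XML elements.
    word_times: one (start, end) pair per whitespace-separated word,
                in document order (assumed to be already aligned).
    """
    # The fragment has no single root, so wrap it in a temporary one.
    root = ET.fromstring(f"<dialog>{fragment}</dialog>")
    pos = 0  # index of the next unconsumed word

    def count(text):
        return len(text.split()) if text else 0

    def visit(elem):
        nonlocal pos
        first = pos
        pos += count(elem.text)
        for child in elem:
            visit(child)
            pos += count(child.tail)
        if elem is root:
            return
        if pos > first:
            # Element spans words: first word's start to last word's end.
            start, end = word_times[first][0], word_times[pos - 1][1]
        else:
            # Empty element: a point in time at the end of the preceding word.
            start = end = word_times[first - 1][1] if first > 0 else 0.0
        elem.set("start", str(start))
        elem.set("end", str(end))

    visit(root)
    # Serialize the children only, dropping the temporary root.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )


# Per-word timings; the values inside each sentence are made up.
times = [(0.13, 0.3), (0.3, 0.45), (0.45, 1.05), (1.05, 1.2), (1.2, 1.6)]
print(annotate("<line>How are you?</line><line>I'm Daniel.</line>", times))
print(annotate("How are you? <split/> I'm Daniel.", times))
```

Because all text outside the tags passes through unchanged, the same approach would apply to any element name, whether it wraps words or marks a single point in time.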

This could be used for other purposes as well. For instance, to select a different facial expression at just the right moment:

Input:

That's great. <angry>Don't you agree?</angry>

Output:

That's great. <angry start="0.54" end="1.23">Don't you agree?</angry>

Or to trigger arbitrary animation:

Input:

That's enough. <animation name="shoot-protagonist"/>I warned you.

Output:

That's enough. <animation name="shoot-protagonist" start="0.74" end="0.74"/>I warned you.

What do you think?

elizagamedev commented 6 years ago

Hmm, that does seem pretty useful... If possible, I'd like to see such a feature.

DanielSWolf commented 6 years ago

I'm closing this for now. I might revisit it another time.