Closed elizagamedev closed 6 years ago
That certainly sounds like a useful suggestion. However, it will be rather tricky to get it right.
The problem is that the specified dialog is only used as a rough guide in the speech recognition phase. The recognizer slightly favors words and phrases from the dialog, but the recognition result is rarely an exact match with the dialog file. So in order to apply the timing information we have to the original dialog file, we'd have to perform a global alignment of the likely pronunciations of the dialog words with the timed words. That's feasible, but far from trivial.
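To make the alignment idea a bit more concrete, here is a minimal sketch of a global (Needleman-Wunsch style) alignment between the dialog words and the recognized, timed words. It is only an illustration under loud assumptions: it compares spellings with `difflib` instead of likely pronunciations, and the names `similarity`, `align_dialog`, the gap penalty, and the `(word, start, end)` tuple format are all made up for this example, not part of the actual tool.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Spelling similarity in [0, 1]; a crude stand-in for comparing pronunciations."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_dialog(dialog_words, timed_words, gap=-0.5):
    """Globally align dialog words with recognized (word, start, end) tuples.

    Returns one (dialog_word, start, end) tuple per dialog word, in order.
    Dialog words without a recognized counterpart get (word, None, None).
    """
    n, m = len(dialog_words), len(timed_words)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], back[i][0] = i * gap, "skip_dialog"
    for j in range(1, m + 1):
        score[0][j], back[0][j] = j * gap, "skip_timed"

    # Fill the dynamic-programming table, remembering the best step per cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sim = similarity(dialog_words[i - 1], timed_words[j - 1][0])
            options = [
                (score[i - 1][j - 1] + 2 * sim - 1, "match"),  # pair the two words
                (score[i - 1][j] + gap, "skip_dialog"),        # dialog word not recognized
                (score[i][j - 1] + gap, "skip_timed"),         # extra recognized word
            ]
            score[i][j], back[i][j] = max(options, key=lambda o: o[0])

    # Walk back from the bottom-right corner to recover the alignment.
    result = []
    i, j = n, m
    while i > 0 or j > 0:
        step = back[i][j]
        if step == "match":
            _, start, end = timed_words[j - 1]
            result.append((dialog_words[i - 1], start, end))
            i, j = i - 1, j - 1
        elif step == "skip_dialog":
            result.append((dialog_words[i - 1], None, None))
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


if __name__ == "__main__":
    dialog = ["How", "are", "you", "I'm", "Daniel"]
    recognized = [("how", 0.13, 0.30), ("are", 0.30, 0.45), ("you", 0.45, 1.05),
                  ("i", 1.05, 1.20), ("am", 1.20, 1.30), ("daniel", 1.30, 1.60)]
    for word, start, end in align_dialog(dialog, recognized):
        print(word, start, end)
```

A real implementation would have to score phoneme sequences rather than spellings and cope with words the recognizer merged or split, which is where most of the difficulty lies.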
Apart from the technical difficulty, I wonder whether per-word timing information really is the best solution to your problem. My guess is that people rarely want to display dialog on a word-by-word basis like a teleprinter. A more typical use case might be presenting text in larger chunks, such as sentences or sentence fragments.
One idea I had some time ago is to allow arbitrary XML elements in the dialog file. After processing, the original text and XML could be written back out, with each XML element augmented with timing information.
Take this input dialog file:
How are you?
<split/>
I'm Daniel.
The output could then be:
How are you?
<split start="1.05" end="1.05"/>
I'm Daniel.
Alternatively, input:
<line>How are you?</line>
<line>I'm Daniel.</line>
Output:
<line start="0.13" end="1.05">How are you?</line>
<line start="1.05" end="1.6">I'm Daniel.</line>
This could be used for other purposes as well. For instance, to select a different facial expression at just the right moment:
Input:
That's great.
<angry>Don't you agree?</angry>
Output:
That's great.
<angry start="0.54" end="1.23">Don't you agree?</angry>
Or to trigger an arbitrary animation:
Input:
That's enough.
<animation name="shoot-protagonist"/>
I warned you.
Output:
That's enough.
<animation name="shoot-protagonist" start="0.74" end="0.74"/>
I warned you.
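As a rough sketch of how the annotation step might look once every dialog word has a time span: the code below wraps the dialog in a temporary root element, walks it with Python's `xml.etree.ElementTree`, and adds `start` and `end` attributes to every element. The function name `annotate`, the `(start, end)` tuple list, and the one-timing-per-whitespace-separated-word rule are assumptions made for this illustration; nothing like this exists in the tool yet.

```python
import xml.etree.ElementTree as ET


def annotate(dialog_xml, word_times):
    """Add start/end attributes to every XML element in a dialog string.

    word_times holds one (start, end) tuple per whitespace-separated word of
    the dialog text, in order (assumed to come from a prior alignment step).
    Elements that contain no words get start == end, i.e. a point in time.
    Assumes the dialog is non-empty and that word_times covers every word.
    """
    root = ET.fromstring("<root>" + dialog_xml + "</root>")
    pos = 0  # index of the next word that has not been consumed yet

    def count(text):
        return len(text.split()) if text else 0

    def visit(elem):
        nonlocal pos
        first = pos
        pos += count(elem.text)
        for child in elem:
            visit(child)
            pos += count(child.tail)  # text between this child and the next
        if elem is root:
            return
        if pos > first:
            # The element spans the words first .. pos-1.
            start, end = word_times[first][0], word_times[pos - 1][1]
        else:
            # Empty element: use the end of the preceding word (or 0.0).
            start = end = word_times[first - 1][1] if first > 0 else 0.0
        elem.set("start", str(start))
        elem.set("end", str(end))

    visit(root)
    out = ET.tostring(root, encoding="unicode")
    return out[len("<root>"):-len("</root>")]  # strip the temporary wrapper


if __name__ == "__main__":
    times = [(0.13, 0.40), (0.40, 0.60), (0.60, 1.05), (1.05, 1.30), (1.30, 1.60)]
    print(annotate("<line>How are you?</line>\n<line>I'm Daniel.</line>", times))
    print(annotate("How are you?\n<split/>\nI'm Daniel.", times))
```

With the made-up timings above, the `line` elements receive the spans of the words they contain, and the empty `split` element receives the end time of the word before it.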
What do you think?
Hmm, that does seem pretty useful... If possible, I'd like to see such a feature.
I'm closing this for now. I might revisit it another time.
Would it be practical to add functionality to export the times at which words are spoken in the optionally provided dialogue file? This would be a really useful feature for printing text at the precise moment that a character speaks it.