MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

Higher frequency / fidelity of viseme events. #105803

Closed RealtimeGraphX closed 1 year ago

RealtimeGraphX commented 1 year ago

The text-to-visemes functionality is quite accurate for the first 1-2 seconds, after which its quality drops significantly: viseme events then occur only 2-3 times per second. This results in very low-quality blend shape animation on a 3D character, for example. Is there any way to increase the quality of the viseme data? VisemeDataExample.pdf



YashikaTyagii commented 1 year ago

@RealtimeGraphX Thanks for your feedback! We will investigate and update as appropriate.

RamanathanChinnappan-MSFT commented 1 year ago

@RealtimeGraphX

The quality of viseme data depends on the quality of the speech synthesis. The Speech SDK supports viseme events during speech synthesis. Visemes represent the key poses in observed speech, such as the position of the lips, jaw, and tongue when producing a particular phoneme. The Speech SDK provides viseme events with a viseme ID, Scalable Vector Graphics (SVG), or blend shapes. The overall viseme workflow is depicted in the following flowchart.

You can use the viseme events to animate your avatar using a 2D or 3D rendering engine. For blend shapes, the viseme events include a series of frames in the Animation SDK property. These frames are grouped to best align the facial positions with the audio. Your 3D engine should render each group of BlendShapes frames immediately before the corresponding audio chunk. The FrameIndex value indicates how many frames preceded the current list of frames.
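As a rough illustration of how FrameIndex and the grouped BlendShapes frames fit together, here is a minimal Python sketch. It assumes each Animation payload is a JSON string with a `FrameIndex` number and a `BlendShapes` list containing one list of weights per frame, and that frames are spaced at 60 frames per second. The function name, the sample chunks, and the three-weight frames are hypothetical and only for illustration, so check the payload shape and frame rate against the Speech SDK documentation before relying on it.

```python
import json

FPS = 60  # assumption: blend shape frames are spaced at 60 frames per second


def frames_with_timestamps(animation_chunks, fps=FPS):
    """Flatten Animation JSON chunks into (time_in_seconds, weights) pairs.

    Each chunk is assumed to be a JSON string with:
      "FrameIndex":  number of frames that preceded this chunk
      "BlendShapes": one list of blend shape weights per frame
    """
    timeline = []
    for chunk in animation_chunks:
        payload = json.loads(chunk)
        start = payload["FrameIndex"]
        for offset, weights in enumerate(payload["BlendShapes"]):
            # Absolute frame number divided by the frame rate gives the time.
            timeline.append(((start + offset) / fps, weights))
    # Chunks may arrive out of order, so sort by timestamp.
    timeline.sort(key=lambda item: item[0])
    return timeline


# Two hypothetical chunks with 3 weights per frame (real payloads carry more).
chunks = [
    json.dumps({"FrameIndex": 0, "BlendShapes": [[0.0, 0.1, 0.2], [0.1, 0.2, 0.3]]}),
    json.dumps({"FrameIndex": 2, "BlendShapes": [[0.2, 0.3, 0.4]]}),
]
timeline = frames_with_timestamps(chunks)
for t, weights in timeline:
    print(f"{t:.4f}s {weights}")
```

A rendering engine could walk this timeline and apply each set of weights at its timestamp relative to the start of audio playback, rather than relying on the arrival time of the individual viseme events.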

If you want to increase the quality of viseme data, you can try to improve the quality of the speech synthesis. You can also try to use a higher quality audio configuration.

Please note that this GitHub forum is dedicated to documentation-related issues. For technical queries or clarifications, we encourage you to use the Microsoft Q&A platform. Kindly raise your query there.

RealtimeGraphX commented 1 year ago

Thanks a lot for your answer. I have now posted a question on the Q&A platform.

RamanathanChinnappan-MSFT commented 1 year ago

@RealtimeGraphX

We are going to close this thread, but if you have any further questions regarding the documentation, please tag me in your reply and we will be happy to continue the conversation.