google-ai-edge / mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
https://mediapipe.dev
Apache License 2.0
26.1k stars, 5.04k forks

Text To Speech to Facial BlendShapes #4428

Open GeorgeS2019 opened 1 year ago

GeorgeS2019 commented 1 year ago

MediaPipe Solution (you are using)

Part 2 => Face Blendshape: May 2023 -> ?
Part 1 => Done: ARKit 52 blendshapes support request, June 2022 to April 2023 (completed)

Programming language

C#

Are you willing to contribute it

Yes

Describe the feature and the current behaviour/state

From the Modelling part using Godot
https://github.com/srcnalt/ReadyPlayerMe-Godot-Test/issues/1#issue-1713856035

Will this change the current API? How?

Yes. It adds a new, non-conflicting API alongside the existing one.

Who will benefit with this feature?

Anyone who uses MediaPipe blendshapes. It is the next step toward deep AI (integrating deep audio into MediaPipe).

Please specify the use cases for this feature

A user uses ChatGPT or something similar to generate replies, and this new feature translates the replies into speech with the corresponding avatar blendshape manipulation.

Any Other info

No response

GeorgeS2019 commented 1 year ago

What would the API look like?

Given a text reply from ChatGPT (or something similar from Google), the API would receive this string and output:

  1. the corresponding facial blendshapes as a time-coordinated list of Dictionary[blendshapeName, blendshapeValueFloat]
  2. voice (MP3 or WAV) that aligns with the blendshape values
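
The two outputs listed above could be modelled roughly as follows. This is only a sketch in Python for brevity (the request itself targets C#); `BlendshapeFrame`, `SpeechResult`, and `weights_at` are hypothetical names, not part of any MediaPipe API, and the frames are assumed to be sorted by timestamp.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlendshapeFrame:
    """One entry of the time-coordinated list (hypothetical type)."""
    time_s: float              # seconds from the start of the audio
    weights: Dict[str, float]  # blendshapeName -> blendshapeValueFloat


@dataclass
class SpeechResult:
    """Hypothetical return value: audio plus the frames aligned to it."""
    audio_wav: bytes
    frames: List[BlendshapeFrame]


def weights_at(result: SpeechResult, t: float) -> Dict[str, float]:
    """Return the weights of the latest frame at or before time t."""
    current: Dict[str, float] = {}
    for frame in result.frames:  # assumed sorted by time_s
        if frame.time_s > t:
            break
        current = frame.weights
    return current
```

During playback, a renderer would call `weights_at` with the current audio position and apply the returned values to the avatar's blendshapes.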

endink commented 1 year ago

I have implemented this feature in Unreal Engine; it is easy to do. It uses PaddleLite + OvrLipSync. 😄

GeorgeS2019 commented 1 year ago

@endink This is just Part 2 of many parts ahead :-)

FishWoWater commented 1 year ago

Agreed! It would be really exciting if blendshapes could be estimated and aligned with an input audio clip.

I am currently working on a pipeline: user voice -> speech recognition -> ChatGPT -> text to speech -> blendshapes. Mature solutions exist for every stage except the last one (speech2blendshapes). Lipsync and FACEGOOD can possibly do this, but they have their limitations or problems. This feature would benefit the MediaPipe community.
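
To make the shape of that pipeline concrete, here is a minimal Python sketch in which each stage is passed in as a plain function. All names (`run_turn`, `recognize`, `reply`, `synthesize`, `to_blendshapes`) are hypothetical placeholders; no specific speech, chat, or lip-sync library is assumed, and the last stage is the one this request asks MediaPipe to provide.

```python
from typing import Callable, Dict, List, Tuple

# (time in seconds, blendshape name -> weight)
Frames = List[Tuple[float, Dict[str, float]]]


def run_turn(
    user_audio: bytes,
    recognize: Callable[[bytes], str],          # speech recognition
    reply: Callable[[str], str],                # chat model
    synthesize: Callable[[str], bytes],         # text to speech
    to_blendshapes: Callable[[bytes], Frames],  # missing speech2blendshapes stage
) -> Tuple[bytes, Frames]:
    """Run one conversational turn and return (speech audio, aligned frames)."""
    text_in = recognize(user_audio)
    text_out = reply(text_in)
    speech = synthesize(text_out)
    return speech, to_blendshapes(speech)
```

Keeping the stages as injected callables means the last one can be swapped between existing lip-sync tools and a future MediaPipe task without touching the rest.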

ayushgdev commented 1 year ago

Hello @GeorgeS2019, thanks for raising this amazing feature request. We will discuss it internally and prioritise it in our roadmap. However, just a heads-up: we are working on numerous fronts right now, so this might get delayed.

GeorgeS2019 commented 1 year ago

The BlendShape part is now working in Godot, the 8th top-ranked open-source 3D game engine on GitHub. @srcnalt @kaiidams @SpookyCorgi @you-win @j20001970

kuaashish commented 1 year ago

Hello @lu-wang-g, could you please look into this amazing feature request? Thank you!

lu-wang-g commented 1 year ago

At I/O 2023, Google released a demo app, Talking Character (https://developers.googleblog.com/2023/05/generative-ai-talking-character.html), which IIUC fits exactly the use case described here. The Web demo is partially open sourced here. You can find useful components in the directory. There has also been a discussion about releasing the talking character pipeline through MediaPipe, but we don't have a concrete plan yet.

@ayushgdev and @kuaashish, do we have ways to track user requests like this?

tiamy commented 9 months ago

+1

GeorgeS2019 commented 2 months ago

We now have a C# wrapper for Godot MediaPipe.

GeorgeS2019 commented 2 months ago

The Godot community will attempt Text to Face => follow here