Azure-Samples / cognitive-services-speech-sdk

Sample code for the Microsoft Cognitive Services Speech SDK
MIT License

[iOS] How to fix an issue where my 3D Blendshapes do not align with the audio. #2481

Open AmAdevs opened 2 months ago

AmAdevs commented 2 months ago

In addVisemeReceivedEventHandler, I receive event.animation. I want to use Viseme 3D Blend Shapes to drive my 3D Avatar.

Here is an example of the JSON payload: `{ "FrameIndex": 0, "BlendShapes": [ [0.021, 0.321, ..., 0.258], [0.045, 0.234, ..., 0.288], ... ] }`. However, in the first round I couldn't parse the JSON; my catch block printed the warning `Error parsing JSON: \(error)`. In the second round I could parse the JSON and set the weights on my 3D model. It works, but the animation doesn't align with the audio.

Can someone help me fix this issue?

Thank you in advance.

This is my code:

func synthesisToSpeaker() {
        guard let subscriptionKey = sub, let region = region else {
            print("Speech key and region are not set.")
            return
        }
        
        var speechConfig: SPXSpeechConfiguration?
        do {
            try speechConfig = SPXSpeechConfiguration(subscription: subscriptionKey, region: region)
        } catch {
            print("Error creating speech configuration: \(error)")
            return
        }
        
        speechConfig?.speechSynthesisVoiceName = "en-US-AvaMultilingualNeural"
        speechConfig?.setSpeechSynthesisOutputFormat(.raw16Khz16BitMonoPcm)
        
        guard let synthesizer = try? SPXSpeechSynthesizer(speechConfig!) else {
            print("Error creating speech synthesizer.")
            return
        }
        
        let ssml = """
            <speak version='1.0' xml:lang='en-US' xmlns='http://www.w3.org/2001/10/synthesis'
                   xmlns:mstts='http://www.w3.org/2001/mstts'>
              <voice name='en-US-CoraNeural'>
                <mstts:viseme type='FacialExpression'/>
                Hello World, May I help you?
              </voice>
            </speak>
            """
        
        // Subscribe to viseme received event
        synthesizer.addVisemeReceivedEventHandler { (synthesizer, event) in
            self.mapBlendshapesToModel(jsonString: event.animation,
                                       node: self.contentNode)
           //print("\(event.animation)")
        }
        
        do {
            let result = try synthesizer.speakSsml(ssml)
            
            switch result.reason {
            case .synthesizingAudioCompleted:
                print("Synthesis completed")
            case .canceled:
                print("Synthesis canceled: \(result.description)")
            default:
                print("Synthesis failed: \(result.description)")
            }
        } catch {
            debugPrint("speakSsml failed: \(error)")
        }
    }
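
In case it helps with debugging the misalignment: the FacialExpression blend-shape frames are emitted at 60 FPS, and viseme events generally arrive ahead of audio playback, so applying weights directly in the event handler can run the face ahead of the sound. Below is a minimal sketch (plain Swift, no SDK types; `VisemeFrameBuffer`, the buffering approach, and the 100-ns interpretation of `event.audioOffset` are assumptions to illustrate the idea) of buffering frames with timestamps and pulling them back out against the audio player's clock:

```swift
import Foundation

// Sketch: buffer blend-shape frames with absolute timestamps instead of
// applying them as soon as the viseme event fires. Assumes frames arrive
// at 60 FPS and that the event's audioOffset is in 100-nanosecond ticks.
struct TimedFrame {
    let time: TimeInterval  // seconds from the start of the audio
    let weights: [Double]   // one weight per blend shape
}

final class VisemeFrameBuffer {
    private(set) var frames: [TimedFrame] = []
    private let frameDuration: TimeInterval = 1.0 / 60.0

    // Called from the viseme event handler with the parsed JSON fields.
    func append(audioOffsetTicks: UInt64, blendShapes: [[Double]]) {
        let chunkStart = TimeInterval(audioOffsetTicks) / 10_000_000.0 // ticks -> s
        for (i, weights) in blendShapes.enumerated() {
            frames.append(TimedFrame(time: chunkStart + Double(i) * frameDuration,
                                     weights: weights))
        }
    }

    // Called from a display link / timer using the audio player's clock;
    // returns the most recent frame due at playbackTime and drops older ones.
    func nextFrame(at playbackTime: TimeInterval) -> TimedFrame? {
        guard let idx = frames.lastIndex(where: { $0.time <= playbackTime }) else {
            return nil
        }
        let frame = frames[idx]
        frames.removeFirst(idx + 1)
        return frame
    }
}
```

The handler would then only call `append`, and a `CADisplayLink` (or similar) driven by the audio clock would call `nextFrame(at:)` and apply the returned weights to the SceneKit node.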

func mapBlendshapesToModel(jsonString: String, node: SCNNode?) {
        guard let jsonData = jsonString.data(using: .utf8) else {
            print("Invalid JSON Data")
            return
        }
        
        guard let node = node else {
            print("Node is nil")
            return
        }
        
        do {
            let json = try JSONSerialization.jsonObject(with: jsonData, options: [])
            if let dictionary = json as? [String: Any] {
                if let frameIndex = dictionary["FrameIndex"] as? Int,
                   let blendShapes = dictionary["BlendShapes"] as? [[Double]] {
                    // Set the blend-shape weights for frameIndex on my 3D model here
                }
            }
        } catch {
            print("Error parsing JSON: \(error)")
        }
    }
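
On the parsing side, decoding with `Codable` and guarding against an empty payload may make the first-round failure easier to diagnose than the opaque "Error parsing JSON" message. A sketch (the `VisemeAnimation` type and `decodeAnimation` helper are hypothetical; field names match the JSON shown above):

```swift
import Foundation

// Sketch: decode the FacialExpression payload with Codable instead of
// JSONSerialization; property names match the service's JSON keys.
struct VisemeAnimation: Decodable {
    let FrameIndex: Int
    let BlendShapes: [[Double]]
}

func decodeAnimation(_ jsonString: String) -> VisemeAnimation? {
    // Guard against an empty or whitespace-only payload, which would
    // otherwise surface only as a generic JSON parsing error.
    let trimmed = jsonString.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty, let data = trimmed.data(using: .utf8) else {
        print("Viseme event carried no animation payload")
        return nil
    }
    do {
        return try JSONDecoder().decode(VisemeAnimation.self, from: data)
    } catch {
        print("Error parsing animation JSON: \(error)")
        return nil
    }
}
```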
github-actions[bot] commented 1 month ago

This item has been open without activity for 19 days. Provide a comment on status and remove "update needed" label.