capacitor-community / text-to-speech

⚡️ Capacitor plugin for synthesizing speech from text.
MIT License

bug: Audio playing from other App does not return to full volume after finishing speaking #138

Open Solarus8 opened 2 weeks ago

Solarus8 commented 2 weeks ago

Plugin version: "@capacitor-community/text-to-speech": "^5.0.0"

Platform(s): iPhone 15 Pro Max, iOS 18.0.1

Current behavior: When other audio is playing (for example a music app such as Pandora), its volume is attenuated while the text-to-speech content plays, but it does not return to full volume after the speech finishes.

Expected behavior: The audio of any other app returns to full volume after the text-to-speech content has finished playing.

Steps to reproduce: Open Pandora (or another audio app), open the Capacitor app that uses text-to-speech, and play some text. The Pandora volume is roughly halved while the text is read (this is good), but it then stays at the halved level until the app is sent to the background or closed.

robingenz commented 2 weeks ago

Would you be willing to create a PR?

flexiblefactory commented 2 weeks ago

I think something like this could be the fix:

// Deactivate the audio session after TTS finishes so other apps can return to full volume

    public func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {
        // Deactivate the session when speech is cancelled.
        // try? is used because this delegate method cannot throw.
        try? AVAudioSession.sharedInstance().setActive(false)
        self.resolveCurrentCall()
    }

    public func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        // Deactivate the session when speech finishes.
        try? AVAudioSession.sharedInstance().setActive(false)
        self.resolveCurrentCall()
    }
flexiblefactory commented 2 weeks ago

I have tested calling AVAudioSession.sharedInstance().setActive(false) after the speech finishes and confirmed that it restores normal audio volume to other applications.

Solarus8 commented 2 weeks ago

Great! Did you wrap that into a capacitor plugin, will you issue a PR?


flexiblefactory commented 2 weeks ago

It is a plugin currently and works as a workaround. I do think it would make sense for the changes above to be integrated into the TTS plugin.

physxP commented 2 weeks ago

I think something like this could be the fix:

// Deactivate the audio session after TTS finishes so other apps can return to full volume

    public func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didCancel utterance: AVSpeechUtterance) {
        // Deactivate the session when speech is cancelled.
        // try? is used because this delegate method cannot throw.
        try? AVAudioSession.sharedInstance().setActive(false)
        self.resolveCurrentCall()
    }

    public func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        // Deactivate the session when speech finishes.
        try? AVAudioSession.sharedInstance().setActive(false)
        self.resolveCurrentCall()
    }

I believe this approach may cause UI stuttering if setActive is called during a UI animation or, for instance, when a Camera Preview is active. Handling AVAudioSession on a background queue introduces a lag of approximately 300-500ms, varying based on the device. A more appropriate solution might be to provide explicit start or stop methods, allowing users to control when the audio session is activated or deactivated, avoiding these performance issues. This would give developers finer control over managing the audio session in relation to their app's lifecycle and UI elements.

cc: @robingenz @Solarus8
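
The explicit start/stop idea suggested above could look roughly like the following. This is only a sketch under stated assumptions, not the plugin's actual API: the names AudioSessionActivating, SystemAudioSession and AudioSessionController are hypothetical, and the .notifyOthersOnDeactivation option is an optional addition that was not part of the proposal in this thread. The session is hidden behind a small protocol so the control logic can be exercised without AVFoundation.

```swift
import Foundation
#if os(iOS)
import AVFoundation
#endif

/// Hypothetical abstraction over the system audio session.
protocol AudioSessionActivating {
    func setActive(_ active: Bool) throws
}

#if os(iOS)
/// Adapter around the real shared session (iOS only).
struct SystemAudioSession: AudioSessionActivating {
    func setActive(_ active: Bool) throws {
        // On deactivation, ask the system to notify other apps so they can resume.
        try AVAudioSession.sharedInstance().setActive(
            active,
            options: active ? [] : .notifyOthersOnDeactivation
        )
    }
}
#endif

/// Hypothetical controller exposing explicit start/stop calls, so the app
/// (not the TTS delegate) decides when the session changes state.
final class AudioSessionController {
    private let session: AudioSessionActivating
    private(set) var isActive = false

    init(session: AudioSessionActivating) {
        self.session = session
    }

    /// Activates the session; returns false if the change failed.
    @discardableResult
    func start() -> Bool { setActive(true) }

    /// Deactivates the session; returns false if the change failed.
    @discardableResult
    func stop() -> Bool { setActive(false) }

    private func setActive(_ active: Bool) -> Bool {
        // Skip the call if the session is already in the requested state.
        guard active != isActive else { return true }
        do {
            try session.setActive(active)
            isActive = active
            return true
        } catch {
            print("Audio session change failed: \(error)")
            return false
        }
    }
}
```

With something like this, an app could call start() before speaking and stop() at a moment of its own choosing, for example after an animation completes or the camera preview is dismissed, rather than inside the didFinish callback.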

flexiblefactory commented 2 weeks ago

The other aspect to this is that the plugin also sets active true at startup time, causing other audio to "duck" as soon as the plugin loads (and then ducking persists until the app quits).

physxP commented 1 week ago

The other aspect to this is that the plugin also sets active true at startup time, causing other audio to "duck" as soon as the plugin loads (and then ducking persists until the app quits).

That's true. If the setActive call were configurable from the TS side, I think that could fix all of these issues at once. Implemented in the most efficient manner, this could be a breaking API change. I am thinking of creating a PR for this to keep both worlds happy.