CommunityToolkit / Maui

The .NET MAUI Community Toolkit is a community-created library that contains .NET MAUI Extensions, Advanced UI/UX Controls, and Behaviors to help make your life as a .NET MAUI developer easier.
https://learn.microsoft.com/dotnet/communitytoolkit/maui
MIT License

SpeechToText has horrendous performance issues and/or not working properly #1645

Closed sej69 closed 4 months ago

sej69 commented 6 months ago

Is there an existing issue for this?

Did you read the "Reporting a bug" section on Contributing file?

Current Behavior

I first posted this on Stack Overflow: [https://stackoverflow.com/questions/77808446/communitytoolkit-speechtotext-performance-is-extremely-slow?noredirect=1#comment137188380_77808446]

I also found this issue in your bug tracker, which was closed as unverified: https://github.com/CommunityToolkit/Maui/issues/1586. Using the code below, I have now verified that SpeechToText does not operate the way your documentation says it should.

SpeechToText is not detecting silence and only reports back after it appears to time out. I created a .NET MAUI app using the following async method:

    async Task StartListener(CancellationToken cancellationToken)
    {
        var isGranted = await speechToText.RequestPermissions(cancellationToken);
        if (!isGranted)
        {
            await Toast.Make("Permission not granted").Show(CancellationToken.None);
            return;
        }

        do
        {
            Stopwatch sw = Stopwatch.StartNew();

            var recognitionResult = await speechToText.ListenAsync(
                                                CultureInfo.GetCultureInfo(Language),
                                                new Progress<string>(partialText =>
                                                {
                                                    RecognitionText += partialText + " ";
                                                }), cancellationToken);

            sw.Stop();

            if (recognitionResult.IsSuccessful)
            {
                RecognitionText = recognitionResult.Text;
                Debug.WriteLine("Success " + RecognitionText + " Time " + sw.Elapsed);
                RecognitionText = string.Empty;
            }
            else
            {
                Debug.WriteLine("failed - Time " + sw.Elapsed);
            }
        } while (!cancellationToken.IsCancellationRequested);
    }

When I run this on my iPad, it loads and waits. I say "This is a test" and then wait about a minute before it responds in the debug window with the success message and the elapsed time shown below. It doesn't seem to matter whether I say the phrase right after start or wait 30 seconds first; it still takes about a minute.

[0:] Success This is a test Time 00:01:00.9459623

This method is unusable in its current state. People won't wait a minute for a response each time they speak. And according to your docs, this method should detect silence.

Expected Behavior

This should detect silence and return the text when it does. You probably also want a way to adjust the timeout value, which doesn't appear to exist now.

Steps To Reproduce

  1. Create a new .NET MAUI application
  2. Add the CommunityToolkit
  3. Use the above code to test

Link to public reproduction project repository

https://github.com/sej69/TestSpeechRecognizer

Environment

- .NET MAUI CommunityToolkit:
- OS: iOS running on an iPad through my Mac (set up through VS). The iPad is on the current version.
- .NET MAUI:

Anything else?

Using current versions of all libraries and .NET MAUI (.NET 8). Very simple, basic project.

bijington commented 6 months ago

What iPad are you running this on?

sej69 commented 6 months ago

My development iPad is an iPad mini.


bijington commented 6 months ago

Sorry what model and age is the iPad?

sej69 commented 6 months ago

It’s a Mini Model 4


sej69 commented 5 months ago

I purchased a new 10” iPad (series 10) yesterday, and the issue occurs on it as well.


sej69 commented 5 months ago

Playing around with this a bit more, I discovered that the OnRecognitionTextComplete event does fire on the iPad 10 if I hit a breakpoint in the "OnRecognitionTextUpdated" event method and then continue. It does not fire on its own if I remove that breakpoint.

sej69 commented 4 months ago

This is running on an iPad 10, current / most recent hardware version.

The sample on this page: https://learn.microsoft.com/en-us/dotnet/communitytoolkit/maui/essentials/speech-to-text?tabs=android

seems to show a recognitionResult return, but this never seems to happen. I created a test project using this code and set a breakpoint on "if (recognitionResult.IsSuccessful)" and it never hits.

I thought it may have something to do with the interface I had set up per the documentation:

    public interface ISpeechToText
    {
        Task RequestPermissions();

        Task<string> Listen(CultureInfo culture,
            IProgress<string> recognitionResult,
            CancellationToken cancellationToken);
    }

    // speech recognition code using the interface:

    private ISpeechToText _speechToText;

    public SpeechListener(IAudioManager am, Iris.ISpeechToText stt)
    {
        _speechToText = stt;
    }

    // .....

    var recognitionResult = await _speechToText.Listen(CultureInfo.GetCultureInfo("en-us"), new Progress<string>(partialText =>
    {
        Debug.WriteLine("--- " + partialText + "---");

        //RecognitionText += partialText + " "; // demo code shows to use +=, but that duplicates incoming text.
        RecognitionText = partialText + " ";  // this doesn't duplicate the incoming text, but it does continue to report an additional time after the timer ends.
    }), cancellationToken);

But even when I switch to using SpeechToText.Default.ListenAsync(), e.g.:

    _cancellationToken = cancellationToken;

    var recognitionResult = await SpeechToText.Default.ListenAsync(CultureInfo.GetCultureInfo("en-us"),
        new Progress<string>(partialText =>
        {
            Debug.WriteLine("--- " + partialText + "---");
            RecognitionText += partialText; // need to do += here or it won't add the additional words
        }), cancellationToken);
    if (recognitionResult.IsSuccessful) // breakpoint here
    {
        RecognitionText = recognitionResult.Text;
    }
    else
    {
        await Toast.Make(recognitionResult.Exception?.Message ?? "Unable to recognize speech").Show(CancellationToken.None);
    }

This doesn't produce a recognitionResult either. The method stays open and continues to recognize text.

To make matters worse, running SpeechToText.Default versus the ISpeechToText variant seems to work "better". I've had to create timing routines because ListenAsync never returns anything; it just keeps listening and emitting partial text.

When running the interface version (ISpeechToText), it keeps coming back and reporting the same text one more time, which wakes my timer routine again. I can write some code to get around this behavior, but I shouldn't have to...

I pulled the interface out, but then I ran into an issue with SpeechToText dropping the Bluetooth headset. My AppDelegate config:

    public override bool FinishedLaunching(UIApplication application, NSDictionary launchOptions)
    {
        SetAudioSession();

        return base.FinishedLaunching(application, launchOptions);
    }

    public bool SetAudioSession()
    {
        var audioSession = AVAudioSession.SharedInstance();
        var err = audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord, AVAudioSessionCategoryOptions.AllowBluetooth |
                                            AVAudioSessionCategoryOptions.AllowAirPlay | AVAudioSessionCategoryOptions.DefaultToSpeaker);

        if (err != null)
            return false;

        err = audioSession.SetActive(true);

        if (err != null)
            return false;

        return true;
    }

Bluetooth works when the ISpeechToText interface is used, but not with SpeechToText.Default.ListenAsync. The headphone icon on the iPad goes away until I exit the app, then it shows up again, and all audio destined for the audioManager interface goes through the iPad only. However, the mic on the headphones still seems to work over Bluetooth. I've tracked it down to the "ListenAsync" method: if I comment it out, Bluetooth works; if I add it back in, it doesn't. If I change back to the ISpeechToText interface, it does work with my timing routines to get spoken text, but the method never returns.

I'd love for ListenAsync to detect silence and report back, if possible. Otherwise I can continue to use my timing routines, but I think something is broken in the API here.
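For readers wondering what such a "timing routine" might look like, here is a minimal sketch of a silence timeout layered over any listen call that reports partial text through `IProgress<string>` and honors a `CancellationToken`. Everything in it (`SilenceTimeout`, `ListenUntilSilentAsync`, `CallbackProgress`) is a hypothetical name made up for illustration and is not part of CommunityToolkit.Maui. It also assumes each partial result carries the full text recognized so far; if the engine reports only the new words, the callback would need to append instead. The countdown only starts at the first partial result, so the wait before the user begins speaking is bounded only by the caller's token.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; it is not part of CommunityToolkit.Maui.
public static class SilenceTimeout
{
    // Runs `listen` until no partial result has arrived for `silenceWindow`,
    // then cancels it and returns the most recent partial text.
    public static async Task<string> ListenUntilSilentAsync(
        Func<IProgress<string>, CancellationToken, Task> listen,
        TimeSpan silenceWindow,
        CancellationToken cancellationToken = default)
    {
        using var silenceCts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        var gate = new object();
        var latest = string.Empty;

        var progress = new CallbackProgress(text =>
        {
            // Assumption: each partial result is cumulative, so keep only the newest.
            lock (gate) { latest = text; }
            try
            {
                // Each partial result restarts the silence countdown.
                silenceCts.CancelAfter(silenceWindow);
            }
            catch (ObjectDisposedException)
            {
                // A late report arrived after listening ended; ignore it.
            }
        });

        try
        {
            await listen(progress, silenceCts.Token);
        }
        catch (OperationCanceledException) when (!cancellationToken.IsCancellationRequested)
        {
            // The silence window elapsed: treat this as the end of the utterance.
        }

        lock (gate) { return latest; }
    }

    // Progress<T> posts callbacks to a captured context; this variant invokes
    // the callback inline so the countdown restarts as soon as text is reported.
    private sealed class CallbackProgress : IProgress<string>
    {
        private readonly Action<string> _onReport;
        public CallbackProgress(Action<string> onReport) => _onReport = onReport;
        public void Report(string value) => _onReport(value);
    }
}
```

With the call shape used in the snippets above, the delegate could be something like `(p, ct) => SpeechToText.Default.ListenAsync(culture, p, ct)`. Whether the iOS implementation stops promptly when the token is canceled is not confirmed anywhere in this thread, so treat this as a workaround to experiment with rather than a fix.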

sej69 commented 4 months ago

Another thing you can look at: I downloaded the samples from https://github.com/CommunityToolkit/Maui/tree/main and deployed them to my iPad. They exhibit the same issue: end of speech is not detected, and the call never returns a recognitionResult.

sej69 commented 4 months ago

Sorry, I should have tested the AppDelegate as well.

This:

    public override bool FinishedLaunching(UIApplication application, NSDictionary launchOptions)
    {
        SetAudioSession();

        return base.FinishedLaunching(application, launchOptions);
    }

    public bool SetAudioSession()
    {
        var audioSession = AVAudioSession.SharedInstance();
        var err = audioSession.SetCategory(AVAudioSessionCategory.PlayAndRecord, AVAudioSessionCategoryOptions.AllowBluetooth |
                                            AVAudioSessionCategoryOptions.AllowAirPlay | AVAudioSessionCategoryOptions.DefaultToSpeaker);

        if (err != null)
            return false;

        err = audioSession.SetActive(true);

        if (err != null)
            return false;

        return true;
    }

has the same effect with the sample code: the headset icon disappears and all audio still goes through the iPad, not the headphones.

VladislavAntonyuk commented 4 months ago

  1. There is no difference between using interface and SpeechToText.Default
  2. Feel free to open PR and include Bluetooth support for the speaker.
  3. Listen is designed for continuous listening. You can use StartListening/StopListening methods to get control over the recognition process

sej69 commented 4 months ago

I was wondering about that, but then why is there a difference (even in your sample application)?

What is PR?


VladislavAntonyuk commented 4 months ago

It depends on how you registered ISpeechToText, but if you register it correctly there is no difference. PR stands for pull request.

vchelaru commented 4 months ago

I have noticed that on Android, the silence is automatically detected and the listening ends successfully. On iOS, it does not. I have added info here, but this issue was closed due to me not having a sample:

https://github.com/CommunityToolkit/Maui/issues/1723

sej69 commented 4 months ago

Thank you! I'm looking forward to trying this out!


Closed #1645 (https://github.com/CommunityToolkit/Maui/issues/1645) as completed via #1741 (https://github.com/CommunityToolkit/Maui/pull/1741).
