csdcorp / speech_to_text

A Flutter plugin that exposes device-specific speech-to-text recognition capability.
BSD 3-Clause "New" or "Revised" License

Merge alternatives to detect if user continues #436

Closed ichitaka closed 7 months ago

ichitaka commented 8 months ago

This is a simple fix for the case where the user takes a short break while speaking, shorter than the configured pause. In that case speech recognition would appear to stop, because anything detected after the break becomes part of the next element in alternatives. So far I can't find any other reason why the alternatives were defined like this.

This has been tested on Web.

sowens-csd commented 7 months ago

Sorry, I don't understand why you want to do this.

Alternates, as received from the underlying speech recognition engine, are usually alternate possible interpretations of the received speech. For example [ 'four', 'for', 'fir', 'fiord']. I don't see why you'd want them merged into a single string. Could you explain what you were trying to achieve with this?

ichitaka commented 7 months ago

That is sadly not true for the Web version at least. From my testing, alternates contains other pieces of the transcriptions that happen after a break.

So if I say: "Let me think about it. You are right!", the alternates list will look like the following: ["Let me think about it", "You are right"]

I'm sure this isn't the most appropriate solution, but it fixes the issue I was facing with dictation over a long period. Maybe a platform interface solution makes sense here.

sowens-csd commented 7 months ago

Oh! Good find, thank you. I'll have a look at that.

ichitaka commented 7 months ago

I've found another issue that causes early timeouts. This section does not make sense to me. As we stop the listen according to the variables _elapsedListenMillis and _elapsedSinceSpeechEvent, why do we need to update our reference values? _elapsedListenMillis & _elapsedSinceSpeechEvent are being updated already.

Tested on web and removing this section solves an issue of early timeout.

    if (null != pauseFor) {
      var remainingMillis = pauseFor.inMilliseconds -
          (ignoreElapsedPause ? 0 : _elapsedSinceSpeechEvent);
      pauseFor = Duration(milliseconds: max(remainingMillis, 0));
    }
    if (listenFor != null) {
      var remainingMillis = listenFor.inMilliseconds - _elapsedListenMillis;
      listenFor = Duration(milliseconds: max(remainingMillis, 0));
    }
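
To make the concern concrete, here is a minimal standalone sketch of the same arithmetic. It is not the plugin's code: `remaining` and `shouldStop` are hypothetical helpers, the numbers are made up, and whether the plugin's stop check really compares the elapsed counters against the already reduced values is an assumption taken from the description above.

```dart
import 'dart:math';

/// Time left before a timeout fires, clamped at zero.
/// Mirrors the `max(remainingMillis, 0)` arithmetic in the section above.
Duration remaining(Duration configured, int elapsedMillis) {
  final remainingMillis = configured.inMilliseconds - elapsedMillis;
  return Duration(milliseconds: max(remainingMillis, 0));
}

/// Hypothetical stop check: true once the elapsed time reaches the limit.
bool shouldStop(Duration limit, int elapsedMillis) =>
    elapsedMillis >= limit.inMilliseconds;

void main() {
  const pauseFor = Duration(seconds: 3);
  const elapsedSinceSpeech = 2000; // milliseconds, made-up value

  final reduced = remaining(pauseFor, elapsedSinceSpeech);
  print(reduced.inMilliseconds); // 1000

  // Compared against the configured pause, listening continues.
  print(shouldStop(pauseFor, elapsedSinceSpeech)); // false

  // Compared against the reduced pause, the same 2000 ms counts twice
  // and the check already reports a timeout.
  print(shouldStop(reduced, elapsedSinceSpeech)); // true
}
```

If the elapsed time is subtracted from the limit and then also compared against it, it is counted twice. That would be one way to get the early timeout described here.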

sowens-csd commented 7 months ago

Just got back to this and tested it on Chrome and, of course, you're right! I had completely misunderstood the spec for the web version because I was trying to fit my experience from Android and iOS into the web framework. The web structure is more complicated: it provides a series of utterances in the first-level results, then a series of alternates in a second-level set of results under each of the first-level results.
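
As a rough illustration of that two-level shape, the sketch below models the web result as a plain list of lists and joins the first alternate of each utterance into one transcript. This is not the code in the plugin: `mergeTopAlternates` is a hypothetical helper, the nested-list model is a simplification, the second alternate in each utterance is invented for the example, and treating the first alternate as the preferred one is an assumption.

```dart
/// Joins the first alternate of each utterance into a single transcript.
///
/// Outer list: utterances in spoken order (first-level results).
/// Inner list: alternate interpretations of that one utterance
/// (second-level results), assumed to be ordered best first.
String mergeTopAlternates(List<List<String>> results) => results
    .where((alternates) => alternates.isNotEmpty)
    .map((alternates) => alternates.first.trim())
    .join(' ');

void main() {
  final results = <List<String>>[
    ['Let me think about it', 'Let me sink about it'],
    ['You are right', 'You are write'],
  ];

  print(mergeTopAlternates(results));
  // Let me think about it You are right
}
```

On Android and iOS the list of alternates corresponds to a single inner list here, which is why merging across that list looks wrong from that point of view.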

sowens-csd commented 7 months ago

However, the fix for this isn't in the right place. The actual change should be in speech_to_text_web.dart. I'll make that change now and you can try it out from the repo.

sowens-csd commented 7 months ago

> I've found another issue that causes early timeouts. This section does not make sense to me. As we stop the listen according to the variables _elapsedListenMillis and _elapsedSinceSpeechEvent, why do we need to update our reference values? _elapsedListenMillis & _elapsedSinceSpeechEvent are being updated already.
>
> Tested on web and removing this section solves an issue of early timeout.
>
>     if (null != pauseFor) {
>       var remainingMillis = pauseFor.inMilliseconds -
>           (ignoreElapsedPause ? 0 : _elapsedSinceSpeechEvent);
>       pauseFor = Duration(milliseconds: max(remainingMillis, 0));
>     }
>     if (listenFor != null) {
>       var remainingMillis = listenFor.inMilliseconds - _elapsedListenMillis;
>       listenFor = Duration(milliseconds: max(remainingMillis, 0));
>     }

Updating these properties fixed an issue identified in #191.

sowens-csd commented 7 months ago

There's a new version in the repo now that has my changes to improve web handling of multiple phrases. Let me know if you have a chance to try it.

sowens-csd commented 7 months ago

These changes are ready for 6.4.0. I'm going to close this PR as it is now obsolete after those code changes.

If you have a chance, please test the changes. If you want to have a look, they are in speech_to_text_web.dart and balanced_alternates.dart.