felixjunghans / google_speech

Flutter Google Speech
MIT License

Is it possible to get the confidence by triggering a function? #4

Closed tommycyf closed 3 years ago

tommycyf commented 4 years ago

Hi, thanks for this great package. It really helped me a lot.

Is it possible to get the confidence immediately by triggering a function instead of stopping the recorder and waiting for gRPC to stop the recognition?

Describe the solution you'd like
A function that somehow makes gRPC return the confidence immediately (e.g. within 1 second).

Describe alternatives you've considered
I tried _recorder.close() and request.close(); both make gRPC return the confidence immediately. However: (1) after _recorder.close() I cannot record anything anymore, because the StreamController is closed; (2) after request.close() my audio input is still recognized, but Flutter keeps throwing 'Unhandled Exception: Bad state: Cannot add event after closing'.

felixjunghans commented 4 years ago

Unfortunately this is not possible, because the Google Speech API only returns the confidence at the end of a sentence. So you have to finish a sentence and leave a pause before you get a confidence back from the API.

When I try to read the confidence during streaming, Google always returns 0.0 as long as a sentence has not been finished.
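
For illustration, a minimal sketch of that behaviour when listening to the streaming responses (the isFinal flag comes from the Google Speech API's StreamingRecognitionResult; responseStream is assumed to come from streamingRecognize()):

responseStream.listen((data) {
  for (final result in data.results) {
    if (result.isFinal) {
      // Only final results carry a real confidence value.
      print('final: ${result.alternatives.first.transcript} '
          '(confidence: ${result.alternatives.first.confidence})');
    } else {
      // Interim hypothesis: confidence is still 0.0 here.
      print('interim: ${result.alternatives.first.transcript}');
    }
  }
});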

tommycyf commented 4 years ago

Thank you for your fast reply.

It seems that only the Google Speech API can decide when to return a confidence, unless we stop the audioStreamController or requestStreamController to force the API to return it.

If we can avoid the 'Unhandled Exception: Bad state: Cannot add event after closing' error after closing the requestStreamController, the function can be achieved perfectly. (I get this error, but the device doesn't crash... it still works...)

I'm still new to Flutter and wondering whether it is possible to reinitialize a StreamController somehow. It seems the StreamController is not recreated even if I pop the recognizerScreen and push it again...

felixjunghans commented 4 years ago

Hi, sorry for the late reply. To work around this error you have to re-create the stream (AudioStream or ResponseStream). The problem is that the stream has already been closed, so no new data can be added to it. If you send me a code example where the error occurs, I can help you.
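
To make the failure mode concrete: the audio listener keeps calling request.add() after the controller is closed. A minimal guard (an illustration only, not the package's actual fix) would look like this:

audioStream.listen((audio) {
  // Skip audio chunks once the controller has been closed manually, so a
  // manual request.close() no longer triggers
  // 'Bad state: Cannot add event after closing'.
  if (!request.isClosed) {
    request.add(StreamingRecognizeRequest()..audioContent = audio);
  }
});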

tommycyf commented 4 years ago

Thank you for your reply.

I didn't change the code much, just a few changes in speech_to_text.dart: I changed the private "_request" to a public "request" so I can call its .close() method when I tap a button.

If streamingRecognize() is called after speechToText.request.close() has been called once, I continuously get the stream 'Bad state' error.

This is the speech_to_text.dart I changed:

library flutter_google_speech;
import 'dart:async';
import 'package:google_speech/generated/google/cloud/speech/v1/cloud_speech.pb.dart'
    hide RecognitionConfig, StreamingRecognitionConfig;
import 'package:google_speech/generated/google/cloud/speech/v1/cloud_speech.pbgrpc.dart'
    hide RecognitionConfig, StreamingRecognitionConfig;
import 'package:google_speech/speech_client_authenticator.dart';
import 'package:grpc/grpc.dart';

import 'config/recognition_config_v1.dart';
import 'config/streaming_recognition_config.dart';

class SpeechToText {

// <---------------------declare "request" as a public variable --------------------->
  StreamController<StreamingRecognizeRequest> request; 
// <---------------------declare "request" as a public variable --------------------->

  final CallOptions _options;
  final ClientChannel _channel = ClientChannel('speech.googleapis.com');
  SpeechToText._(this._options);
  factory SpeechToText.viaServiceAccount(ServiceAccount account) =>
      SpeechToText._(account.callOptions);
  Future<RecognizeResponse> recognize(
      RecognitionConfig config, List<int> audio) {
    final client = SpeechClient(_channel, options: _options);
    final recognitionAudio = RecognitionAudio()..content = audio;
    final request = (RecognizeRequest()
      ..config = config.toConfig()
      ..audio = recognitionAudio);
    return client.recognize(request);
  }
  Stream<StreamingRecognizeResponse> streamingRecognize(
      StreamingRecognitionConfig config, Stream<List<int>> audioStream) {
    final client = SpeechClient(_channel, options: _options);

// <--------------------- change "_request" to "request" --------------------->
    request = StreamController<StreamingRecognizeRequest>();
// <--------------------- change "_request" to "request" --------------------->

    request
        .add(StreamingRecognizeRequest()..streamingConfig = config.toConfig());
    audioStream.listen((audio) {
      request.add(StreamingRecognizeRequest()..audioContent = audio);
    }).onDone(() {
      // Close the request stream, if the audio stream is finished.
      request.close();
    });
    return client.streamingRecognize(request.stream);
  }
}

This is the recognition service script in my app:


import 'dart:async';

import 'package:flutter/services.dart';
import 'package:google_speech/google_speech.dart';
import 'package:sound_stream/sound_stream.dart';

class RecognitionService {
  RecorderStream _recorder = RecorderStream();
  String text = '';
  var confidence;
  var responseStream;
  SpeechToText speechToText;
  var serviceAccount;
  void init() {
    _recorder.initialize();
  }

  streamingRecorder() async {
    await _recorder.start();
    // Load the service account credentials bundled with the app.
    serviceAccount = ServiceAccount.fromString(
        await rootBundle.loadString('assets/test_service_account.json'));
    speechToText = SpeechToText.viaServiceAccount(serviceAccount);
    final config = _getConfig();
    // Start streaming recognition with interim results enabled.
    responseStream = speechToText.streamingRecognize(
        StreamingRecognitionConfig(config: config, interimResults: true),
        _recorder.audioStream);
    responseStream.listen(
      (data) {
        confidence =
            data.results.map((e) => e.alternatives.first.confidence).toList();
        text = 
            data.results.map((e) => e.alternatives.first.transcript).join('');
      },
      onDone: () async {
      },
    );
  }

  // Method to stop recognition and get the confidence immediately.
  stopRecording() {
    speechToText.request.close(); // Once the request is closed, we always get a confidence immediately.
    init();
  }

  RecognitionConfig _getConfig() => RecognitionConfig(
        encoding: AudioEncoding.LINEAR16,
        model: RecognitionModel.basic,
        enableAutomaticPunctuation: true,
        sampleRateHertz: 16000,
        languageCode: 'en-US',
      );
}
felixjunghans commented 4 years ago

Ok, I see now where the mistake is. If you close the stream before recording has stopped, data from the microphone will still be transferred to the stream even though it has been closed.

See here:

  audioStream.listen((audio) {
      request.add(StreamingRecognizeRequest()..audioContent = audio);
    })

Instead of

  stopRecording() {
    speechToText.request.close();      // when the request is closed, we can always get a confidence immediately.
    init();
  }

you should not close the stream but stop the recording. If you stop the recording, the stream will close automatically and you should get a result from the Google API immediately.
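
A minimal sketch of that recommended approach (assuming sound_stream's RecorderStream exposes a stop() method):

  // Stop the recorder instead of closing the request stream directly.
  // When the audio stream finishes, the onDone callback inside
  // streamingRecognize() calls request.close() for us, so the API returns
  // the final result (with confidence) without a 'Bad state' error.
  stopRecording() async {
    await _recorder.stop();
  }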

felixjunghans commented 4 years ago

But anyway, if you would rather do it your way, which I do not recommend, you can do it now. I have just released a new version that allows you to cancel the audioStreamSubscription. You just have to call the newly added dispose() method of SpeechToText.
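
A hedged usage sketch of that approach (the exact signature of dispose() is not shown in this thread, so treat the await as an assumption):

  stopRecording() async {
    // Cancel the internal audioStreamSubscription first via the newly
    // added dispose() method, so closing the request no longer throws.
    await speechToText.dispose();
    speechToText.request.close();
  }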

tommycyf commented 3 years ago

I have tried the new dispose() method you released yesterday and it is working perfectly. Now there is no error at all when I close the request after cancelling the audioStreamSubscription. (I still have to close the request in streamingRecognize() to get the confidence immediately.) Thank you very much for your help!