GoogleCloudPlatform / firebase-android-client


Recognizing long audio files #29

Closed mrn59 closed 4 years ago

mrn59 commented 4 years ago

Hi @ricalo, I am running into the https://cloud.google.com/speech-to-text/docs/error-messages#sync_input_too_long error during speech-to-text transcription for audio input longer than one minute. I am trying to follow the guidelines for asynchronous recognition (LongRunningRecognize, https://cloud.google.com/speech-to-text/docs/async-recognize#transcribing_long_audio_files_using_a_google_cloud_storage_file) and implement them in the private void translateAudioMessage method here: https://github.com/ricalo/firebase-android-client/blob/master/app/src/main/java/com/google/cloud/solutions/flexenv/PlayActivity.java#L336. However, I haven't been able to get the implementation right and am looking for guidance.

ricalo commented 4 years ago

Hi @mrn59

This sample uses a Cloud Function to send the message to Speech-to-Text on Google Cloud as explained in Adding speech translation to your Android app. To support async transcription, you should update callSpeechToText in the Cloud Function to use longRunningRecognize() instead of just recognize().

It looks to me like the Android client wouldn't need any significant updates to support what you're looking for. Let me know how it goes.
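
Roughly, the call could change like this (untested sketch; it assumes callSpeechToText is made async and that the rest of the function stays as it is in the sample's index.js):

```javascript
// Sketch only: with the @google-cloud/speech Node.js client, longRunningRecognize()
// resolves to [operation]; awaiting operation.promise() yields the final response
// once transcription completes.
const [operation] = await speechToTextClient.longRunningRecognize(request);
const [response] = await operation.promise();
return response;
```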

mrn59 commented 4 years ago

Thanks @ricalo for looking into this! I tried simply updating callSpeechToText to use longRunningRecognize() instead of recognize(), but I keep running into these two follow-up errors:

(1) "{ Error: 3 INVALID_ARGUMENT: Inline audio exceeds duration limit. Please use a GCS URI. at Object.callErrorFromStatus (/srv/node_modules/@grpc/grpc-js/build/src/call.js:30:26) at Http2CallStream.call.on (/srv/node_modules/@grpc/grpc-js/build/src/client.js:96:33) at emitOne (events.js:121:20) at Http2CallStream.emit (events.js:211:7) at process.nextTick (/srv/node_modules/@grpc/grpc-js/build/src/call-stream.js:100:22) at _combinedTickCallback (internal/process/next_tick.js:132:7) at process._tickDomainCallback (internal/process/next_tick.js:219:9) code: 3, details: 'Inline audio exceeds duration limit. Please use a GCS URI.', metadata: Metadata { internalRepr: Map {}, options: {} } }"

(2) "TypeError: Cannot read property 'map' of undefined at exports.speechTranslate.functions.https.onRequest (/srv/index.js:73:10) at "

Any idea why?

ricalo commented 4 years ago

Ahh, yes, the first error is because longRunningRecognize() takes a URI to the audio file in a Google Cloud Storage (GCS) bucket instead of the raw content that recognize() accepts. The callSpeechToText() function would require the following change before you can use longRunningRecognize():

```javascript
audio: {uri: 'gs://my-bucket/audio.raw'}
```
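
Putting it together, callSpeechToText could end up looking roughly like this (untested sketch; the gcsUri parameter name is just illustrative, the bucket path is a placeholder, and speechToTextClient is the client the sample already creates):

```javascript
// Rough sketch: pass a GCS URI instead of inline audio content and wait for the
// long-running operation to finish before returning the response.
const callSpeechToText = async (gcsUri, encoding, sampleRateHertz, languageCode) => {
  const request = {
    config: {encoding, sampleRateHertz, languageCode},
    audio: {uri: gcsUri}, // e.g. 'gs://my-bucket/audio.raw'
  };
  const [operation] = await speechToTextClient.longRunningRecognize(request);
  const [response] = await operation.promise();
  return response;
};
```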

The Android client would require a significant change after all. The client has to upload the source audio file to a GCS bucket instead of sending the raw content to the Cloud Function. :disappointed:

Edit: Provided more details about the update required in the Cloud Function.

ominfowave commented 4 years ago

Hi @ricalo, I just implemented the longRunningRecognize() method in the Cloud Function instead of recognize(), and I also updated the Android client to upload the audio file to GCS and then send its URL to the Cloud Function.

But now I am getting the error below from the Cloud Function:

```
TypeError: speechToTextClient.longRunningRecognize is not a function or its return value is not iterable
    at callSpeechToText (/srv/index.js:170:41)
    at exports.speechTranslate.functions.https.onRequest (/srv/index.js:64:35)
    at <anonymous>
    at process._tickDomainCallback (internal/process/next_tick.js:229:7)
```

Is there any other modification required in the index.js file from the sample? Below is the modification I made to the callSpeechToText() function.

audioContent contains the GCS URL passed from the Android client.

```javascript
const callSpeechToText = (audioContent, encoding, sampleRateHertz, languageCode) => {
  const request = {
    config: {
      encoding: encoding,
      sampleRateHertz: sampleRateHertz,
      languageCode: languageCode,
    },
    audio: {uri: audioContent},
  };

  const [operation] = speechToTextClient.longRunningRecognize(request);
  return operation.promise();
};
```

ricalo commented 4 years ago

Hi @ominfowave,

The only thing I can think of at the moment is to check if you have enabled the API at https://console.cloud.google.com/apis/library/speech.googleapis.com

I'm shooting in the dark otherwise, because using the long-running operation API is not in the scope of this sample. This sample is part of the Google Cloud documentation at https://cloud.google.com/solutions/mobile/speech-translation-android-microservice.

Could you click the Feedback button on that page and request an update or another tutorial that covers long-running operations?

mrn59 commented 4 years ago

Hi @ricalo,

I'm working with @ominfowave on this, and I checked: our API is enabled. I'll use the Feedback button on that page to request an update. Let's see where it takes us! Thanks!!

ricalo commented 4 years ago

Thanks, @mrn59