aws-amplify / amplify-js

A declarative JavaScript library for application development using cloud services.
https://docs.amplify.aws/lib/q/platform/js
Apache License 2.0

Transcribe audio to text always returns {"err": "region not configured for transcription"} #10297

Closed raulgonzalezdev closed 1 year ago

raulgonzalezdev commented 2 years ago

Hello, good Amplify community, it's a pleasure to be on this channel. I am new to working with Amplify services and I am testing Predictions with the speechGenerator configuration, but when I run the app generated in React Native, the console shows the error {"err": "region not configured for transcription"}. I have checked the microphone buffer and it does produce data; I am testing only with the phrase "Hello World". I am using the basic configuration automatically generated by AWS Amplify.

I will leave the fragments of my code below, to see whether the problem is the region where I created the backend or something in my code, which in theory is very simple.

/* eslint-disable */
// WARNING: DO NOT EDIT. This file is automatically generated by AWS Amplify. It will be overwritten.

const awsmobile = {
  "aws_project_region": "us-east-1",
  "aws_cognito_identity_pool_id": "us-east-1:75f67dbf-//kkk/",
  "aws_cognito_region": "us-east-1",
  "aws_user_pools_id": "us-east-1_/jjj/",
  "aws_user_pools_web_client_id": "/jjjjj************",
  "oauth": {},
  "aws_cognito_username_attributes": [],
  "aws_cognito_social_providers": [],
  "aws_cognito_signup_attributes": [
    "EMAIL"
  ],
  "aws_cognito_mfa_configuration": "OFF",
  "aws_cognito_mfa_types": [
    "SMS"
  ],
  "aws_cognito_password_protection_settings": {
    "passwordPolicyMinLength": 8,
    "passwordPolicyCharacters": []
  },
  "aws_cognito_verification_mechanisms": [
    "EMAIL"
  ],
  "predictions": {
    "convert": {
      "speechGenerator": {
        "region": "us-east-1",
        "proxy": false,
        "defaults": {
          "VoiceId": "Ricardo",
          "LanguageCode": "en-US"
        }
      }
    }
  }
};

export default awsmobile;

My code in React Native:

import React, {useState} from 'react';
import {
  View,
  Text,
  StyleSheet,
  TextInput,
  TouchableOpacity,
} from 'react-native';
import {Amplify} from 'aws-amplify';
import {
  Predictions, AmazonAIPredictionsProvider,
} from '@aws-amplify/predictions';
import awsconfig from '../../src/aws-exports';

import MicStream from 'react-native-microphone-stream';

// from https://github.com/aws-samples/amazon-transcribe-websocket-static/tree/master/lib

Amplify.configure(awsconfig);
Amplify.addPluggable(new AmazonAIPredictionsProvider());
const initialState = {name: '', description: ''};

global.Buffer = global.Buffer

function VoiceCapture() {
  const [text, setText] = useState('');

  // from https://github.com/aws-samples/amazon-transcribe-websocket-static/tree/master/lib
  function pcmEncode(input) {
    let offset = 0;
    const buffer = new ArrayBuffer(input.length * 2);
    const view = new DataView(buffer);
    for (let i = 0; i < input.length; i++, offset += 2) {
      const s = Math.max(-1, Math.min(1, input[i]));
      view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7fff, true);
    }
    return buffer;
  }
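  // pcmEncode (from the aws-samples repo linked above) clamps each sample to
  // [-1, 1] and scales it to a signed 16-bit little-endian PCM value, so it
  // expects float samples; if react-native-microphone-stream already delivers
  // 16-bit integer samples, the scaling here would need to change.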

  async function transcribe(bytes) {
    await Predictions.convert({
      transcription: {
        source: {
          bytes,
        },
        language: 'en-US',
      },
    })
      .then(({transcription: {fullText}}) => console.log({fullText}))
      .catch((err) => console.log({err}));
  }

  const listener = MicStream.addListener((data) => {
    console.log('Data:', data);

    // encode the mic input
    const pcmEncodedBuffer = pcmEncode(data);

    // // add the right JSON headers and structure to the message
    // let audioEventMessage = getAudioEventMessage(
    //   global.Buffer.from(pcmEncodedBuffer),
    // );

    // // convert the JSON object + headers into a binary event stream message
    // let binary = eventStreamMarshaller.marshall(audioEventMessage);

    // the docs say this takes a PCM audio byte buffer, so I assume the wrappers above aren't necessary. Tried them anyway with no luck.
    // (https://docs.amplify.aws/lib/predictions/transcribe/q/platform/js#set-up-the-backend)
    transcribe(pcmEncodedBuffer);
  });

  function startTranscribing() {
    MicStream.init({
      bufferSize: 4096 * 32, // tried multiplying this buffer size to send longer - still no luck
      // sampleRate: 44100,
      sampleRate: 16000,
      bitsPerChannel: 16,
      channelsPerFrame: 1,
    });

    MicStream.start();
    console.log('Started mic stream');
  }

  function stopTranscribing() {
    console.log('Stopped mic stream');
    MicStream.stop();
    listener.remove();
  }

  return (
    <View style={styles.container}>
      <View style={styles.horizontalView}>
        <TouchableOpacity
          style={styles.mediumButton}
          onPress={() => {
            // Voice.start('en_US');
            // transcribeAudio();
            startTranscribing();
          }}>
          <Text style={styles.mediumButtonText}>START</Text>
        </TouchableOpacity>
        <TouchableOpacity
          style={styles.mediumButton}
          onPress={() => {
            stopTranscribing();
          }}>
          <Text style={styles.mediumButtonText}>STOP</Text>
        </TouchableOpacity>
      </View>
      <TextInput
        style={styles.editableText}
        multiline
        onChangeText={(editedText) => setText(editedText)}>
        {text}
      </TextInput>
    </View>
  );
}

const NewVoice = () => {
  return (
    <VoiceCapture />
  );
};

export const colors = {
  primary: '#0049bd',
  white: '#ffffff',
};

export const padding = {
  sm: 8,
  md: 16,
  lg: 24,
  xl: 32,
};

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: 'white',
    padding: padding.lg,
  },
  bodyText: {
    fontSize: 16,
    height: 20,
    fontWeight: 'normal',
    fontStyle: 'normal',
  },
  mediumButtonText: {
    fontSize: 16,
    height: 20,
    fontWeight: 'normal',
    fontStyle: 'normal',
    color: colors.white,
  },
  smallBodyText: {
    fontSize: 14,
    height: 18,
    fontWeight: 'normal',
    fontStyle: 'normal',
  },
  mediumButton: {
    alignItems: 'center',
    justifyContent: 'center',
    width: 132,
    height: 48,
    padding: padding.md,
    margin: 14,
    backgroundColor: colors.primary,
    fontSize: 20,
    fontStyle: 'normal',
    elevation: 1,
    shadowOffset: {width: 1, height: 1},
    shadowOpacity: 0.2,
    shadowRadius: 2,
    borderRadius: 2,
  },
  editableText: {
    textAlign: 'left',
    textAlignVertical: 'top',
    borderColor: 'black',
    borderWidth: 2,
    padding: padding.md,
    margin: 14,
    flex: 5,
    fontSize: 16,
    height: 20,
  },
  horizontalView: {
    flex: 1,
    flexDirection: 'row',
    alignItems: 'stretch',
    justifyContent: 'center',
  },
});

export default NewVoice;

ykethan commented 2 years ago

Hey @gqcryptoraul, thank you for reaching out. From the information provided (aws-exports and app.js), I understand that you are trying to convert text to speech.

async function transcribe(bytes) {
  await Predictions.convert({
    transcription: {
      source: {
        bytes,
      },
      language: 'en-US',
    },
  })
    .then(({transcription: {fullText}}) => console.log({fullText}))
    .catch((err) => console.log({err}));
}

The API call used here is the transcription one; for the speechGenerator configuration it would instead need to follow the example below.

Predictions.convert({
  textToSpeech: {
    source: {
      text: textToGenerateSpeech,
      language: "es-MX" // default configured in aws-exports.js
    },
    voiceId: "Mia"
  }
}).then(result => {
  setAudioStream(result.speech.url);
  setResponse(`Generation completed, press play`);
}).catch(err => setResponse(JSON.stringify(err, null, 2)))
Please refer to https://docs.amplify.aws/lib/predictions/text-speech/q/platform/js/#working-with-the-api, which covers this API. For testing purposes I used the text-to-speech function example provided here: https://aws.amazon.com/blogs/mobile/announcing-the-new-predictions-category-in-amplify-framework/

In my tests I observed that the error "region not configured for transcription" occurs when we use a different API from the category that has been configured. Please do let us know if this aligns with your use case.
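To illustrate the mismatch with a minimal sketch (based only on the config and calls already shown in this issue, not code I have run): with only speechGenerator under predictions.convert, the transcription call finds no region, while a textToSpeech call matches the configured category.

    // aws-exports.js only contains predictions.convert.speechGenerator, so this
    // rejects with {"err": "region not configured for transcription"}:
    Predictions.convert({
      transcription: {
        source: {bytes}, // PCM audio buffer from the mic listener
        language: 'en-US',
      },
    }).catch(err => console.log({err}));

    // This call matches the configured speechGenerator category; the VoiceId and
    // LanguageCode defaults come from aws-exports.js:
    Predictions.convert({
      textToSpeech: {
        source: {text: 'Hello World'},
      },
    })
      .then(result => console.log(result.speech.url))
      .catch(err => console.log({err}));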

raulgonzalezdev commented 2 years ago

@ykethan Thanks for your input, but what I need to do is transcribe text from audio (using speechGenerator). I am also using a library that recognizes the voice and stores it in a buffer, so that with the Amplify framework I can write out the text interpreted from the temporary voice recording.

Since you have a lot of experience, I would appreciate it if you could send me a link to an example that can be used in React Native.

This is the prediction that I want to use and that I have configured in my local project.

"predictions": { "convert": { "speechGenerator": { "region": "us-east-2", "proxy": { "proxy": false, "defaults": { "voiceId": "joey", "languageCode": "en-US" } } } }

ykethan commented 2 years ago

Hey, I will be transferring this issue to amplify-js for better assistance with your use case in React Native.

Additionally, Amplify does allow transcribing text from audio when we choose the following option:

amplify add predictions
? Please select from one of the categories below: Convert
? What would you like to convert?
  Translate text into a different language
  Generate speech audio from text
> Transcribe text from audio

After selecting this, I observed the aws-exports file contains the following:

"predictions": {
        "convert": {
            "transcription": {
                "region": "us-east-1",
                "proxy": false,
                "defaults": {
                    "language": "en-GB"
                }
            }
        }
    }
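With that transcription block in place, the call already used in the React Native component above should line up with the configured category. A minimal sketch (bytes being the PCM-encoded buffer from the microphone listener):

    Predictions.convert({
      transcription: {
        source: {
          bytes, // PCM audio byte buffer
        },
        language: 'en-US', // overrides the "en-GB" default from aws-exports.js
      },
    })
      .then(({transcription: {fullText}}) => console.log({fullText}))
      .catch(err => console.log({err}));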
stocaaro commented 1 year ago

Hello @raulgonzalezdev ,

From the comments above, it looks like your app was configured with speechGenerator and needs to be configured for transcription before it will work without the error you're seeing. Have you made this update to your application as @ykethan recommended? Did that resolve the error?

If not, what are you seeing now, and what does your configuration look like with transcription configured?
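If it helps, the update would look roughly like this (a sketch; the exact prompts depend on your CLI version):

    amplify add predictions
    # Convert -> Transcribe text from audio
    amplify push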

Thanks, Aaron