nathander opened this issue 4 years ago
@cedricgrothues thoughts on this? Is this related to what you were looking at?
@ashika01 – You're right, this is related to the docs issue. I'll add it to the contribution board and look into it.
"instead got only empty strings back."
@nathander – Are you getting any error message? I get a promise rejection with the following error: "The requested language doesn't support the specified sample rate. Use the correct sample rate then try again." (using 16000 as the sampleRate).
@cedricgrothues -- It's not throwing an error. I'm just getting back an empty string. I'm guessing my audio buffer is formatted incorrectly.
@nathander – Quick update: you're right, the audio buffer is formatted incorrectly, but that's an issue with the data stream from the react-native-microphone-stream library. I'm looking into finding a fix for this issue and updating the documentation to avoid any further confusion.
Thanks for the update, @cedricgrothues. Is there another package you'd recommend for pulling the microphone stream?
@nathander, the only format the Transcribe streaming API supports as of now is PCM, which means the input somehow needs to be formatted that way. There doesn't seem to be any reliable streaming library that I could find (@cedricgrothues, could you comment on this?). Meanwhile, I have reached out to the AWS Transcribe team for some guidance.
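In the meantime, the conversion itself is straightforward if a library hands you Float32 samples; here's a minimal sketch (untested, and it assumes samples normalized to [-1, 1]) of turning them into the 16-bit little-endian PCM the streaming API wants:

function floatTo16BitPCM(float32Samples) {
  // allocate 2 bytes per sample for signed 16-bit output
  const buffer = new ArrayBuffer(float32Samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    // clamp to [-1, 1], then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // little-endian
  }
  return buffer;
}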
Sorry for the late response, @nathander. While we wait for a response from the AWS Transcribe team, react-native-pcm-audio might be worth a look (disclaimer: I haven't tested the library myself, it only supports Android, and it was last updated in late 2017).
Hi @cedricgrothues - thanks, unfortunately I need it to work on iOS. Any updates from the Transcribe team?
hi @nathander - I just brought this up with the team. At this time, we are working with the service team to support more format types. We will mark this as a feature request, and we plan to follow up with UI components to better support this.
Hi @mauerbac - I don't think I'm asking for support for more format types, I'm just trying to figure out how to generate an accepted format type in React Native. Is streaming the mic buffer to Amplify Predictions from React Native currently supported? If so, is there any code you could share?
@nathander The library you are using is the same one we are using for our docs work. We went deep into this PCM buffer issue while writing our docs, around the time this issue was opened. We tried a couple of approaches, like looking for a good library we could use to get this working, and forking the library code to make some changes ourselves, but there is no good way as of now. We will look deeper into this when updating the docs. But we feel the best solution might be for the Transcribe team to open up other audio formats for ease of use on the mobile side.
@ashika01 I appreciate the update -- thanks a lot for looking into this!
Similar issue. In Expo / React Native on Android, how would you specify .mp4 as the encoding / file type for AWS Transcribe via the JavaScript Amplify Predictions class (instead of PCM)?
https://docs.amplify.aws/lib/predictions/transcribe/q/platform/js#working-with-the-api
Expo on Android supports .mp4 AAC and not PCM.
According to the Transcribe docs, Transcribe supports FLAC, .mp3, .mp4 and .wav
https://docs.aws.amazon.com/transcribe/latest/dg/input.html
It wasn't clear where the docs are that show all of the available {"transcription": ... } options.
I have an Expo application that successfully records from the microphone and saves it locally as an .m4a file (on Android). It would be easiest if you could take a local .mp4, .m4a, or .wav file (and, if needed, ingest it into a buffer) and send it with a config stating the file type, or better yet, give Amplify React Native the ability to transcribe directly from a local file (read and buffer behind the scenes).
Expo snippet:
import { Audio } from 'expo-av';

const recordingOptions = {
  android: {
    extension: '.m4a',
    outputFormat: Audio.RECORDING_OPTION_ANDROID_OUTPUT_FORMAT_MPEG_4,
    audioEncoder: Audio.RECORDING_OPTION_ANDROID_AUDIO_ENCODER_AAC,
    sampleRate: 44100,
    numberOfChannels: 2,
    bitRate: 128000,
  },
  ios: {
    // on iOS this records linear PCM inside a .wav container
    extension: '.wav',
    audioQuality: Audio.RECORDING_OPTION_IOS_AUDIO_QUALITY_HIGH,
    sampleRate: 44100,
    numberOfChannels: 1,
    bitRate: 128000,
    linearPCMBitDepth: 16,
    linearPCMIsBigEndian: false,
    linearPCMIsFloat: false,
  },
};
etc...
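Since the iOS options above write linear PCM into a .wav container, one workaround I'd consider (an untested sketch; it assumes the canonical 44-byte RIFF header, a Buffer polyfill, and an already-configured Predictions category) is stripping the header and handing the raw bytes to Predictions.convert:

import * as FileSystem from 'expo-file-system';
import { Buffer } from 'buffer'; // React Native has no global Buffer
import Predictions from '@aws-amplify/predictions';

async function transcribeWavRecording(uri) {
  // read the finished recording as base64 and decode it
  const base64 = await FileSystem.readAsStringAsync(uri, {
    encoding: FileSystem.EncodingType.Base64,
  });
  const wav = Buffer.from(base64, 'base64');
  // a canonical PCM .wav file carries a 44-byte RIFF header before the
  // samples; the streaming API wants only the raw PCM bytes after it
  const pcm = wav.slice(44);
  return Predictions.convert({
    transcription: { source: { bytes: pcm } },
  });
}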
@jeffsteinmetz – Amplify uses the Transcribe Streaming web socket API instead of Transcribe's REST API, and that, as of now, only supports PCM.
https://github.com/aws-amplify/amplify-js/blob/abf8e824308c229e09e585a7995d17a51f36c652/packages/predictions/src/Providers/AmazonAIConvertPredictionsProvider.ts#L398
For reference, see the Transcribe Streaming docs.
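If streaming isn't a hard requirement, the batch API does accept mp4; a rough sketch (my assumption, using the plain aws-sdk rather than Predictions, with hypothetical bucket and job names) would be to upload the recording to S3 and start a transcription job:

import AWS from 'aws-sdk';

const transcribe = new AWS.TranscribeService({ region: 'us-east-1' });

// assumes the recording has already been uploaded to S3
transcribe.startTranscriptionJob(
  {
    TranscriptionJobName: 'mobile-recording-1', // hypothetical name
    LanguageCode: 'en-US',
    MediaFormat: 'mp4',
    Media: { MediaFileUri: 's3://my-bucket/recording.mp4' }, // hypothetical bucket
  },
  (err, data) => {
    if (err) console.error(err);
    else console.log(data.TranscriptionJob.TranscriptionJobStatus);
  }
);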
@cedricgrothues Ahh! Gotcha. Looking at https://github.com/aws-amplify/amplify-js/blob/abf8e824308c229e09e585a7995d17a51f36c652/packages/predictions/src/Providers/AmazonAIConvertPredictionsProvider.ts#L401
These docs (https://docs.amplify.aws/lib/predictions/transcribe/q/platform/js#set-up-the-backend) don't mention the format or sample rate it expects, so that may cause some confusion for devs using Predictions.convert.
It will throw "Source types other than byte source are not supported." if you send it a JavaScript object containing just the raw binary data. It's not clear what it is expecting (or what type, as it relates to TypeScript).
Looking at the test https://github.com/aws-amplify/amplify-js/blob/abf8e824308c229e09e585a7995d17a51f36c652/packages/predictions/__tests__/Providers/AWSAIConvertPredictionsProvider-unit-test.ts#L114
It appears to expect something like source: { bytes: ...
Predictions.convert({
  transcription: {
    source: {
      bytes: new Buffer([0, 1, 2])
    },
    // language: "en-US", // other options are "en-GB", "fr-FR", "fr-CA", "es-US"
  }
})
An example of how to call it with a JavaScript object would be beneficial.
(The test also appears to use a type of "Buffer" from a lib which isn't imported by default.)
It also throws an error (note: I do not reference Buffer in my code).
[Unhandled promise rejection: ReferenceError: Can't find variable: Buffer]
Stack trace:
http://192.168.1.105:19001/node_modules/expo/AppEntry.bundle?platform=ios&dev=true&minify=false&hot=false:175114:68 in <unknown>
node_modules/promise/setimmediate/core.js:45:6 in tryCallTwo
node_modules/promise/setimmediate/core.js:200:22 in doResolve
node_modules/promise/setimmediate/core.js:66:11 in Promise
node_modules/@aws-amplify/predictions/lib-esm/Providers/AmazonAIConvertPredictionsProvider.js:270:8 in
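For what it's worth, the usual React Native workaround for that ReferenceError (a sketch, assuming the buffer package from npm) seems to be polyfilling the global in the app's entry file before any Amplify code runs:

// index.js – React Native ships no Node globals, so provide Buffer
// before @aws-amplify/predictions touches it
import { Buffer } from 'buffer'; // npm install buffer
global.Buffer = global.Buffer || Buffer;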
@jeffsteinmetz – You're right, the docs are still missing a Predictions.convert example, but as far as I know, that's because there is currently no library that reliably supports either streaming or recording a PCM buffer from React Native.
@ashika01's comment sounds promising, though:
But we feel the best solution might be for the Transcribe team to open up other audio formats for ease of use on the mobile side.
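Until the docs catch up, a call shape that matches the provider's tests might look like this (a sketch, not an official example; bytes must be raw PCM, and the transcript comes back on transcription.fullText):

import Predictions from '@aws-amplify/predictions';

async function transcribe(pcmBytes) {
  // pcmBytes: raw 16-bit PCM samples, e.g. an ArrayBuffer or Buffer
  const result = await Predictions.convert({
    transcription: {
      source: { bytes: pcmBytes },
      language: 'en-US',
    },
  });
  // the provider resolves with the transcript on transcription.fullText
  return result.transcription.fullText;
}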
@nathander Were you able to find a streaming or PCM buffer package for React Native that worked well for you?
@ashika01 Did you later find a library that worked well for your docs? I am working on a similar problem and am stuck on finding a good mic audio buffer streaming library. Any recommendations?
Thanks in advance :)
@meherranjan I ended up abandoning React Native for my project and building in Java and Swift instead.
Hi team, same issue for me. Any updates?
Describe the bug
When I send my microphone buffer to Transcribe using Predictions.convert, I only get empty strings back. I'm not sure whether this should be a bug report or a feature request. I'm guessing my audio buffer is formatted incorrectly for Predictions.convert, but the docs don't give enough information to verify that. This may be related to this open issue: https://github.com/aws-amplify/amplify-js/issues/4163
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Expected a transcription of the spoken text to return -- instead got only empty strings back.
Code Snippet
My App.js is here: