Open · MikeAlhayek opened 4 months ago
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @robch.
@robch, here is how the recording is captured using JavaScript and sent to the SignalR hub that calls the service above:
navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
        const mediaRecorder = new MediaRecorder(stream, {
            mimeType: "audio/ogg; codecs=opus",
        });

        const subject = new signalR.Subject();

        mediaRecorder.addEventListener("dataavailable", async e => {
            // Convert the blob to base64 so it can be sent to SignalR as a string.
            const uint8Array = new Uint8Array(await e.data.arrayBuffer());
            const binaryString = uint8Array.reduce((str, byte) => str + String.fromCharCode(byte), '');
            const base64 = btoa(binaryString);

            subject.next(base64);
        });

        // When the recording stops, complete the stream so that the SignalR hub
        // calls StopContinuousRecognitionAsync.
        mediaRecorder.addEventListener("stop", () => {
            subject.complete();
        });

        // When the recording starts, send the subject to the SignalR hub/server.
        mediaRecorder.addEventListener("start", () => {
            connection.send('UploadStream', sessionId, currentRecordingId, subject);
        });

        // Toggle recording when the record button is clicked.
        recordButton.addEventListener("click", () => {
            if (mediaRecorder.state === "recording") {
                mediaRecorder.stop();
            } else {
                mediaRecorder.start(1000);
            }
        });
    }).catch(err => {
        // If the user denies permission to record audio, display an error.
        console.log('Error: ' + err);
        alert('You must allow Microphone access to use this feature.');
    });
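For context, since the hub itself is not shown in this comment, a minimal hub method matching the connection.send('UploadStream', ...) call above might look like the sketch below. The UploadStream name and the sessionId/currentRecordingId arguments come from the client code; the IAudioInterpreter abstraction and the RecognitionCompleted callback name are hypothetical placeholders for the service shown further down, not the actual hub from the repro.

using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.SignalR;

// Hypothetical abstraction over the recognition service (GetTextAsync below).
public interface IAudioInterpreter
{
    Task<string> GetTextAsync(Stream stream);
}

// Sketch of a hub method matching the client-side connection.send call.
// SignalR maps a client-side Subject to an IAsyncEnumerable<T> parameter
// on the server for client-to-server streaming.
public class RecordingHub : Hub
{
    private readonly IAudioInterpreter _interpreter;

    public RecordingHub(IAudioInterpreter interpreter) => _interpreter = interpreter;

    public async Task UploadStream(string sessionId, string recordingId, IAsyncEnumerable<string> stream)
    {
        using var buffer = new MemoryStream();

        // Each streamed item is one base64-encoded MediaRecorder chunk;
        // decode and append in order.
        await foreach (var base64Chunk in stream)
        {
            var chunk = Convert.FromBase64String(base64Chunk);
            await buffer.WriteAsync(chunk);
        }

        // The client completed the subject, so the recording has stopped.
        var text = await _interpreter.GetTextAsync(buffer);

        // Hypothetical client callback name.
        await Clients.Caller.SendAsync("RecognitionCompleted", sessionId, recordingId, text);
    }
}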
Alternatively, I tried to use RecognizeOnceAsync() instead of the continuous recognizer, as you can see in the code below. The request times out every time.
public async Task<string> GetTextAsync(Stream stream, AudioInterpreterTextContext context = null)
{
    ArgumentNullException.ThrowIfNull(stream);

    stream.Position = 0;

    byte[] bytes = null;

    if (stream is not MemoryStream memoryStream)
    {
        memoryStream = new MemoryStream();
        await stream.CopyToAsync(memoryStream);
        bytes = memoryStream.ToArray();
        memoryStream.Dispose();
    }

    bytes ??= memoryStream.ToArray();

    var index = FindHeaderEndIndex(bytes);

    if (index > -1)
    {
        stream.Position = index + 1;
    }
    else
    {
        stream.Position = 0;
    }

    var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.OGG_OPUS);

    using var audioStream = AudioInputStream.CreatePushStream(format);

    // Do we have to write the bytes in chunks here? Not sure why we can't do
    // audioStream.Write(bytes) instead of the next 14 lines.
    using (var binaryReader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
    {
        byte[] readBytes;

        do
        {
            readBytes = binaryReader.ReadBytes(_bufferSize);

            if (readBytes.Length == 0)
            {
                break;
            }

            audioStream.Write(readBytes, readBytes.Length);
        } while (readBytes.Length > 0);
    }

    var speechConfig = SpeechConfig.FromSubscription(_options.Key, _options.Region);

    if (context != null && context.Data.TryGetValue("Language", out var lang))
    {
        speechConfig.SpeechRecognitionLanguage = lang?.ToString();
    }

    using var audioConfig = AudioConfig.FromStreamInput(audioStream);
    using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();

    if (speechRecognitionResult.Reason == ResultReason.RecognizedSpeech)
    {
        return speechRecognitionResult.Text;
    }

    LogErrors(speechRecognitionResult);

    return null;
}
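On the question in the comment inside the method above: as far as I know, PushAudioInputStream also has a Write(byte[]) overload that takes the whole buffer, so the chunked loop should not be required. More importantly, nothing ever closes the push stream, so RecognizeOnceAsync has no way to know the audio has ended and keeps waiting, which would explain the timeout and the WaitUntilBytesAvailable: available=0 lines in the trace below. A minimal sketch of that variant, reusing the bytes and speechConfig from the method above:

// Sketch: write the full buffer in one call and signal end-of-stream.
// Assumes `bytes` holds the complete OGG/Opus payload and that speechConfig
// is built exactly as in GetTextAsync above.
var format = AudioStreamFormat.GetCompressedFormat(AudioStreamContainerFormat.OGG_OPUS);
using var audioStream = AudioInputStream.CreatePushStream(format);

audioStream.Write(bytes);  // Write(byte[]) overload; no manual chunking needed
audioStream.Close();       // signals end-of-stream so the recognizer stops waiting for more audio

using var audioConfig = AudioConfig.FromStreamInput(audioStream);
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

var result = await speechRecognizer.RecognizeOnceAsync();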
Here is the debug trace for the above approach:
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_ERROR: base_gstreamer.cpp:211 Error from GStreamer: Source: oggdemux
Message: Could not demultiplex stream.
DebugInfo: ../ext/ogg/gstoggdemux.c(4776): gst_ogg_demux_send_event (): /GstPipeline:pipeline/GstOggDemux:oggdemux:
EOS before finding a chain
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_INFO: blocking_read_write_buffer.h:127 WaitUntilBytesAvailable: available=0; required=3200 writeZero=true ...
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[491387]: 211125ms SPX_TRACE_SCOPE_EXIT: base_gstreamer.cpp:186 BaseGstreamer::HandleGstMessageError
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_TRACE_ERROR: create_object_helpers.h:21 site does not support ISpxObjectFactory
2024-05-15 08:14:35.1589||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 211125ms SPX_THROW_HR: create_object_helpers.h:22 hr = 0x14
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212461ms SPX_TRACE_ERROR: exception.cpp:123 About to throw Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE)
[CALL STACK BEGIN]
> audio_config_get_audio_processing_options
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- pal_string_to_wstring
- configthreadlocale
- BaseThreadInitThunk
- RtlUserThreadStart
[CALL STACK END]
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_ENTER: thread_service.cpp:45 CSpxThreadService::Term
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT: thread_service.cpp:45 CSpxThreadService::Term
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_DBG_TRACE_SCOPE_EXIT: audio_pump.cpp:173 *** AudioPump THREAD stopped! ***
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR: audio_pump.cpp:472 [0000025BF2822E90]CSpxAudioPump::PumpThread(): exception caught during pumping, Exception with an error code: 0x14 (SPXERR_UNEXPECTED_CREATE_OBJECT_FAILURE)
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_TRACE_ERROR: create_object_helpers.h:21 site does not support ISpxObjectFactory
2024-05-15 08:14:36.5001||||Microsoft.CognitiveServices.Speech|DEBUG|[187895]: 212462ms SPX_THROW_HR: create_object_helpers.h:22 hr = 0x14
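The oggdemux error "EOS before finding a chain" generally means GStreamer reached the end of the data without ever finding a valid OGG page. Every OGG page begins with the four-byte capture pattern "OggS", so one thing worth checking (my guess, not something the trace confirms) is whether the FindHeaderEndIndex skipping in GetTextAsync strips that pattern before the bytes reach the push stream. A quick check, as a sketch:

// Sketch: an OGG stream must begin with the "OggS" capture pattern.
// If this returns true, no header skipping should be needed before the
// push stream; if false, the container header was lost upstream.
private static bool StartsWithOggHeader(byte[] bytes)
    => bytes.Length >= 4
        && bytes[0] == (byte)'O'
        && bytes[1] == (byte)'g'
        && bytes[2] == (byte)'g'
        && bytes[3] == (byte)'S';

Also worth noting, as far as I understand MediaRecorder with a timeslice: only the first dataavailable chunk contains the container header, and later chunks are continuations of the same stream, so the chunks only form a valid OGG file when concatenated in order.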
I changed the recorder settings for testing purposes by passing the following object to navigator.mediaDevices.getUserMedia:
{
    audio: {
        autoGainControl: false,
        channelCount: 1,
        echoCancellation: false,
        latency: 0,
        noiseSuppression: false,
        sampleSize: 16
    },
    video: false
}
I am able to write the bytes to a local file and play the file with no problem. Here is the metadata for the saved file:
file_type: OPUS
file_type_extension: opus
mime_type: audio/ogg
opus_version: 1
audio_channels: 1
sample_rate: 16000
output_gain: 1
codec_name: opus
codec_long_name: Opus (Opus Interactive Audio Codec)
sample_rate: 48000
channels: 1
channel_layout: mono
duration: 4.76
size: 8648
bit_rate: 14534
Still no success converting the audio bytes to text using the SDK.
I provided a repro for this issue: https://github.com/Azure-Samples/cognitive-services-speech-sdk/pull/2387. It could also serve as a good sample for this repo once I get it to work.
Type of issue
Code doesn't work
Description
I am trying to use a web browser to record audio, transmit it to the server using SignalR, and use continuous speech recognition to transform the audio into text. However, I am running into errors while trying to use the recognizer.
Here is my class that calls the recognizer
The SignalR hub that is calling this class is as follows
But I keep getting the following error
Here is a dump of the trace stack
Page URL
https://learn.microsoft.com/en-us/dotnet/api/microsoft.cognitiveservices.speech?view=azure-dotnet
Content source URL
https://github.com/Azure/azure-docs-sdk-dotnet/blob/master/xml/ns-Microsoft.CognitiveServices.Speech.xml
Document Version Independent Id
87dd50dc-6504-1ab9-2e94-0d784fc4a563
Article author
@azure-sdk
Metadata