JamesBrill / react-speech-recognition

💬Speech recognition for your React app
https://webspeechrecognition.com/
MIT License

Echo cancellation? #124

Closed: peterzanetti closed this issue 2 years ago

peterzanetti commented 2 years ago

When using this in React, the microphone picks up all sound coming from the computer speakers and interprets it as speech-to-text input. Is there no echo cancellation? Or is there any way to achieve echo cancellation through polyfills?

peterzanetti commented 2 years ago

Also, what about microphone source selection? I don't see any methods for this.

peterzanetti commented 2 years ago

Hello?

JamesBrill commented 2 years ago

Hi @peterzanetti Echo cancellation and microphone source selection are not features that are available in the Web Speech API, so this library cannot expose them. This library is intended to just be a wrapper for the Web Speech API and does not perform any audio processing itself.

However, they might be possible with polyfills, which are built on more powerful browser APIs (e.g. MediaDevices) and use cloud-based speech recognition APIs that you have more control over (i.e. you pay for your own instances rather than using the browser's default cloud API, whose configuration is opaque). The polyfill with the most potential for both of these features is the Azure one; if you raise an issue on its repo, the author might be able to advise you on enabling them. Azure does appear to offer some kind of echo cancellation, though I don't know enough about it to say whether that feature can be consumed from the polyfill, as the SDKs that provide it don't seem suitable for the web. You may need to host your own proxy service that processes the audio stream from the browser before passing it on to Azure.

Bear in mind that speech recognition on web is still relatively immature - I wouldn't expect to see any advanced audio processing features being offered natively for some time, if ever.
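Microphone selection is something the MediaDevices API exposes even though the Web Speech API does not. Below is a minimal TypeScript sketch, assuming the audio is being fed to a polyfill or your own recognition pipeline rather than to the browser's built-in speech recognition (which, as noted above, offers no way to choose an input). It lists audio inputs with `enumerateDevices()` and opens one by `deviceId` with `getUserMedia()`, also requesting the standard `echoCancellation` and `noiseSuppression` constraints. The helper names and the `*Like` / `MicrophoneRequest` types are hypothetical, introduced so the sketch compiles without DOM typings; in a browser, `navigator.mediaDevices` fits them.

```typescript
// Hypothetical minimal types: the browser's MediaDeviceInfo and
// navigator.mediaDevices structurally satisfy them.
interface DeviceInfoLike {
  deviceId: string;
  kind: string; // "audioinput" | "audiooutput" | "videoinput"
  label: string;
}

type MicrophoneRequest = {
  audio: {
    deviceId: { exact: string };
    echoCancellation: boolean;
    noiseSuppression: boolean;
  };
  video: boolean;
};

interface MediaDevicesLike<S> {
  enumerateDevices(): Promise<DeviceInfoLike[]>;
  getUserMedia(constraints: MicrophoneRequest): Promise<S>;
}

// Keep only microphones (audio inputs) from the full device list.
async function listMicrophones<S>(devices: MediaDevicesLike<S>): Promise<DeviceInfoLike[]> {
  const all = await devices.enumerateDevices();
  return all.filter((d) => d.kind === "audioinput");
}

// Pin a specific microphone: with `exact`, getUserMedia rejects instead of
// silently falling back to another device. Also ask for echo cancellation
// and noise suppression (the browser decides what these actually do).
function microphoneConstraints(deviceId: string): MicrophoneRequest {
  return {
    audio: { deviceId: { exact: deviceId }, echoCancellation: true, noiseSuppression: true },
    video: false,
  };
}

// Open the chosen microphone; the resulting stream would then be handed to a
// polyfill or custom pipeline (not shown).
async function openMicrophone<S>(devices: MediaDevicesLike<S>, deviceId: string): Promise<S> {
  return devices.getUserMedia(microphoneConstraints(deviceId));
}
```

In a browser you would call `listMicrophones(navigator.mediaDevices)` and pass the chosen `deviceId` to `openMicrophone(navigator.mediaDevices, id)`. Device labels are typically empty until the user has granted microphone permission, so a picker usually requests access once before listing devices.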

peterzanetti commented 2 years ago

Thanks for the suggestions. It is a little disturbing that a web API that relies on microphone access eschews something as rudimentary as microphone device handling.

When looking at Speechly itself (not the polyfill), they seem to have these things worked out. Their demo supports echo cancellation, and there is some reference to a microphone component for React. Seems like a direct integration with Speechly might make sense.

JamesBrill commented 2 years ago

I imagine Google would argue that they're already providing access to their speech recognition servers for free in Chrome (for better or worse - they've given themselves an unfair advantage over browser vendors using Chromium, where they've disabled this functionality...). Indeed, Google had a huge influence on the W3C specification for this API in the first place - probably adding compute-intensive speech processing features to the spec wouldn't have been cost-effective for them.

While I wouldn't couple this library directly to any one vendor, I can definitely see the value in enhancing the Speechly polyfill to allow configuration of more advanced features, though I don't know if echo cancellation is configurable with Speechly. That said, their polyfill might already give you what you need out of the box if echo cancellation is applied by default. Or, as you mention, they do have a decent set of React components for you to play with.

I hope you find what you're looking for!

peterzanetti commented 2 years ago

FYI, after doing more research on this, I found that (in Chrome at least) true acoustic echo cancellation is only applied as part of MediaStreamTrack processing, most commonly during a WebRTC call. Calling getUserMedia() by itself is not enough to put it into effect, even though getUserMedia() supports audio constraints like echoCancellation and noiseSuppression. They don't seem to actually do anything unless the stream is part of a call. Or rather, what they do is not true acoustic echo cancellation (listening for and cancelling any ambient sound originating from the device's own speakers), which Chrome only seems to apply in uses of MediaStreamTrack.
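One way to see what the browser reports for these constraints is to read back the track's settings after calling `getUserMedia()`. The following is a minimal sketch with hypothetical helper names and structural types standing in for the DOM's `MediaStream` and `MediaStreamTrack`: it requests `echoCancellation` and `noiseSuppression`, then lists any that `getSettings()` reports differently. A reported `echoCancellation: true` only shows that the setting was accepted; consistent with the observation above, it does not prove that sound from the device's own speakers is being removed.

```typescript
// Hypothetical structural types: the MediaStream / MediaStreamTrack returned
// by navigator.mediaDevices.getUserMedia() satisfy them.
interface AudioProcessingSettings {
  echoCancellation?: boolean;
  noiseSuppression?: boolean;
}

interface TrackLike {
  getSettings(): AudioProcessingSettings;
}

interface StreamLike {
  getAudioTracks(): TrackLike[];
}

type ProcessingRequest = { audio: AudioProcessingSettings };

// Names of requested settings that the track reports with a different value
// (or does not report at all).
function unappliedSettings(requested: AudioProcessingSettings, actual: AudioProcessingSettings): string[] {
  const keys = Object.keys(requested) as (keyof AudioProcessingSettings)[];
  return keys.filter((key) => requested[key] !== undefined && actual[key] !== requested[key]);
}

// Request echo cancellation and noise suppression, then compare with what the
// browser says it applied to the first audio track.
async function checkAudioProcessing(
  getUserMedia: (constraints: ProcessingRequest) => Promise<StreamLike>,
): Promise<string[]> {
  const requested: AudioProcessingSettings = { echoCancellation: true, noiseSuppression: true };
  const stream = await getUserMedia({ audio: requested });
  const track = stream.getAudioTracks()[0];
  if (track === undefined) {
    throw new Error("getUserMedia returned no audio track");
  }
  return unappliedSettings(requested, track.getSettings());
}
```

In a browser this could be run as `checkAudioProcessing((c) => navigator.mediaDevices.getUserMedia(c)).then(console.log)`; an empty array means the browser reports every requested setting as applied.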