exokitxr / exokit

Native VR/AR/XR engine for JavaScript 🦖
MIT License
994 stars · 117 forks

Trying to use Web Speech API polyfill doesn't work. #477

Open machenmusik opened 6 years ago

machenmusik commented 6 years ago

https://speech-polyfill.azurewebsites.net/

I understand that ExoKit currently doesn't implement the Web Speech API, so I tried using a polyfill that uses getUserMedia() to get the audio, which works on Firefox - https://github.com/anteloe/speech-polyfill - but that doesn't seem to work with ExoKit.

avaer commented 6 years ago

It looks like it uses WebRTC.

That would make this blocked on https://github.com/webmixedreality/exokit/issues/218.

machenmusik commented 6 years ago

Looks to me as though it polyfills from the old to the new gUM API, and uses microsoft-speech-browser-sdk, which uses getUserMedia, not all of WebRTC. From the SDK's readme:

> The SDK depends on WebRTC APIs to get access to the microphone and read the audio stream. Most of today's browsers (Edge/Chrome/Firefox) support this. For more details about supported browsers refer to navigator.getUserMedia#BrowserCompatibility

> Note: The SDK currently depends on the navigator.getUserMedia API. However, this API is in the process of being dropped as browsers are moving towards the newer MediaDevices.getUserMedia instead. The SDK will add support for the newer API soon.

How does ExoKit currently expose the microphone? Is there a simple sample page? I thought it was providing navigator.mediaDevices.getUserMedia for audio, but not for video...
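The old-to-new gUM shimming described above typically amounts to something like the following (a sketch of the general technique, not the polyfill's actual code; `shimGetUserMedia` is an illustrative name):

```javascript
// Wrap the legacy callback-based navigator.getUserMedia (possibly
// vendor-prefixed) in a Promise, so code written against the newer
// navigator.mediaDevices.getUserMedia keeps working.
function shimGetUserMedia(nav) {
  const legacy = nav.getUserMedia ||
    nav.webkitGetUserMedia || nav.mozGetUserMedia;
  if (!nav.mediaDevices) nav.mediaDevices = {};
  if (!nav.mediaDevices.getUserMedia && legacy) {
    nav.mediaDevices.getUserMedia = constraints =>
      new Promise((resolve, reject) =>
        legacy.call(nav, constraints, resolve, reject));
  }
  return nav;
}
```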

avaer commented 6 years ago

Exokit exposes navigator.mediaDevices.getUserMedia for both video and audio; this is not related to WebRTC.

WebRTC is on the roadmap, but it's separate. You can put a media stream into WebRTC, but that's not the same thing. That Microsoft speech service appears to require WebRTC, so it won't work until https://github.com/webmixedreality/exokit/pull/346 is merged.

Examples

The home environment has microphone support: https://github.com/webmixedreality/exokit-home/blob/7f1247b45cbf98d370b0a6b3228617c7a3b0ef9f/index.html#L381

And there are video/webcam examples in this repo as well: https://github.com/webmixedreality/exokit/blob/master/examples/webcam.html

It would be nice to get a microphone example in this repo though.
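A minimal microphone example could look something like this (a sketch assuming the mediaDevices.getUserMedia and Web Audio surfaces mentioned above; `rms` and `startMicMeter` are illustrative names, not existing Exokit examples):

```javascript
// Root-mean-square level of a buffer of samples (pure helper).
function rms(samples) {
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

// Open the microphone and periodically print its input level.
async function startMicMeter() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const source = ctx.createMediaStreamSource(stream);
  const analyser = ctx.createAnalyser();
  source.connect(analyser);
  const buf = new Float32Array(analyser.fftSize);
  setInterval(() => {
    analyser.getFloatTimeDomainData(buf);
    console.log('mic level (RMS):', rms(buf).toFixed(4));
  }, 500);
}
```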

machenmusik commented 6 years ago

I guess I am not understanding where you think the underlying Microsoft SDK requires WebRTC; my understanding is that it sends audio over a standard WebSocket, and that to get the microphone it does something similar to your xrmp, i.e. getUserMedia followed by the Web Audio API:

https://github.com/Azure-Samples/SpeechToText-WebSockets-Javascript/blob/477067fe264159e7ccbd233a01015f9ea03a6d06/src/common.browser/MicAudioSource.ts

https://github.com/webmixedreality/xrmp/blob/9b86351dab7e79516d232c0917250be9d7e28186/examples/index.html#L1172

Is there a better way to debug more precisely what is failing inside ExoKit?

machenmusik commented 6 years ago

I should mention / ask - does ExoKit prevent use of getUserMedia() unless over https, or does it support localhost exception?

avaer commented 6 years ago

> I should mention / ask - does ExoKit prevent use of getUserMedia() unless over https, or does it support localhost exception?

There are no checks. http or even file will have all features.

> Is there a better way to debug more precisely what is failing inside ExoKit?

The way I would do it is console.log() to see which part -- if anything -- is not meeting the spec. If there was a stack trace or a repro I could maybe help more.

If the intent is to get the speech polyfill working (without WebRTC -- I can't speak to how it's implemented, it just mentions WebRTC in the readme), then I would probably be logging events inside the speech api lib to bisect what's apparently not working.
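One way to do that bisecting without touching the polyfill's internals is to wrap getUserMedia itself and log every call and outcome (a sketch; `traceGetUserMedia` is an illustrative name):

```javascript
// Monkey-patch mediaDevices.getUserMedia so each call, resolution,
// and rejection is logged before the polyfill sees the result.
function traceGetUserMedia(mediaDevices, log = console.log) {
  const original = mediaDevices.getUserMedia.bind(mediaDevices);
  mediaDevices.getUserMedia = constraints => {
    log('getUserMedia called with', JSON.stringify(constraints));
    return original(constraints).then(
      stream => { log('getUserMedia resolved'); return stream; },
      err => { log('getUserMedia rejected:', err && err.message); throw err; });
  };
}
```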

avaer commented 6 years ago

(by the way, this would be an awesome contribution if we got it running; having the speech api is probably something we want in the core 💪)

machenmusik commented 6 years ago

agree :-) Is there no way to set breakpoints or use a remote inspector somehow? That is how I would (and did) do it with other browsers...

avaer commented 6 years ago

It's just node, you can use any node debug tool -- such as Chrome Devtools, which supports connecting to node.

You could also try ndb support.

machenmusik commented 6 years ago

when I try to have ExoKit allow inspector:


```
09-15 23:46:58.807  1968     6 I exokit  : Inspector support is not available with this Node.js build
09-15 23:46:58.807  1968     6 I exokit  : node: bad option: --inspect
09-15 23:46:58.807  1968     6 I exokit  : node: bad option: --debug-brk
```

avaer commented 6 years ago

Looks like ML1 device? I was assuming with the above it was desktop 😅.

In that case, that is probably a legit bug: libnode is not built with debugger support. Build repo.

It could be built with debugger, but I think it was disabled for some silly reason.

avaer commented 6 years ago

https://github.com/modulesio/libnode.a/issues/1

machenmusik commented 6 years ago

ignoring the debugger, I see:

```
09-15 23:53:15.055  2174     6 I exokit  : RtApiDummy: This class provides no functionality.
09-15 23:53:15.055  2174     6 I exokit  :
09-15 23:53:15.055  2174     6 I exokit  :
09-15 23:53:15.055  2174     6 I exokit  : RtApi::openStream: output device parameter value is invalid.
```

so perhaps the microphone isn't hooked up either?

avaer commented 6 years ago

Yeah, it's not hooked up in Magic Leap (it is on desktop). Same for Audio.

Should probably open tracking issues for these.

machenmusik commented 5 years ago

With ML1 (not tested on other platforms), using 0.0.485 or 0.0.486, things are close to working; interim results are generated, but when a final result is made available, recognition.stop() is called and never returns.
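For context, a minimal sketch of how a SpeechRecognition implementation is typically driven (standard Web Speech API shape; `latestTranscript` and `startRecognition` are illustrative names, not the polyfill's code):

```javascript
// Pure helper: pick the newest result out of a
// SpeechRecognitionResultList-like array and report whether it is final.
function latestTranscript(results) {
  const result = results[results.length - 1];
  return { final: !!result.isFinal, text: result[0].transcript };
}

// Drive a recognizer: log interim results, stop on the final one.
// The stop() call is the one reported above to never return on ML1.
function startRecognition(Recognition) {
  const recognition = new Recognition();
  recognition.interimResults = true;
  recognition.onresult = e => {
    const { final, text } = latestTranscript(e.results);
    console.log(final ? 'final:' : 'interim:', text);
    if (final) recognition.stop();
  };
  recognition.start();
  return recognition;
}
```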

machenmusik commented 5 years ago

See #537

machenmusik commented 5 years ago

0.0.494 still has the issue, need to reopen.

avaer commented 5 years ago
machenmusik commented 5 years ago

Still seems broken in 0.0.516 ?!?

avaer commented 5 years ago

@machenmusik could you provide more details?

machenmusik commented 5 years ago

Sure - speech-polyfill.azurewebsites.net doesn't work, no interim or final transcriptions are seen

avaer commented 5 years ago

That is not an XR site, so it's not expected to boot into anything. However I suspect it's trying to use WebRTC audio channels, which are not hooked in. Previously we were going over WebSocket if I recall.

machenmusik commented 5 years ago

It uses getUserMedia to get the microphone, I think, and I'm not sure that is working as expected. It should work in 2D mode, right? You do see the "Just talk" prompt appear in 2D...

avaer commented 5 years ago

Yeah the GUM part should be fine, but the part of connecting that to RTC would not be since MediaStream is not hooked to RTC yet.

I think previously there was a fallback to feed that GUM to WebSockets, which was ok, but if it sees RTC it might not be going that route anymore.
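That WebSocket fallback path amounts to roughly this (a sketch under the assumption that the polyfill captures gUM audio via Web Audio and ships PCM frames over a socket; names and the deprecated ScriptProcessorNode are illustrative, not the polyfill's actual code):

```javascript
// Convert Web Audio's float samples in [-1, 1] to 16-bit signed PCM,
// the format speech services typically expect.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Capture the microphone and stream PCM frames over a WebSocket.
async function streamMicToSocket(url) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const ws = new WebSocket(url);
  const source = ctx.createMediaStreamSource(stream);
  const proc = ctx.createScriptProcessor(4096, 1, 1);
  proc.onaudioprocess = e => {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(floatTo16BitPCM(e.inputBuffer.getChannelData(0)).buffer);
    }
  };
  source.connect(proc);
  proc.connect(ctx.destination);
}
```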

Palmer-JC commented 5 years ago

Just curious, would the other direction (speech synthesis) work? I only have a test of it with form elements; in the test, type in something and click Say. I just unplugged my router power supply after page load, and it worked on both Firefox and Chrome, so I am pretty sure it is not using a server.

machenmusik commented 5 years ago

Is there an equivalent of chrome://flags to try turning off WebRTC? If not, what is the latest non-RTC build to try?

avaer commented 5 years ago

No, there is not; I believe the last build without WebRTC was two releases ago.

machenmusik commented 5 years ago

Ok thx - will try older build a bit later.

machenmusik commented 5 years ago

No luck; Exokit 0.0.514 through 0.0.516 share the same behavior, namely (as a 2D tab) you see "Just talk" and then no interim or final results ever appear. So it appears not to be WebRTC-related?

avaer commented 5 years ago

In that case this is most likely due to a Lumin initialization order refactor. @machenmusik, if you're doing custom builds, it might be worth bisecting to the commit that broke it. I suspect that has a good chance of making the fix clear.

machenmusik commented 5 years ago

To be fair, I think that making it work always required some changes (since 0.0.484), and I'm not sure the branch was merged. Due to the changes required to get older builds running on newer LuminOS, I'm not sure bisection is practical, but I can probably point you at the older changes needed; what I don't recall is whether specific LabSound changes were also required, which would of course complicate things.

avaer commented 5 years ago

We had everything working in master, including the necessary Labsound updates. If you could point to a changeset that worked and one that's broken, we can narrow down what broke it.

machenmusik commented 5 years ago

This is what I am currently using https://github.com/chenzlabs/exokit/commits/mic-hack-perceptionstartup which is 0.0.494 plus one patch that was needed at the time, plus manifest update and the Perception startup change now needed.

Note that this version predates 2D support and reality tabs, so you would only see things working via mldb log exokit.

machenmusik commented 5 years ago

The oldest release version that renders on current LuminOS is 0.0.504 - and the good news is, if you look at mldb logs, 0.0.504 works with speech-polyfill.azurewebsites.net! However, when used with our experience (which I'm sorry we cannot share), liveview is broken, and trying to touch it breaks the mic and ultimately all audio.

I will see what the newest release version that works with speech-polyfill.azurewebsites.net is, and hopefully you can help to take it from there... stay tuned.

machenmusik commented 5 years ago

0.0.508 has reality tabs, and when tried in 2D, speech-polyfill.azurewebsites.net throws this...

```
02-22 16:32:15.898  2111     5 I exokit  : parent got console { jsString:
02-22 16:32:15.898  2111     5 I exokit  :    '\'Unhandled callback error: Error: \'Unhandled callback error: \'Unhandled callback error: Error occurred processing the user media stream. Error: getUserMedia is not implemented in this browser\'\'. InnerError: \'Unhandled callback error: \'Unhandled callback error: Error occurred processing the user media stream. Error: getUserMedia is not implemented in this browser\'\'\'',
02-22 16:32:15.898  2111     5 I exokit  :   scriptUrl: '',
02-22 16:32:15.898  2111     5 I exokit  :   startLine: 0 }
```

Note that 0.0.506 works (albeit observed via mldb log exokit, since it has no reality tabs), and 0.0.507 has no corresponding MPK, so hopefully that narrows it down (it may have been broken once reality tabs were introduced).

machenmusik commented 5 years ago

Wait, I may have to take that back: 0.0.508 does work in 3D mode, using mldb log exokit to see the output.

machenmusik commented 5 years ago

Summary: no versions work in a 2D tab; a 3D tab works the same as a direct mldb URL launch. The last release version that works from an mldb launch is 0.0.511; 0.0.512 is broken.

avaer commented 5 years ago

I think we should clarify the 2D/3D divide.

The "2D" mode for reality tabs has almost nothing to do with Exokit -- Exokit forks out to a 2D rendering engine (Servo or Chromium) to draw the page to a texture. It is only intended to draw a texture to the 3D scene, and not intended to work with any media APIs -- which it does not.

Additionally, reality tabs are themselves not released, so nothing should be tested against that environment yet. Luckily, reality tabs is just an HTML page that happens to load by default -- loading another page at the top level (instead of realitytabs.html) should work on all releases, including current ones, via the command line/mldb.

Hopefully that clarifies things.