Very delayed voice using UWP and HLapi

awonnink commented 6 years ago

Context

Our app works both on Windows desktop as well as UWP (Mixed Reality). Dissonance works well on desktop, with only a small and acceptable delay. But when the client player is compiled for UWP, the delay immediately becomes very large (sometimes 10 seconds or more) and unstable.

Expected Behavior

Good response on all platforms, of course, when using the same server.

Actual Behavior

See above.

Workaround

none

Fix

please do ;)

Steps to Reproduce

Provide a detailed set of steps to reproduce the problem

1. We use an x64 Windows desktop Unity app with Dissonance as the server, with the Unity Network Manager started as server.
2. Start two x64 desktop players with the Unity Network Manager started as client. This works.
3. Compile one of the players as a UWP app. Now both players' voices do not respond within a reasonable time, and even stop responding.

Your Environment

Dissonance 6.2.2
Unity version: 2017.4.6f1
Microsoft Windows 10 Pro 10.0.10586 (x64)
Build Settings: Windows, x86_64 and UWP

martindevans commented 6 years ago

Could you send me a log from both of the clients involved in the non-working session, with the settings all set to Debug (Windows > Dissonance > Diagnostic Settings)?

awonnink commented 6 years ago

[diss.zip](https://github.com/Placeholder-Software/Dissonance/files/2177750/diss.zip)

Thank you for your quick response. I started the desktop version as a separate client and the UWP version in the Editor.

awonnink commented 6 years ago

Did you have the chance to look at the logs yet? Any suggestion what we can try?

martindevans commented 6 years ago

Not yet sorry, I was prebooked to be doing other things the last two days. I'm going to have a look at them now :)

awonnink commented 6 years ago

No problem at all, just curious. Our app starts with the mic muted. The app is a kind of 3D web browser, and the URL is used as the room name. With both instances I go to the same webpage. Somewhere at the end of the UWP log I enable the mic and start talking. Several seconds later I hear myself from the desktop instance.

martindevans commented 6 years ago

The Editor_uwp_version log looks pretty much perfect, there are a couple of pipeline resets caused by long frame times (probably a hitch when some other content is loaded etc) but nothing that could cause trouble.

The output_log_desktopversion is a complete disaster! Before you even join a session there are two resets caused by long frame times. After you join the session everything totally falls apart: the preprocessor buffer constantly overflows, the microphone pipeline is reset due to bad conditions, the encoded audio buffer gets far too large (buffering up 83 packets or 4130ms of audio, so here's the cause of the delay), and eventually the microphone also starts overflowing its buffers and losing audio.

So the problem is obvious, however there is no indication of what the root cause is. All of these things look like artefacts of dreadful performance across the board - not just on the Unity main thread but also the background preprocessing thread and the background audio thread.

I'm a little confused by this since you say that running two of these desktop clients is perfectly fine and the problem only manifests when one of them is running UWP. However there are two `CapturePipelineManager: Detected a frame skip` messages in the log before it's even connected to a network session, so it doesn't seem like the other client could possibly be causing those.

What kind of performance do you get in the desktop client?
Does this same problem happen if you reverse the clients (UWP standalone, x64 in Editor)? If so could you capture a pair of logs from that so that I can get more detailed debug info from the x64 side.

awonnink commented 6 years ago

diss.zip

Hi, since the update became available I have been using 6.2.3 (but not yet on the server). Here are 3 tests:

1. Desktop - Desktop in Editor
2. Desktop - UWP in Editor
3. Desktop in Editor - UWP

Only the first works OK. The others are delayed, or the sound even seems distorted.

The second is similar to what I sent earlier. For some reason the log seems cleaner to me now.

Addendum: The distorted sound seems to occur only in debug mode from Visual Studio. In release mode I seem to get a better response, but parts of what I say are not transmitted (I have now also updated the server to 6.2.3).

martindevans commented 6 years ago

This looks like it's caused by poor performance again, for example one of these logs (uwp_dt/UnityPlayer_uwp) shows:

(15:00:00.351) BasicMicrophoneCapture: Insufficient buffer space, requested 32097, clamped to 16383 (dropping samples)
(15:00:03.051) BasicMicrophoneCapture: Insufficient buffer space, requested 148068, clamped to 16383 (dropping samples)

These are the microphone trying to read data and discarding most of it because there's too much data. Dissonance reads all of the microphone data every frame - so the second message means that from one frame to the next 148,068 samples of data were sampled by the microphone, that's 3.08 seconds of audio (which is approximately the difference in timestamps between these logs).
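The arithmetic behind that figure can be checked with a minimal sketch. It assumes a 48 kHz capture rate, which is not stated in the log but is the rate at which 148,068 samples comes out at about 3.08 seconds. `MicBacklog` and `SamplesToSeconds` are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

public static class MicBacklog
{
    // Converts a number of buffered microphone samples into seconds of audio.
    public static double SamplesToSeconds(int samples, int sampleRateHz)
    {
        return (double)samples / sampleRateHz;
    }

    public static void Main()
    {
        // ASSUMPTION: 48 kHz capture rate (not stated in the log).
        const int sampleRateHz = 48000;

        // Values taken from the two log lines above.
        int[] requested = { 32097, 148068 };
        const int clampedTo = 16383;

        foreach (int samples in requested)
        {
            double backlog = SamplesToSeconds(samples, sampleRateHz);
            int dropped = samples - clampedTo;
            Console.WriteLine(string.Format(
                CultureInfo.InvariantCulture,
                "requested {0} samples = {1:F2}s of audio, {2} samples dropped",
                samples, backlog, dropped));
        }
    }
}
```

Under that assumption the second log line corresponds to roughly 3.08 seconds of audio arriving in one read, of which only 16,383 samples (about a third of a second) fit in the buffer.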

Do you have very long frame times? Perhaps just a few very long frames during loading?

awonnink commented 6 years ago

I am sending you a Unity test project over wetransfer showing how I have implemented Dissonance at the client. Although it performs better than within our app, I notice here too that when I continue to speak in the UWP client, the sound starts to fail. I am not sure if you have the equipment to reproduce it (I use the MR HMD and portal), but it might show some things I'm implementing the wrong way.

I'll check the frames.

awonnink commented 6 years ago

The framerate usually stays above 60fps. When loading a new website it goes to 15 fps for a short time. It doesn't seem to get really out of range when the delay in sound response starts to happen.

martindevans commented 6 years ago

> The framerate usually stays above 60fps

It's more to do with the worst case frame time. If a single frame is 500ms, and then the rest are all 9ms that's 60fps but that one frame will have some bad side effects.
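To make that concrete, here is a small sketch comparing the average frame rate with the worst frame time for the example above: one 500ms frame followed by enough 9ms frames to fill out roughly one second. `FrameStats` and its methods are hypothetical helpers, not part of Unity or Dissonance.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class FrameStats
{
    // Average frames per second over the whole sample window.
    public static double AverageFps(List<double> frameTimesMs)
    {
        return frameTimesMs.Count / (frameTimesMs.Sum() / 1000.0);
    }

    // The single longest frame, which is what hurts audio capture.
    public static double WorstFrameMs(List<double> frameTimesMs)
    {
        return frameTimesMs.Max();
    }

    // One 500ms hitch followed by 55 frames of 9ms each (995ms in total).
    public static List<double> OneHitchThenSmooth()
    {
        var frames = new List<double> { 500.0 };
        frames.AddRange(Enumerable.Repeat(9.0, 55));
        return frames;
    }

    public static void Main()
    {
        List<double> frames = OneHitchThenSmooth();
        Console.WriteLine(string.Format(
            CultureInfo.InvariantCulture,
            "average: {0:F1} fps, worst frame: {1:F0} ms",
            AverageFps(frames), WorstFrameMs(frames)));
    }
}
```

The average stays in the high 50s even though one frame lasted half a second, so an fps counter alone will not reveal the hitch. Logging the maximum frame time over a window is a more useful measurement here.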

Anyway I had a look through the project and ran it a few times in editor. I don't see anything set up incorrectly on the Dissonance side of things. The first frame or two looks to be a bit long as it's running all kinds of setup stuff but this is expected and Dissonance should handle that (almost every application Dissonance is used in has this kind of first frame spike). General performance after this point was fine (70+fps), obviously that's running on a powerful PC and not a MR headset but I don't think it would be low enough to cause the kind of problems we're seeing.

I manually unmuted myself (it looks like a script is muting me, I assume that's intentional?) and spoke a little, I only had one client connected so I couldn't hear it but all the stats indicated that it was running absolutely perfectly sending voice.

My suspicion at the moment is that a single bad frame causes the capture system to lose its mind and resetting it will fix it. Could you try adding in a button which calls DissonanceComms:ResetMicrophoneCapture, when the problem occurs hit that button and see if it fixes it - if it does that will give me some good information to narrow down the source some more.
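A minimal sketch of such a debug button might look like the following. The component name `ResetCaptureButton`, the `Comms` field and the `R` key binding are hypothetical choices for illustration; only `DissonanceComms.ResetMicrophoneCapture` is the call mentioned above. It needs to live in a Unity project with Dissonance installed, and the on-screen button is unlikely to be visible inside an MR headset, hence the keyboard fallback.

```csharp
using Dissonance;
using UnityEngine;

// Hypothetical debug helper (not part of Dissonance): resets the microphone
// capture pipeline when a key or an on-screen button is pressed.
public class ResetCaptureButton : MonoBehaviour
{
    // Assign in the inspector; otherwise the scene is searched on Start.
    public DissonanceComms Comms;

    private void Start()
    {
        if (Comms == null)
            Comms = FindObjectOfType<DissonanceComms>();
    }

    private void Update()
    {
        // Keyboard fallback (assumed key), useful when the GUI is not visible.
        if (Comms != null && Input.GetKeyDown(KeyCode.R))
            Comms.ResetMicrophoneCapture();
    }

    private void OnGUI()
    {
        // Simple immediate-mode button for desktop and Editor testing.
        if (Comms != null && GUI.Button(new Rect(10, 10, 220, 40), "Reset microphone capture"))
            Comms.ResetMicrophoneCapture();
    }
}
```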

awonnink commented 6 years ago

Thanks for checking the test app. The white box is programmed to be the on/off switch for speaking, so that's why I mute it on start. On desktop it can be clicked with the mouse; all the other game objects are there to allow clicking it with an MR controller. The ResetMicrophoneCapture helps. But then after talking for some time (say in the order of 10 seconds) the sound stops responding again, and I have to use the button that triggers it again. It feels like the problem is more likely to start when there is a silence after speaking for some time.

martindevans commented 6 years ago

Ok that's interesting, that narrows it down a fair bit. Could you try the same again but with a Push-To-Talk button instead of using voice activation?

awonnink commented 6 years ago

Hey! With Push to talk it seems to work fine!

martindevans commented 6 years ago

Aha! I encountered a VAD related voice quality issue the other day. I've logged it on our internal tracker and I'll be working on that next week I expect, I'll keep you informed :)

I guess the best workaround might be to use open voice (I hesitate to recommend open voice, because it can be a pretty horrible experience for other listeners).

awonnink commented 6 years ago

I am waiting for support from MS for another issue, so I think I can wait for another week to publish a new release of our app. Thanks for the help so far!

martindevans commented 6 years ago

I must admit I was dreading working on this because the VAD is a very complex bit of C++ code and that's always a nightmare to debug/modify. However, I think I've found the issue and it's actually nothing to do with the VAD itself (yay).

The problem seems to have been caused by the faders on the broadcast trigger, they smoothly fade speech out over a short period when you stop talking. The faders start fading in/out when you start/stop talking - however if you stopped talking (start fading out) and then started talking before it hit zero it would not initiate a fade in, it would just stay at the current value. This is particularly exacerbated by VAD which tends to start and stop talking very very quickly.

If you find the big if statement around line 286 of VoiceBroadcastTrigger.cs you can change it to:

```csharp
//Apply state if changed
if (current != next)
{
    if (current)
    {
        //Begin fade out (if it's not already fading to zero)
        if (Math.Abs(_activationFader.EndVolume) > float.Epsilon)
            _activationFader.FadeTo(0, (float)_activationFaderSettings.FadeOut.TotalSeconds);

        //Stop transmitting once fade out is complete
        if (CurrentFaderVolume <= float.Epsilon)
            CloseChannel();
    }
    else
    {
        //Start transmitting
        OpenChannel();
    }
}
else if (current)
{
    //If we're speaking and the activation fader is not going to the max volume yet, start fading in
    if (Math.Abs(_activationFader.EndVolume - _activationFaderSettings.Volume) > float.Epsilon)
        _activationFader.FadeTo(1, (float)_activationFaderSettings.FadeIn.TotalSeconds);
}
```

This monitors the fader all the time you are speaking, and initiates a fade in if one hasn't already been started.
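The effect of that extra check can be reproduced outside Unity with a simplified model. This is only a sketch: `SimpleFader` and `FaderDemo` are hypothetical stand-ins, not Dissonance's actual classes, and the fade durations, the 10ms time step and the 50ms pause are assumed values. The simulation speaks, pauses briefly (shorter than the fade out), then speaks again.

```csharp
using System;

// Simplified linear volume fader (hypothetical; not Dissonance's Fader).
public sealed class SimpleFader
{
    private float _start;
    private float _elapsed;
    private float _duration;

    public float EndVolume { get; private set; }
    public float Volume { get; private set; }

    public void FadeTo(float target, float seconds)
    {
        _start = Volume;
        EndVolume = target;
        _elapsed = 0f;
        _duration = seconds;
    }

    public void Update(float dt)
    {
        if (_duration <= 0f) { Volume = EndVolume; return; }
        _elapsed = Math.Min(_elapsed + dt, _duration);
        Volume = _start + (EndVolume - _start) * (_elapsed / _duration);
    }
}

public static class FaderDemo
{
    // Returns the fader volume at the end of: speak, 50ms pause, speak again.
    // refadeWhileSpeaking toggles the "else if (current)" check from the fix.
    public static float Simulate(bool refadeWhileSpeaking)
    {
        var fader = new SimpleFader();
        bool channelOpen = false;
        const float dt = 0.01f; // assumed 10ms per step

        for (int step = 0; step < 80; step++)
        {
            bool speaking = step < 20 || step >= 25;

            if (channelOpen != speaking)
            {
                if (channelOpen)
                {
                    // Begin fade out (if not already fading to zero)
                    if (Math.Abs(fader.EndVolume) > float.Epsilon)
                        fader.FadeTo(0f, 0.2f);
                    // Close once the fade out has completed
                    if (fader.Volume <= float.Epsilon)
                        channelOpen = false;
                }
                else
                {
                    // Opening the channel starts a fade in (in this model)
                    channelOpen = true;
                    fader.FadeTo(1f, 0.1f);
                }
            }
            else if (channelOpen && refadeWhileSpeaking)
            {
                // Still speaking but not heading to full volume: fade in
                if (Math.Abs(fader.EndVolume - 1f) > float.Epsilon)
                    fader.FadeTo(1f, 0.1f);
            }

            fader.Update(dt);
        }
        return fader.Volume;
    }

    public static void Main()
    {
        Console.WriteLine("without check: " + Simulate(false));
        Console.WriteLine("with check:    " + Simulate(true));
    }
}
```

In this model the channel is still open when speech resumes, so without the check nothing ever asks the fader to go back up and the volume does not return to full. With the check, the interrupted fade out is replaced by a fade in on the next update.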

awonnink commented 6 years ago

A short test indicates this is indeed the solution. I will do some more testing.

awonnink commented 6 years ago

Although it works much better now, without long silences or very delayed sound, unfortunately there are still sometimes a few words skipped.

martindevans commented 6 years ago

Can you watch closely on the DissonanceComms component at runtime - when there's a missed word does the channel close briefly? If so I would guess this is an issue with the VAD not being quite sensitive enough.

awonnink commented 6 years ago

Not sure where to look. Neither the room nor anything in the channel is shown. Apart from the Global room, shouldn't at least the room (in my case it is the URL of the visited website) be visible, or is that only displayed when configured in the VoiceReceiptTrigger?

Anyway, this time I wasn't able to reproduce the missing words, and everything worked as expected. Not sure what is different now.

martindevans commented 6 years ago

Oops. I forgot that the UI only shows the channel remote players are speaking through, local channels aren't shown anywhere. I might add that to the inspector for the next release.

Keep me informed if any issues come up again.

awonnink commented 6 years ago

Will do. Thanks again for the great support!

martindevans commented 6 years ago

Dissonance 6.2.4 went live on the asset store yesterday. That includes the fix to the fader system and should resolve this issue.
