Microphone audio is horribly mangled at low framerates

happysmash27 commented 2 years ago

Context

In ChilloutVR my audio is horribly mangled at low framerates as detailed in the following issue forum post: https://forums.abinteractive.net/d/671-microphone-audio-is-horribly-mangled-at-low-framerates

This is an extremely crippling issue and has prompted me to try to fix it myself, by making a mod for it, due to the urgency of it. ChilloutVR is extremely dependent on voice chat and it is also extremely common for me to get sub-15 fps framerates in VR, resulting in it ruining the experience extremely consistently.

Upon analysing the issue and CVR code in order to make said mod, it appears that the voice chat code is actually Dissonance, rather than something ABI made themselves. So I am reporting it here in hopes that it can be fixed in the underlying code, and/or perhaps to find out if there is any way I can configure it to be more tolerant of lower framerates without having to do a deep dive into the decompiled code and reprogram it myself.

Expected behaviour

Audio works at all framerates; I do not sound like a broken microwave, even if I am getting 5 fps.

Actual behaviour

Audio does not work at all for a very large portion of the time! I sound like a broken microwave to others despite my audio being completely clear on my end: https://forums.abinteractive.net/d/671-microphone-audio-is-horribly-mangled-at-low-framerates/

I get many log messages along the lines of:

[Dissonance:Recording] (04:38:30.044) BasePreprocessingPipeline: Lost 1920 samples in the preprocessor (buffer full), injecting silence to compensate
(Filename: C:\buildslave\unity\build\Runtime/Export/Debug/Debug.bindings.h Line: 39)

And between some messages like this I get the following:

[Dissonance:Recording] (04:38:30.044) BasePreprocessingPipeline: Lost 1920 samples in the preprocessor (buffer full), injecting silence to compensate
(Filename: C:\buildslave\unity\build\Runtime/Export/Debug/Debug.bindings.h Line: 39)

[Dissonance:Network] (04:38:30.395) EventQueue: Large number of received packets pending dispatch (12). Possibly due to long frame times (last frame was 104ms)
(Filename: C:\buildslave\unity\build\Runtime/Export/Debug/Debug.bindings.h Line: 39)

[Dissonance:Recording] (04:38:30.396) BasePreprocessingPipeline: Lost 960 samples in the preprocessor (buffer full), injecting silence to compensate
(Filename: C:\buildslave\unity\build\Runtime/Export/Debug/Debug.bindings.h Line: 39)

Which appears to indicate that this is indeed a failure as a result of Dissonance not dealing well with high frame times.

Workaround

Theoretically I could modify the code with MelonLoader to have a bigger buffer? Annoyingly though it looks like Dissonance.Audio.Capture::BasicMicrophoneCapture (not sure if I am writing that correctly or not; I am not too familiar with syntax for classes as I usually program in C) does not take any options to configure that. So, maybe change the code to inject something that makes the buffer bigger? Or maybe there is some other potential method I don't know about; I am not familiar with Dissonance and have only just figured out where the code for the microphone component of the voice chat is.

Fix

Ideally the code would be multithreaded and not be tied to the frame rate in the first place so that audio still works even if frozen for a few seconds. That would be absolutely amazing in a category of software (social VR) where such freezes can happen pretty frequently, to be uninterrupted by things that are unrelated to the voice capture. Voice chat should not be bottlenecked by my GPU!

But for a faster temporary solution, maybe the buffer could be made a little bit bigger? I'm not 100% sure this is the issue but from the error messages and decompiled DLL it looks like this could potentially fix it, at least for most situations.

Steps to Reproduce

Start microphone capture.
Go into an environment where the framerate is really low (under 15 fps; at some points when I was having this issue I was getting... 3 fps, which... wow, that's quite a bit lower than I realised).

Your Environment

Dissonance version used: Unknown as I don't have the source code, but from analysis of DissonanceVoip.dll I think it might be v4.0.30319? Or maybe it's 8.0.3. From the logs: [Dissonance:Core] (04:30:31.547) DissonanceComms: Starting Dissonance Voice Comms (8.0.3)
Unity version: 2019.4.28f1
Editor Operating System and version: CVR was probably originally built on Microsoft Windows, and I am running on Gentoo Linux.
Build Settings: Unknown, but probably built on Windows. Is definitely built for Windows; on Linux I am running this through Proton.
Link to project: http://chilloutvr.de/what-is-cvr.php

happysmash27 commented 2 years ago

It appears that the audio is sent when the Update() method of DissonanceComms is called. Maybe if it was called more than once per frame the audio would not glitch? So to work around, or maybe even fix this, one would need to find a way to invoke the function more than once per frame.

Researching how to do this, I think FixedUpdate() might potentially work better than Update() for this, but I am not 100% sure.

Alternatively, maybe it could be run on a completely separate clock on a separate thread, so that heavy physics do not slow it down either. I am unsure if heavy physics are able to slow down FixedUpdate() or not.

happysmash27 commented 2 years ago

Upon further research, I believe FixedUpdate() only runs as part of the same frame loop as Update(), being called however many times it needs to catch up with the time that has elapsed since the last frame. It is therefore still tied to the frame rate, so it would not work for this purpose.
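
The catch-up behaviour can be sketched with a generic fixed-timestep accumulator. This is a simplified Python model, not Unity's actual scheduler: the 20 ms step and the name fixed_steps_per_frame are assumptions made for illustration, and Unity's cap on how much time it will catch up per frame is not modelled.

```python
def fixed_steps_per_frame(frame_times_ms, fixed_step_ms=20):
    """Return how many fixed-step callbacks would run inside each rendered frame.

    frame_times_ms: duration of each rendered frame, in whole milliseconds.
    fixed_step_ms:  assumed fixed timestep (hypothetical value, not read from CVR).
    """
    accumulator_ms = 0
    steps = []
    for frame_ms in frame_times_ms:
        # Time only advances once per rendered frame.
        accumulator_ms += frame_ms
        # Run as many fixed steps as fit into the accumulated time.
        count = accumulator_ms // fixed_step_ms
        accumulator_ms -= count * fixed_step_ms
        steps.append(count)
    return steps
```

With frame times of 104 ms, the model runs several fixed steps per frame, but they all execute back to back within that frame. In wall-clock terms there is still a gap of roughly one frame time between bursts, which is why this would not spread audio delivery out more evenly.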

happysmash27 commented 2 years ago

I think adding a C# thread that continuously updates every so often (maybe half the expected latency, e.g. 10ms?) could potentially work, together with a mutex for each DissonanceComms object that needs to be locked every time something accesses something that is accessed by the audio thread. However, I have not figured out enough about what the update function accesses, and when the other threads may access it, to know if this solution would be ideal or not, or how easy/hard it would be to implement.

martindevans commented 2 years ago

Hey, we generally only provide support directly to our customers (in this case, the Chillout VR devs). If you can draw the attention of the CVR devs to this issue that'd be great. Since you're looking at making a mod I'll be happy to provide you with some technical details about how Dissonance works though :)

To explain some of the log messages you're seeing:

Lost 1920 samples in the preprocessor (buffer full), injecting silence to compensate

Audio data is passed from the main thread (recording the mic) to the preprocessor thread (processing and encoding audio) through a buffer which is large enough to contain 16 frames of audio (it depends on other settings, but that's probably around half a second). If you can change it easily (Assets/Plugins/Dissonance/Core/Audio/Capture/BasePreprocessingPipeline.cs, line 99) feel free to increase the size of that buffer. This buffer should constantly be drained by an independent thread which is processing the audio as fast as it can, so if this buffer is filling up it means one of two things:

The CPU is overloaded, so the preprocessor thread cannot run fast enough. There are other log messages which would probably show up if this was the case - check if you see a message like this anywhere: "Preprocessor running slow! Iteration took:XXms for YY frames"
The main thread is bundling up audio into one big blob (over half a second of audio) and dumping it onto the preprocessor thread, overflowing the buffer.
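
As a rough illustration of the second case, the arithmetic can be sketched in Python. The numbers here are assumptions, not confirmed Dissonance settings: a 48 kHz sample rate and 960 samples per frame (the loss counts in the log are multiples of 960, but that is only a hint). The model also assumes the preprocessor thread has fully drained the buffer before each delivery. The function names are hypothetical.

```python
def buffer_duration_ms(capacity_frames, frame_samples, sample_rate_hz):
    """How much audio (in ms) the buffer can hold."""
    return capacity_frames * frame_samples * 1000 // sample_rate_hz


def lost_samples(frame_time_ms, capacity_frames=16, frame_samples=960,
                 sample_rate_hz=48000):
    """Samples that do not fit if one frame's worth of audio arrives at once.

    Assumes an empty buffer at delivery time (preprocessor has kept up).
    All default values are illustrative assumptions.
    """
    delivered = sample_rate_hz * frame_time_ms // 1000
    capacity = capacity_frames * frame_samples
    return max(0, delivered - capacity)
```

Under these assumed numbers the buffer holds 320 ms of audio, so a single 104 ms frame fits comfortably while a 400 ms stall would lose 3840 samples. The real thresholds depend on the actual frame size and sample rate in use.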

EventQueue: Large number of received packets pending dispatch (12). Possibly due to long frame times (last frame was 104ms)

In the network system packets arrive and are processed into a queue of events which are then dispatched to other systems next time Update is called. This message indicates that a very large number of events (12) were put into the queue within a single frame, probably because the last frame took 104ms (i.e. 10fps). The exact details of how the network system works depend on exactly what underlying network framework CVR is using (which I guess you don't know?). This isn't necessarily a problem (there are buffers to handle large numbers of packets etc) but it's usually indicative of other issues. If this is causing an issue you would hear received voices breaking up.
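
The queue-then-dispatch pattern described above can be sketched as follows. This is a minimal Python model, not Dissonance's EventQueue: the class shape, the dispatch signature and the warning threshold of 12 are all assumptions made for illustration.

```python
from collections import deque


class EventQueue:
    """Collects events as they arrive and hands them out once per update."""

    def __init__(self, warn_threshold=12):
        # warn_threshold is a hypothetical value chosen to mirror the log line.
        self._pending = deque()
        self._warn_threshold = warn_threshold

    def enqueue(self, event):
        # Called whenever a packet has been turned into an event.
        self._pending.append(event)

    def dispatch(self, handler, last_frame_ms):
        """Deliver every pending event; return a warning string if the
        backlog was large, otherwise None."""
        count = len(self._pending)
        warning = None
        if count >= self._warn_threshold:
            warning = (f"{count} events pending dispatch "
                       f"(last frame was {last_frame_ms}ms)")
        while self._pending:
            handler(self._pending.popleft())
        return warning
```

The longer the gap between two dispatch calls, the more events pile up in between, which is why a backlog like this tends to go hand in hand with long frame times.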

I think adding a C# thread that continuously updates every so often (maybe half the expected latency, e.g, 10ms?)

Unfortunately this (and your various other suggestions) won't work for one simple reason: the Unity Microphone class (which we use by default) requires running on the main thread. Dissonance already runs as much other work as possible in background threads (preprocessing/encoding runs in our own thread, decoding and playback run inside the Unity audio engine thread).

It is possible to use a different microphone system which provides lower latency using our FMOD microphone integration package which basically just replaces the default Unity Microphone class with a much better version. I doubt that's something that can be done by mods though.

having a mutex for each DissonanceComms object, that needs to be locked every time something accesses something that is accessed by the audio thread

Just a note about this idea specifically, if you're going to experiment with things yourself: don't do this. Locks on the audio thread are extremely bad news; locks in audio code should be very rare and any locks which exist must be uncontended in 99.9% of situations!

For example you might have noticed that the preprocessor locks the _inputWriteLock object, but that lock is only ever acquired by one thread (i.e. no contention ever) and in theory isn't really needed - it's just a defence against a badly built mic system (we support custom mics) calling it on two threads and breaking everything.
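
To illustrate why a single writer and a single reader can get by without a contended lock, here is a minimal single-producer/single-consumer ring buffer sketch in Python. It is not how Dissonance implements its buffer; the class and method names are hypothetical. It also leans on CPython's global interpreter lock for memory visibility, so a real C# version would need proper memory barriers or volatile access.

```python
class SpscRingBuffer:
    """Ring buffer for exactly one writer thread and one reader thread."""

    def __init__(self, capacity):
        # One extra slot distinguishes "full" from "empty".
        self._slots = [None] * (capacity + 1)
        self._read = 0   # only ever modified by the consumer
        self._write = 0  # only ever modified by the producer

    def try_write(self, item):
        """Producer side: returns False (item dropped) if the buffer is full."""
        next_write = (self._write + 1) % len(self._slots)
        if next_write == self._read:
            return False
        self._slots[self._write] = item
        self._write = next_write
        return True

    def try_read(self):
        """Consumer side: returns None if the buffer is empty."""
        if self._read == self._write:
            return None
        item = self._slots[self._read]
        self._read = (self._read + 1) % len(self._slots)
        return item
```

Because each index has exactly one owner, neither side ever waits on the other. The trade-off is that a full buffer has to drop data rather than block, which is the same kind of behaviour as the "buffer full" messages in the log.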
