godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
91.33k stars 21.24k forks source link

AudioEffectCapture with surround sound audio artifacts #91133

Open BuzzLord opened 7 months ago

BuzzLord commented 7 months ago

Tested versions

4.2.2-stable and 4.2.1-stable, but likely all others since the relevant code hasn't changed in years.

System information

Godot v4.2.2.stable - Windows 10.0.19045 - Vulkan (Forward+) - dedicated NVIDIA GeForce RTX 2080 Ti (NVIDIA; 31.0.15.4633) - Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz (8 Threads) - WASAPI using IAudioClient3

Issue description

AudioEffectCapture with a Surround speaker mode (i.e. more than one stereo channel) causes audio artifacts.

The AudioServer's _mix_step() function processes bus effects by iterating over each active channel. For each channel, it calls the process() function of each effect, passing the channel's buffer as an argument.

The AudioEffectCapture::process function processes the source buffer by copying it to the destination buffer and then, if there is available space in the capture ring buffer, copies the source buffer into it.

What this means for surround audio setups is that after the first 512 (i.e. the hard coded buffer_size) stereo channel frames are copied into the capture buffer, the extra 3.1, 5.1, 7.1 channels will get copied in afterwards. In my 7.1 setup, the higher channels are muted during the _mix_step_for_channel() (channel_vol is 0.0), so my capture buffer ends up with 512 good audio frames, then 512*3 zeroed audio frames, which results in horrible popping/crackling artifacts in the audio. Even if the surround channels were not muted, the issue would still occur because the capture buffer would still be filled with consecutive frames from different channels, causing artifacts.

This happens regardless of the AudioStream type (sound files, mic, generator/procedural).

A potential solution to this issue could be to modify the AudioEffectInstance::process function to accept the channel index as an additional parameter. This would allow each effect to handle the source frames differently based on the channel index. Capture would probably only want the first channel, or maybe a weighted average of all channels. Other effects could just ignore the channel index to continue behaving the same as they do now.

A drawback of this solution is that it would require effects to be aware of the meaning of each channel index, which could couple the concept of channel indices across different effects (i.e. channel 0 is regular stereo left/right, channel 1 is surround 3.1 LFE and center, channel 2 is 5.1 rear left/right, etc.). Right now all channels seem to be independent, treated the same by most of the audio code. There may be another, better way.

Steps to reproduce

The most important aspect needed to test/reproduce the issue is a surround sound audio setup (i.e. more than just a standard stereo channel pair). The Dummy audio driver seems like it can simulate surround speaker modes, but setting speaker mode isn't accessible from GDScript or the editor, as far as I know, so I couldn't set it up in the MRP below.

Assuming a surround sound speaker mode, configure an audio bus with a Capture effect on it. Add an AudioStreamPlayer with a sound file or some other stream (the MRP uses a Generator populated with a sine wave), set the bus to Capture.

Attach a script that reads frames from the capture effects buffer 512 frames at a time. Each batch of 512 frames should come from one channel in the audio bus. The first batch would be the main stereo sound (i.e. the sound we probably want), while the next few will be either empty (all zero) or possibly other surround channels, sequentially packed into the capture buffer.

The resulting frames can be inspected for the above pattern, or passed on to a Generator for output to hear the artifacts directly.

Minimal reproduction project (MRP)

CaptureAudioIssue.zip

BuzzLord commented 5 months ago

I dug into this further, and found that the root cause of this issue seems to be that some AudioEffectInstances aren't really designed for multiple channels.

During initialization (or when the bus layout changes), the AudioServer instantiates an AudioEffectInstance for each channel in turn. That means for systems with surround sound setups that have more than one channel, each AudioEffect will have multiple AudioEffectInstances created from it.

For effects like reverb or filter, this works fine because each channel maintains its own buffer and processing is independent. However, some effects don't work correctly with multiple channels.

AudioEffectCaptureInstance stores a pointer to its parent AudioEffectCapture, which is used to access the effects internal buffer. When there are multiple channels, each instance writes to the same buffer. This causes issues when processing multiple channels because they're all appending to the same buffer (as described above).

AudioEffectRecord also has issues due to how it handles instances. It maintains a single current_instance pointer that is overwritten by each channel on the bus during calls to instantiate(). As a result, only the last channel on the bus actually records. This may be the reason some users have issues recording from a mic (#33184, #81950, and #69755).

I have a potential solution that I'll make a PR for shortly.