Audio latency is higher than it needs to be

ivanstepanovftw commented 1 month ago

Bevy version

main aaccbe88aa0d591c9c741f690ab472785c7bac09

[Optional] Relevant system information

Fedora Linux 40 (Workstation Edition)

$ uname -a
Linux fedora 6.8.11-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024 x86_64 GNU/Linux

What you did

Basic audio playing on Space press or Mouse1 release:

use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_systems(Update, signal)
        .run();
}

fn signal(
    mut commands: Commands,
    asset_server: Res<AssetServer>,
    keyboard_input: Res<ButtonInput<KeyCode>>,
    mouse_button_input: Res<ButtonInput<MouseButton>>,
) {
    if keyboard_input.just_pressed(KeyCode::Space) || mouse_button_input.just_released(MouseButton::Left) {
        commands.spawn((
            AudioBundle {
                source: asset_server.load("sounds/breakout_collision.ogg"),
                settings: PlaybackSettings::DESPAWN
            },
        ));
    }
}

Then loop the desktop audio back to the microphone input.

Press the record button in Audacity, then measure the click-to-sound latency in VLC by selecting the range between the click and the start of the sound in Audacity.

The latency can then be calculated from the selection fields at the bottom of the Audacity window: $.498-.463=.035$ s.

Repeat for Bevy example:

$.639-.495=.144$ s.

What went wrong

I expected latency in Bevy to be lower than in the VLC audio player.
Instead, latency is about 4 times higher in Bevy than in VLC ($144/35=4.1$).
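
For anyone repeating the measurement, the arithmetic above can be sketched as a small helper. This is only an illustration: `latency_ms` is a hypothetical name, and the inputs are the selection edges (in seconds) read from Audacity in the two recordings above.

```rust
/// Latency in milliseconds between the click and the start of the sound,
/// given the two edges of an Audacity selection in seconds.
/// (Hypothetical helper, not part of Bevy or Audacity.)
fn latency_ms(click_s: f64, sound_s: f64) -> f64 {
    (sound_s - click_s) * 1000.0
}

fn main() {
    // Selection edges taken from the two recordings described above.
    let vlc = latency_ms(0.463, 0.498);
    let bevy = latency_ms(0.495, 0.639);

    println!("VLC:   {vlc:.0} ms");
    println!("Bevy:  {bevy:.0} ms");
    println!("ratio: {:.1}x", bevy / vlc);
}
```

Single readings like these are noisy, so several clicks per program give a more trustworthy ratio.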

Additional information

Tried changing ogg to wav with no improvement.
Tried --release flag with no improvement.

Tried to increase poll resolution:

app
  .insert_resource(Time::<Fixed>::from_hz(600.0))
  .add_systems(FixedUpdate, signal)

No improvement.

Tried to preload audio to resource with no improvement.
Tried Kira with preloading with no improvement.

alice-i-cecile commented 1 month ago

You almost certainly want to preload small sound effects into the asset storage, rather than loading them on demand as you did here. That's likely to be the major source of latency here (and something our examples should more clearly teach).

That said, once that's done, can you try this with bevy_kira_audio and let us know how it compares? We're considering swapping backends, and improvements here would be

ivanstepanovftw commented 1 month ago

Thank you for the rapid response :)

Tried to preload audio into resource:

use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .add_systems(Startup, setup)
        .add_systems(Update, signal)
        .run();
}

#[derive(Resource)]
struct SFX {
    collision_sound: Handle<AudioSource>,
}

fn setup(
    mut commands: Commands,
    asset_server: Res<AssetServer>,
) {
    commands.insert_resource(SFX {
        collision_sound: asset_server.load("sounds/breakout_collision.ogg"),
    });
}

fn signal(
    mut commands: Commands,
    keyboard_input: Res<ButtonInput<KeyCode>>,
    mouse_button_input: Res<ButtonInput<MouseButton>>,
    sfx: Res<SFX>,
) {
    if keyboard_input.just_pressed(KeyCode::Space) || mouse_button_input.just_released(MouseButton::Left) {
        commands.spawn((
            AudioBundle {
                source: sfx.collision_sound.clone(),
                settings: PlaybackSettings::DESPAWN
            },
        ));
    }
}

The cycle recorded here is a mouse press followed by a release, done for 2 clicks in VLC and then 2 clicks in Bevy, so the left part of the recording is VLC and the right part is Bevy. The first Bevy click is selected, with $597-484=113$ ms latency.

So, no improvement from preloading the sound. Though I am not sure I've done it correctly (can I reference the sound instead?).

Will check Kira in 10 minutes.

alice-i-cecile commented 1 month ago

Okay thanks, that's a much more realistic setup. The code you've supplied looks correct to me.

Very weird to see that this didn't help and to see it be delayed substantially more than a frame. I'll ask around (hi @inodentry?) to get opinions from people with more expertise.

ivanstepanovftw commented 1 month ago

Kira example:

bevy = { version = "0.13.2", features = ["dynamic_linking", "mp3", "wav"] }
bevy_kira_audio = "0.19.0"

use bevy_kira_audio::prelude::*;
use bevy::prelude::*;
use bevy_kira_audio::AudioSource;

fn main() {
    App::new()
        .add_plugins((DefaultPlugins, AudioPlugin))
        .add_systems(Startup, setup)
        .add_systems(Update, signal)
        .run();
}

#[derive(Resource)]
struct SFX {
    collision_sound: Handle<AudioSource>,
}

fn setup(
    mut commands: Commands,
    asset_server: Res<AssetServer>,
) {
    commands.insert_resource(SFX {
        collision_sound: asset_server.load("sounds/breakout_collision.ogg"),
    });
}

fn signal(
    keyboard_input: Res<ButtonInput<KeyCode>>,
    mouse_button_input: Res<ButtonInput<MouseButton>>,
    audio: Res<Audio>,
    sfx: Res<SFX>,
) {
    if keyboard_input.just_pressed(KeyCode::Space) || mouse_button_input.just_released(MouseButton::Left) {
        audio.play(sfx.collision_sound.clone());
    }
}

Had to roll back Bevy to 0.13.2. Latency: $117$ and $105$ ms.

alice-i-cecile commented 1 month ago

@ivanstepanovftw thank you very much for investigating this. I'm personally out of immediately actionable ideas, but for the sake of posterity what OS are you on?

ivanstepanovftw commented 1 month ago

It is Fedora Linux 40 (Workstation Edition)

$ uname -a
Linux fedora 6.8.11-300.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Mon May 27 14:53:33 UTC 2024 x86_64 GNU/Linux

SolarLiner commented 1 month ago

System latency is defined by 3 things: the input latency, the audio processing latency, and the output latency.

The input latency comes from every part of the transport process, from the moment you physically close a switch on the mouse's integrated circuit to the moment the main thread forwards the play start event. Because you're testing both on the same system and with the same mouse (connected the same way) for fairness, the important factor for input latency is how short the duration is between the event being dispatched into Bevy and the right system reading it. VLC does not have a "game loop" and can dispatch events directly without having to wait. VLC is also highly optimized software by the sheer nature of its age and usage (and, I assume, amount of contributions), so it would not surprise me if there was a custom routine to get it to react faster by doing some extra computer wizardry, which could in theory explain the 4x increase in latency.

The audio processing latency is mainly defined by how much data the audio device asks the computer to process. Audio data is requested in chunks, each delivered in a single callback, so the length of the buffer determines how frequently the audio processing callback gets called and, with that, the inherent latency of the system. In the worst case, you tell the audio engine to play right as it finishes processing a chunk of audio, and you have to wait until the next time the callback is called before you can hear the result of your change. That wait is directly determined by the buffer size and the sample rate. VLC could be using a smaller buffer size and a higher sample rate than Bevy, which could by itself explain the 4x difference.
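
As a rough illustration of the buffer-size argument, the duration of one buffer is the number of sample frames divided by the sample rate. The sketch below assumes a 44100 Hz stream (the rate used in the SDL2 example later in this thread) and buffer sizes counted in sample frames; `buffer_latency_ms` is a hypothetical helper, not an API of any audio library.

```rust
/// Duration of one audio buffer in milliseconds: frames / sample rate.
/// This is the longest a "play" request can wait for the next callback.
/// (Hypothetical helper, for illustration only.)
fn buffer_latency_ms(frames: u32, sample_rate_hz: u32) -> f64 {
    f64::from(frames) / f64::from(sample_rate_hz) * 1000.0
}

fn main() {
    // Assumed sample rate; real devices may run at 48000 Hz or other rates.
    let sample_rate_hz: u32 = 44_100;

    for frames in [4096_u32, 512, 256] {
        let ms = buffer_latency_ms(frames, sample_rate_hz);
        println!("{frames:>4} frames at {sample_rate_hz} Hz: {ms:.1} ms per buffer");
    }
}
```

This covers only the wait for the next callback. Any additional buffering in the OS mixer or the device comes on top of it, so measured end-to-end figures will be larger.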

The output latency is determined by the inherent latency of your audio interface (which is constant in both runs, so shouldn't matter) and the latency of the native OS API you're hooking into. On Windows, for example, you have a choice of 3 native APIs (4 if you count ASIO), all with various drawbacks and states of decay and, of course, all with different latency. Not to worry, Linux is also kind of a mess, with 3 active APIs (4 if you want to distinguish JACK and PipeWire) that are all kept available so as not to break existing programs. This means that VLC could be choosing the right combination of API and OS settings, while Bevy takes whatever defaults come to it, which aren't necessarily the best, and this could very well explain the 4x increase in latency.

And my guess as to what's happening here? It's all of the above. Bevy doesn't have any heuristics for choosing an audio stream configuration; it takes the default it is given and uses it as-is. This could be changed, either by exposing some way of letting users choose their own configurations, or by integrating heuristics that reduce output latency (it needs both, IMHO). It's also the case that Bevy runs its systems only once per graphical frame, so at 60 fps you'll have about 16 ms of worst-case latency just between receiving the event in the main thread and your system telling the audio engine to start playing. This, too, can be solved by implementing callback- or observer-based APIs to react to events as fast as possible instead.
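
The per-frame figure can be checked with a short calculation. This is a minimal sketch, assuming the input event arrives at a uniformly random point within the frame (which is where the "average" column comes from); `frame_interval_ms` is a hypothetical name.

```rust
/// Time between two consecutive frames in milliseconds.
/// A system that runs once per frame can see an event up to this late.
/// (Hypothetical helper, for illustration only.)
fn frame_interval_ms(fps: f64) -> f64 {
    1000.0 / fps
}

fn main() {
    for fps in [30.0, 60.0, 144.0] {
        let worst = frame_interval_ms(fps);
        // Assumption: events arrive uniformly within a frame,
        // so the expected wait is half the frame interval.
        let average = worst / 2.0;
        println!("{fps:>5} fps: worst case {worst:.1} ms, average {average:.1} ms");
    }
}
```

At 60 fps the worst case is about 16.7 ms, which is small next to the 113 ms measured earlier, so frame polling alone seems unlikely to account for the whole gap.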

All of this is assuming all audio data was ready to be used, and hardware setup was the same for both runs.

ivanstepanovftw commented 1 month ago

Tested more game engines for audio latency:

PyGame: 21, 15, 20, 16 ms LatencyPygame.zip (CPU%: 100)

SDL2 (chunk size 4096): 79, 78, 74, 99, 86, 100 ms LatencySDL2.zip (CPU%: 100)

SDL2 (chunk size 512 (as in PyGame)): 25, 21, 17 ms LatencySDL2 (512).zip (CPU%: 100)

Unreal Engine 5 (Editor): 112, 115, 114 ms [3.8 GiB project...]

Unity: 50, 95, 99, 116, 101, 41, 113 ms LatencyUnity.zip

Unity (set Project Setting | Audio | DSP Buffer Size to Best Latency): 45, 45, 46 ms

Godot 4 (Audio | Device | Output Latency is 15, Editor): 59, 55 ms LatencyGodot.zip

Godot 4 (Audio | Device | Output Latency is 15, Linux\x11, no debug): 52, 45 ms See above

Godot 4 (Audio | Device | Output Latency is 1, Editor): 21, 35, 23, 34, 23, 22 ms See above

macroquad: 150, 149 ms https://github.com/not-fl3/macroquad/blob/858f1108002bd5b858d43d6a3b5111236203c1b6/examples/audio.rs

notan: 90, 100, 107, 84 ms https://github.com/Nazariglez/notan/blob/a6ca3afdd5877658fd3f4daa50afaf4ba4933f31/examples/audio_basic.rs

raylib: 48, 55, 54 ms https://github.com/raysan5/raylib/blob/dcf2f6a8e97911c90efce5722bd7f0c7cdc8601e/examples/audio/audio_sound_multi.c

And in games:

osu!lazer: 55, 54 ms https://github.com/ppy/osu

Apps:

VLC: 35 ms

Chrome: 75, 79, 81 ms https://music.youtube.com/

Bevy: 113 ms
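
To make the series above easier to compare, here is a minimal sketch that reduces each one to its minimum, mean, and maximum. The helper names (`mean_ms`, `report`) are hypothetical, and the numbers are copied from a few of the entries above.

```rust
/// Arithmetic mean of a series of latency samples in milliseconds.
/// Returns `None` for an empty series. (Hypothetical helper.)
fn mean_ms(samples_ms: &[u32]) -> Option<f64> {
    if samples_ms.is_empty() {
        return None;
    }
    let total: u32 = samples_ms.iter().sum();
    Some(f64::from(total) / samples_ms.len() as f64)
}

/// Print min / mean / max for one named series.
fn report(name: &str, samples_ms: &[u32]) {
    let (Some(mean), Some(min), Some(max)) = (
        mean_ms(samples_ms),
        samples_ms.iter().min(),
        samples_ms.iter().max(),
    ) else {
        println!("{name}: no samples");
        return;
    };
    println!("{name:<28} min {min:>3}  mean {mean:>6.1}  max {max:>3} ms");
}

fn main() {
    // Samples copied from the measurements listed above.
    report("PyGame", &[21, 15, 20, 16]);
    report("SDL2 (chunk size 4096)", &[79, 78, 74, 99, 86, 100]);
    report("SDL2 (chunk size 512)", &[25, 21, 17]);
    report("Godot 4 (output latency 1)", &[21, 35, 23, 34, 23, 22]);
    report("Bevy", &[113]);
}
```

With so few samples per series, the means are only a rough guide, and the spread (min to max) is worth reading alongside them.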

alice-i-cecile commented 1 month ago

I'd be really curious to see those numbers for Unity/Unreal/Godot as well. This is an extremely informative investigation: I'd love to have a way to measure this in an automatable way.

ivanstepanovftw commented 1 month ago

I have added many more benchmarks to the previous message. I have discovered that PyGame has the lowest audio latency, lower than the VLC baseline. Interesting!

PyGame uses SDL2 with 512 chunk size in Mix_OpenAudio.

I have got 14, 16 ms with chunk size 1.

ivanstepanovftw commented 4 weeks ago

Bevy + SDL2 Mixer = 38 ms, with chunk size 256. With chunk size 1 it is 19 ms.

use bevy::prelude::*;
use sdl2::mixer::{InitFlag, AUDIO_S16LSB, DEFAULT_CHANNELS};

pub const MIXER_CHUNKSIZE: i32 = 256;

fn main() {
    // Initialize SDL2 and SDL2_mixer
    let sdl_context = sdl2::init().unwrap();
    let _audio_subsystem = sdl_context.audio().unwrap();
    sdl2::mixer::open_audio(44100, AUDIO_S16LSB, DEFAULT_CHANNELS, MIXER_CHUNKSIZE).unwrap();
    sdl2::mixer::init(InitFlag::OGG).unwrap();
    sdl2::mixer::allocate_channels(2);

    // Load sound effect
    let sound = sdl2::mixer::Chunk::from_file("assets/sounds/breakout_collision.ogg").unwrap();

    App::new()
        .add_plugins(DefaultPlugins)
        .insert_non_send_resource(SdlAudio {
            sound
        })
        .add_systems(Update, signal)
        .run();
}

struct SdlAudio {
    sound: sdl2::mixer::Chunk,
}

fn signal(
    keyboard_input: Res<ButtonInput<KeyCode>>,
    mouse_button_input: Res<ButtonInput<MouseButton>>,
    sdl_audio: NonSend<SdlAudio>,
) {
    if keyboard_input.just_pressed(KeyCode::Space) || mouse_button_input.just_released(MouseButton::Left) {
        sdl2::mixer::Channel::all().play(&sdl_audio.sound, 0).unwrap();
    }
}

alice-i-cecile commented 4 weeks ago

Okay, so this implies that the majority of our latency is coming from the Rust audio stack, not from any of Bevy's architectural choices, correct?

Not what I would have expected: thank you for measuring this.

ivanstepanovftw commented 3 weeks ago

I have tried to specify buffer size manually, but unfortunately got 150 ms latency:

./crates/bevy_audio/src/audio_output.rs:

impl Default for AudioOutput {
    fn default() -> Self {
        let Some(default_device) = cpal::default_host().default_output_device() else {
            warn!("No default audio device found.");
            return Self {
                stream_handle: None,
            };
        };

        let default_config = default_device.default_output_config().unwrap();
        let default_config = SupportedStreamConfig::new(
            default_config.channels(),
            default_config.sample_rate(),
            cpal::SupportedBufferSize::Range {
                min: 1,
                max: 1,
            },
            default_config.sample_format()
        );
        let default_stream = OutputStream::try_from_device_config(&default_device, default_config);
        if let Ok((stream, stream_handle)) = default_stream {
            // We leak `OutputStream` to prevent the audio from stopping.
            std::mem::forget(stream);
            Self {
                stream_handle: Some(stream_handle),
            }
        } else {
            warn!("No audio device found.");
            return Self {
                stream_handle: None,
            };
        }
    }
}

ivanstepanovftw commented 3 weeks ago

Just tested Godot again, this time with Project Settings | Audio | Device | Output Latency set to 1, and got 21, 35, 23, 34, 23, 22 ms.
