harudagondi opened this issue 2 years ago
Hi @harudagondi, I'm interested in this issue and in https://github.com/harudagondi/bevy_oddio/issues/31, and I have some questions.
What is the reasoning behind having `filter` take in input and output slices, as opposed to something like `filter(&mut self, input: &Sample) -> Sample`?
Based on what you said, I imagined an API that looks something like:

```rust
impl AudioFilter for AmplifyFilter {
    fn filter(&self, sample: &Sample) -> Sample {
        // --snip--
    }
}

impl AudioFilter for DistortionFilter {
    fn filter(&self, sample: &Sample) -> Sample {
        // --snip--
    }
}

fn setup(audio: ResMut<Audio>, /* ... */) {
    let amplify = AmplifyFilter::new();
    let distort = DistortionFilter::new();
    audio.insert_filter("Amplify", amplify);
    audio.insert_filter("Distort", distort);
    // --snip--
    audio.play(music);
}

fn toggle_filter_system(audio: ResMut<Audio>, /* ... */) {
    // --snip--
    if toggle_distortion {
        audio.toggle_filter("Distort");
        // Alternatively:
        // audio.enable_filter(...);
        // audio.disable_filter(...);
        // audio.remove_filter(...);
    }
}
```
How does this design look? And, did you have any idea as to how the user should define `filter`s for samples? My initial idea was to have something similar to rodio's decorator pattern for `Source`s, but for `Sample`s. Something like `sample.amplify(...).fade_in(...)`. I'd like to get your thoughts.
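To make that concrete, here is a rough sketch of the decorator idea (all names here are hypothetical, and I'm using `f32` in place of `Sample`):

```rust
// Hypothetical sketch: each combinator wraps another filter,
// mirroring how rodio's `Source` adapters wrap sources.
trait SampleFilter {
    fn filter(&mut self, sample: f32) -> f32;

    // Provided combinator: wrap `self` in an amplification stage.
    fn amplify(self, gain: f32) -> Amplify<Self>
    where
        Self: Sized,
    {
        Amplify { inner: self, gain }
    }
}

struct Amplify<F> {
    inner: F,
    gain: f32,
}

impl<F: SampleFilter> SampleFilter for Amplify<F> {
    fn filter(&mut self, sample: f32) -> f32 {
        // Run the wrapped filter first, then apply the gain.
        self.inner.filter(sample) * self.gain
    }
}
```

Each additional combinator (`fade_in`, etc.) would be another wrapper type plus a provided method on the trait, so stages compose by chaining method calls.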
If there's anything else I should know about this issue, lmk!
> What is the reasoning behind having `filter` take in input and output slices, as opposed to something like `filter(&mut self, input: &Sample) -> Sample`?
This is honestly just arbitrary. Kira's `Sound` returns a `Frame`, while oddio's `Signal` takes in a mutable output reference. I don't know the exact reasoning for the API.
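For reference, the two shapes being compared look roughly like this (paraphrased with hypothetical trait names, not the exact upstream signatures):

```rust
// A stereo frame: one sample per channel.
struct Frame {
    left: f32,
    right: f32,
}

// Kira-style: pull one frame at a time from the sound.
trait PullOneFrame {
    fn next_frame(&mut self) -> Frame;
}

// oddio-style: fill a caller-provided output buffer in one call.
trait FillBuffer {
    fn sample(&mut self, out: &mut [Frame]);
}
```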
I agree with the toggle system, but for the setup, I imagine a node-based system with one input and output. This allows for more flexibility (see fundsp's `Net32`/`Net64`, which is a DAG). But for a minimal viable product this would suffice, although I would like to see a way to easily reorder these filters.
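As a rough sketch of that minimal viable product (hypothetical names; a flat, ordered chain rather than a full DAG, but with easy reordering):

```rust
trait AudioFilter: Send + Sync {
    fn filter(&mut self, input: &[f32], output: &mut [f32]);
}

// A flat, ordered effect chain: the MVP before a full node graph.
struct FilterChain {
    // Filters are applied in order; storing a name with each filter
    // makes toggling and reordering by key straightforward.
    filters: Vec<(String, Box<dyn AudioFilter>)>,
    scratch: Vec<f32>,
}

impl FilterChain {
    fn process(&mut self, buffer: &mut [f32]) {
        self.scratch.resize(buffer.len(), 0.0);
        for (_name, filter) in &mut self.filters {
            // Each stage reads the current buffer and writes into scratch,
            // then the result is copied back for the next stage.
            filter.filter(buffer, &mut self.scratch);
            buffer.copy_from_slice(&self.scratch);
        }
    }

    /// Move the filter with the given name to a new position in the chain.
    fn reorder(&mut self, name: &str, new_index: usize) {
        if let Some(old) = self.filters.iter().position(|(n, _)| n == name) {
            let entry = self.filters.remove(old);
            let clamped = new_index.min(self.filters.len());
            self.filters.insert(clamped, entry);
        }
    }
}
```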
> And, did you have any idea as to how the user should define `filter`s for samples? My initial idea was to have something similar to rodio's decorator pattern for `Source`s, but for `Sample`s. Something like `sample.amplify(...).fade_in(...)`. I'd like to get your thoughts.
I'm not sure I understand. Can you elaborate further on this?
Also, if you'd like for us to informally discuss this, you could say hi in the #audio-dev channel on the Bevy Discord 😄
> (see fundsp's `Net32`/`Net64`, which is a DAG)
This is awesome, and having something like it for DSP and filtering would be great!
By the way, there are talks in Bevy development about the possibility of a `bevy_graph` crate similar to `petgraph`. We could utilize that for our audio node system.
We could also adopt how shaders work in rendering; however, I have no experience in that field.
Just want to let you know, there is https://github.com/WeirdConstructor/synfx-dsp-jit, a DSP crate that uses Cranelift to JIT-compile an AST for efficiency.
Also want to throw in a couple of pennies here:
1) Think of naming: `AudioFilter` may be called `AudioNode`, `AudioEffect`, `AudioProcessor`, etc., and `filter()` could be `apply()`, `process()`, etc.
2) Think of controlling these filters/FX: you want the ability to control things like volume, dry/wet mix, cutoff for filters, reverb amount, size, delay length, etc. These parameters will be different for each type of filter/FX; one rough way to expose them is sketched below.
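For instance (hypothetical sketch; a string-keyed setter, though typed per-effect parameter structs would work too):

```rust
// A shared control surface over heterogeneous effects.
trait AudioEffect {
    fn process(&mut self, buffer: &mut [f32]);
    // Generic knob access; each effect interprets the names it knows.
    fn set_param(&mut self, name: &str, value: f32);
}

struct Reverb {
    mix: f32,  // dry/wet, 0.0..=1.0
    size: f32, // room size
}

impl AudioEffect for Reverb {
    fn process(&mut self, _buffer: &mut [f32]) {
        // --snip--
    }

    fn set_param(&mut self, name: &str, value: f32) {
        match name {
            "mix" => self.mix = value,
            "size" => self.size = value,
            _ => {} // unknown parameter: ignore (or log)
        }
    }
}
```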
> Just want to let you know, there is WeirdConstructor/synfx-dsp-jit, a DSP crate that uses Cranelift to JIT-compile an AST for efficiency.
This is nice; however, it is licensed as GPL-3.0-or-later. Also, this issue aims to find a way to introduce an interface for audio filters, not to implement them directly. That said, it is an interesting project 😊.
> Think of naming: `AudioFilter` may be called `AudioNode`, `AudioEffect`, `AudioProcessor`, etc., and `filter()` could be `apply()`, `process()`, etc.
Yeah I agree with this.
> Think of controlling these filters/FX: you want the ability to control things like volume, dry/wet mix, cutoff for filters, reverb amount, size, delay length, etc. These parameters will be different for each type of filter/FX.
Yeah, this is also one of the things we would like to have. A related issue is #5828, which we could apply to this one.
> This is nice; however, it is licensed as GPL-3.0-or-later. Also, this issue aims to find a way to introduce an interface for audio filters, not to implement them directly. That said, it is an interesting project 😊.
Yep, I've paid attention to that too. I didn't mean that we could use it, but rather that we could learn from it.
Some thoughts based on what I know about signal processing, DSP and real-time audio filtering:
`fundsp` also works that way.
Outside of performance, both approaches have upsides and downsides. Per-sample processes are in general easier to write, and graph traversal is also easier (delay lines, biquad filters); however there are classes of solutions that perform much better with block-based processes (FFT convolution).
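The trade-off in code, roughly (hypothetical trait names):

```rust
// Per-sample: easy to write and to compose into graphs (delay lines,
// biquads); one sample in, one sample out.
trait PerSample {
    fn tick(&mut self, input: f32) -> f32;
}

// Block-based: required for techniques like FFT convolution, and friendlier
// to SIMD, at the cost of more bookkeeping.
trait PerBlock {
    fn process(&mut self, input: &[f32], output: &mut [f32]);
}

// Any per-sample process lifts trivially to a block process; the reverse
// direction is not generally possible without added latency.
fn lift(node: &mut impl PerSample, input: &[f32], output: &mut [f32]) {
    for (x, y) in input.iter().zip(output.iter_mut()) {
        *y = node.tick(*x);
    }
}
```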
Multichannel processing is trivially parallelizable per channel; stereo processing especially fits in an `f32x2` or `f64x2`. Audio plugin frameworks (see `nih-plug` or `baseplug` as examples for Rust) define audio processes as taking a single mutable slice of multichannel audio samples (usually interleaved, as this is what comes from device callbacks) and processing them in-place. This allows buffer allocations to be done beforehand, parametrized by the selected buffer size, without running any dynamic allocation during processing. We can take an existing audio buffer implementation (the `audio` crate seems to have the feature set required to get something going, but it hasn't seen any releases in a year), or we can build our own abstraction for audio buffers, which isn't the most taxing thing (basically an `ndarray`-like buffer structure with transposing from interleaved to sequential data layouts).
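For example, the transposition between the two layouts is just an index shuffle (sketch):

```rust
/// Copy interleaved samples [L R L R ...] into a sequential/planar
/// layout [L L ... R R ...]. Assumes `planar.len() == interleaved.len()`.
fn deinterleave(interleaved: &[f32], channels: usize, planar: &mut [f32]) {
    let frames = interleaved.len() / channels;
    for ch in 0..channels {
        for frame in 0..frames {
            planar[ch * frames + frame] = interleaved[frame * channels + ch];
        }
    }
}
```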
As for the general direction of getting audio processing into bevy:
```rust
pub trait AudioNode {
    fn process(&mut self, context: &AudioContext, data: &mut AudioBuffer) -> ProcessStatus;
}

pub struct AudioBuffer {
    data: Vec<f32>,
    /// Buffer length in samples; the number of channels can be inferred as `data.len() / length`.
    length: usize,
}

pub struct AudioContext {
    buffer_start_samplecount: u64,
    samplerate: f32,
}

pub enum ProcessStatus {
    Failed,
    KeepAlive,
    /// Number of samples after which the audio tail of the process drops below the noise floor.
    Tail(u64),
}
```
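To show how that would be used, here is a toy gain node under this sketch (assuming a hypothetical `samples_mut()` accessor on `AudioBuffer` exposing the raw interleaved data):

```rust
pub struct Gain {
    pub amount: f32,
}

impl AudioNode for Gain {
    fn process(&mut self, _context: &AudioContext, data: &mut AudioBuffer) -> ProcessStatus {
        // Scale every sample in place, across all channels.
        for sample in data.samples_mut() {
            *sample *= self.amount;
        }
        // A pure gain produces no tail of its own, so just stay alive.
        ProcessStatus::KeepAlive
    }
}
```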
That's all that came to mind; consider it my two cents on this discussion.
@SolarLiner have you checked out `bevy_fundsp`? Also, I plan on integrating `knyst` into Bevy for 0.9 =)
**What problem does this solve or what need does it fill?**
Imagine you are in a water level, but still above ground. The music plays normally.
Then you go underwater, thus the music becomes all muddied up.
Next, you jump out of the water, and the music returns to normal.
How do you implement that in Bevy?
**What solution would you like?**
An `AudioFilter` trait, with a single method `filter(&mut self, input: &[Sample], output: &mut [Sample])`, where `Sample` is simply a type that implements `rodio::Sample`. Boxed audio filters are stored in `Audio`, in order, and each can be toggled programmatically. These effects are global (unless #5832 is implemented, in which case they are local to each `AudioListener`).
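For illustration, one such filter under this trait could implement the underwater muffling from the example above (a minimal sketch: a one-pole low-pass, with `f32` standing in for `Sample` and the smoothing coefficient left as a plain field rather than derived from a cutoff frequency):

```rust
trait AudioFilter {
    fn filter(&mut self, input: &[f32], output: &mut [f32]);
}

/// A one-pole low-pass: enough to "muffle" the music when the player dives.
struct Muffle {
    /// Smoothing factor in (0, 1]; smaller values cut more highs.
    alpha: f32,
    /// Previous output sample, carried across buffer boundaries.
    state: f32,
}

impl AudioFilter for Muffle {
    fn filter(&mut self, input: &[f32], output: &mut [f32]) {
        for (x, y) in input.iter().zip(output.iter_mut()) {
            // y[n] = y[n-1] + alpha * (x[n] - y[n-1])
            self.state += self.alpha * (x - self.state);
            *y = self.state;
        }
    }
}
```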
**What alternative(s) have you considered?**
Implement an audio filter using `Decodable` and pass the audio source to it, similar to iterator combinators or fundsp's audio unit combinators. Very unergonomic when manually implemented.
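Roughly, that alternative means writing a new wrapper type per effect, rodio-adapter style (generic sketch, not the actual `Decodable` bounds):

```rust
/// An iterator-combinator style wrapper around a sample source.
struct Amplified<S> {
    inner: S,
    gain: f32,
}

impl<S: Iterator<Item = f32>> Iterator for Amplified<S> {
    type Item = f32;

    fn next(&mut self) -> Option<f32> {
        self.inner.next().map(|s| s * self.gain)
    }
}
// Every new effect needs its own struct plus trait plumbing
// (`Iterator`, `rodio::Source`, `Decodable`, ...), hence "unergonomic".
```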
**Additional context**
N/A