Duality audio upgrade - Githubissues

AdamsLair / duality

a 2D Game Development Framework

https://adamslair.github.io/duality

MIT License

1.41k stars 290 forks source link

Duality audio upgrade #636

Open ChristianGreiner opened 6 years ago

ChristianGreiner commented 6 years ago

Here are some ideas (brainstorming) how we could improve the audio features of Duality.

1. Support Multiple SoundListeners at Once See #482

2. Adjust the volume via Gizmo Set the min. / max. distance of the audio emitter via scene editor like scaling an object:

3. Cone Angles

the cone angles (inner and outer) specify how "sharp" the direction is. if inner and outer are 180, the sound is the same from all directions. if inner is 45, and outer is like 60, you'd have the full audio when the direction points at the listener within 45°, no (or minimum) audio at above 60°, and fading between both inbetween those angles by Adam

4. Audio Effects Add audio effects to a sound emitter (or sound instance ?)

Low Pass
High Pass
Echo
Distortion
Reverb

5. Audio Areas Define audio areas (like shape of rigidbody ?) so the audio emitter gets the same volume everywhere in this area.

So do u have further proposals and ideas?

mfep commented 6 years ago

Had a look into OpenTK regards the audio system. Points 3 and 4 can be solved and implemented with ease in the current system. Let's see what they have:

OpenTK has support for cone angles for audio sources. This functionality can be exposed to AudioInstance and AudioEmitter as well.
There seems to be placeholders for highpass and bandpass filters, but according to the comments, these are not implemented. Perhaps a newer OpenTK/OpelAL version has them.

There are a number of effects supported by OpenTK. The full list seems to be:

reverb
chorus
distortion
echo
flanger
frequency shifter
vocal morpher (a formant filter I guess)
pitch shifter
ring modulator
autowah
compressor
equalizer

From this list, the ones @ChristianGreiner mentioned are more useful for gamedev purposes. On the front end, I'd propose a struct owned by the SoundEmitter and SoundInstance objects along the following lines (horrible pseudocode). Obviously any of the sub-structs being null'd mean that the particular effect is off.

struct AudioEffectSettings {
  struct ReverbSettings = null;
  struct EchoSettings = { DelayTime = 5.0f, /* other parameters of echo */ }
  // other effect types
}

There is yet another consideration: if my understanding is correct, OpenTK can only apply these effects/filters on its internal sound sources. At the moment each of the Duality sound sources are mapped to a OpenTK sound source. If #482 was implemented somehow, these per-source effects wouldn't be available anymore.

mfep commented 6 years ago

Here is a proposal of a possible course of implementation:

Idea 3. (cone angles) and 4. (effects) can be implemented in an additive manner (not breaking compatibility) exposing the corresponding OpenTk functionality. As described above.
Idea 2. (volume gizmo) is more of a cosmetic/editor feature, and thus quite a separate business. It would make a nice standalone Good-First-Issue entry as well.
Idea 1. (multiple sound listeners) and 5. (audio area) are more advanced audio features, and are more difficult to implement without breaking the API. Also, if I get it right, it's not a very urgent issue, thus maybe the better way of doing this would be an external plugin / experiment based on community effort. Also, with that, this advanced audio functionality would be completely opt-in.

Details of the audio plugin:

Many game engines have the concept of audio channels and mixers. Each audio source can be assigned to a mixer channel, and effects can be applied per-channel. Volume and pan can be directly controlled per-source, is indirectly controlled by the source's world position and also affected by the settings of the channel the source is assigned to.
Currently in Duality, every sound source is assigned to one native audio instance. The plugin would work differently, implementing it's own SoundInstance, SoundEmitter (SoundInstance wrapped in a Component) and SoundListener classes. Each of the SoundInstances would be assigned to a mixer channel, and each mixer channel would be assigned to a native audio instance.
The implication of the former is that the distance-based volume calculation and intensity falloff needed to be implemented plugin-side, because the OpenTk instances would be used for a different purpose. This is quite a task, also it's a question whether managed code is performant enough for this sort of number-crunching.

Any input on this?

ilexp commented 6 years ago

Idea 3. (cone angles) and 4. (effects) can be implemented in an additive manner (not breaking compatibility) exposing the corresponding OpenTk functionality. As described above. Idea 2. (volume gizmo) is more of a cosmetic/editor feature, and thus quite a separate business. It would make a nice standalone Good-First-Issue entry as well.

Agreed on both points. However, implementing the whole effects range seems like something to be used primarily in scenarios where you'd also require or use a more complex channel / mixer setup - so I'd defer this part for now.

Details of the audio plugin:

Sounds very neat, and yes, that would likely be a software implementation that really works on the actual audio samples manually - or at least I'm unaware of any matching OpenAL functionality. As such, this would be quite a bit of work, and should be treated separately from the other points.

One thought I'd like to add here as well: Right now, we have good control over rendering sources and targets - it's just a few lines to render a camera onto a texture, or scene A into texture A and scene B into texture B. When approaching channels and mixers, it would be nice to have the same functionality for audio as well, the ability to "render audio into a texture" and to control where audio output ends up, and what to do with it. The concept of an output channel could be the key ingredient to this.

mfep commented 6 years ago

However, implementing the whole effects range seems like something to be used primarily in scenarios where you'd also require or use a more complex channel / mixer setup - so I'd defer this part for now.

So my point was not to actually implement the effects, but to expose the OpenTk effects. These use a similar pattern to the currently exposed Filter, thus from this far it looks like a typing exercise (i.e. simple, not requiring structural changes). But, obviously, for that, we need to touch the Core and the AudioBackend. Like the currently present lowpass filter, these effects could be assigned per-SoundInstance.

This is a pre-requisite for the future implementation of the audio plugin, which would use the Core's SoundInstances as the backend for its channel constructs.

it would be nice to have the same functionality for audio as well, the ability to "render audio into a texture" and to control where audio output ends up

It reads like some sort of routing option for the channels. That means, that a channel's output can be another channel's input or the master channel (practically the application's audio output). We can go far with this... :slightly_smiling_face:

ilexp commented 6 years ago

So my point was not to actually implement the effects, but to expose the OpenTk effects. These use a similar pattern to the currently exposed Filter, thus from this far it looks like a typing exercise (i.e. simple, not requiring structural changes). But, obviously, for that, we need to touch the Core and the AudioBackend. Like the currently present lowpass filter, these effects could be assigned per-SoundInstance.

Alright, makes sense.

If we really support a bigger range of effects, we need to be careful figuring out the API we expose for that, as I'd like to follow an "open set" approach here, where we might, at some point, allow users to define and implement their own effects in software and use them just the same as the builtin ones.

It's a bit different from the settings struct approach, but I'm not sure we should really stuff every possible effect layer into one big item anyway - currently I'm thinking something along the lines of a list of effect instances that are applied in sequence. For OpenAL native effects, they'd just be carrying parameters along to pass over.

This is a pre-requisite for the future implementation of the audio plugin, which would use the Core's SoundInstances as the backend for its channel constructs.

Not necessarily - I think it might make sense to actually depend the experimental audio plugin directly on the audio backend, with no detour through nested SoundInstances, especially since that plugin would define its own instances and emitters anyway. Working directly on the audio backend also allows, for example, directly streaming software-generated samples to OpenAL, which would be in line with what I think would be required for custom channel / mixer audio processing.

It reads like some sort of routing option for the channels. That means, that a channel's output can be another channel's input or the master channel (practically the application's audio output). We can go far with this... 🙂

Yes! Exactly. It would also allow to contain a scene's audio output within a scene's own global channel, like it can already be done with rendering a scene's graphics output to a texture - almost a natural extension of the self-contained scenes idea from #504.

We'd need to figure out what exactly a channel and a mixer is, and what other classes an structures take part in this, and how it's all interconnected. Ideally, we could lay out the basic design first and gradually add functionality.

On the downside, this pretty much means implementing our own audio system and using OpenAL only to output the final mix as a platform independent layer. If we see channels as fully contained audio streams, we'd even need to do the whole 3D audio deal manually.

Thinking of it that way, we probably should find some middle ground. What would be the smallest possible feature set of channels and mixers? How would that look like? Would it make sense to see channels not as audio data streams, but as hierarchical audio source parameter groupings of some kind?

Schroedingers-Cat commented 6 years ago

Adjust the volume via Gizmo Set the min. / max. distance of the audio emitter via scene editor like scaling an object

Audio Areas Define audio areas (like shape of rigidbody ?) so the audio emitter gets the same volume everywhere in this area.

Points 2 and 5 are usually combined into one solution by effectively making the area of the min distance the area where the source's volume isn't affected by the listener position. Additionally, blending between the 3D positioning and simple 2D playback via distance or a custom curve allows for an area where the listener's position doesn't affect the positioning of the audio source. However, being able to define custom shapes like polygon colliders to define min and max distances of an audio source will allow for a lot of uncommon and creative use of these features.

Cone Angles the cone angles (inner and outer) specify how "sharp" the direction is. if inner and outer are 180, the sound is the same from all directions. if inner is 45, and outer is like 60, you'd have the full audio when the direction points at the listener within 45°, no (or minimum) audio at above 60°, and fading between both inbetween those angles

This sounds like emulating directivity patterns of a sound source. In the "real" world, a sound source's directivity pattern not only describes its direction-dependent volume but also its direction-dependent frequency spectrum. Furthermore, most sound sources don't have a "static" directivy pattern. E.g., a human making a vocal like an "a" has a different directivity pattern than a human making a sharp "s".

Usually, audio middlewares restrict themselves to a static directivity pattern, where the sound designer can define a direction-dependent volume attenuation and a direction-dependent filter, with the possibility to alter any parameter of any available effect based on the listener's position around a sound source.

If #482 was implemented somehow, these per-source effects wouldn't be available anymore.

In which way are these features mutually exclusive?

Many game engines have the concept of audio channels and mixers.

Having an audio mixer where each channel can be routed into another channel or different mixer (without looped dependencies, of course) would be extremely helpful for sound designers. Some more features for the mixer could be:

Snapshots and transitioning between snapshots
Instancing audio effects on different mixer channels
API allowing to modify the mixer programatically
Changing a sound source's mixer input during runtime

However, as it has already been pointed out, 3D positioning needs to happen before the mixer. That leaves you with two choices:

Switch to OpenAL Soft and add the necessary mixing functionality there (not good for portability)
Use OpenAL as a simple audio API abstraction layer and implement the 3D positioning in Duality

EDIT: OpenTK already seems to be using OpenAL Soft on Windows: https://github.com/andykorth/opentk/issues/18#issuecomment-29624933 I forgot that with the second option you'd have to write the mixing engine in C#, too.

it's a question whether managed code is performant enough for this sort of number-crunching.

I think implementing the 3D positioning and distance-based parameter modulation effects can be done efficiently in C#.
EDIT: However, doing all the audio mixer work in C#, where users can stack multiple audio effects, could turn out to be not the best choice, performance-wise.

Some more feature ideas:

Implementing PureData would allow procedural audio assets in Duality
Expand the sound asset with more common behaviour like pitch&volume randomization, attack-loop-release-behaviour and retrigger options

ilexp commented 6 years ago

EDIT: OpenTK already seems to be using OpenAL Soft on Windows: andykorth/opentk#18 (comment) I forgot that with the second option you'd have to write the mixing engine in C#, too.

We're using a fork of OpenTK and handle this part manually in the platform backend of Duality: The default is the systems native OpenAL, but there's a special case for Windows to check whether a system implementation is available and fall back to OpenAL Soft, if the answer is no. It's there, but mainly as a fallback.

This sounds like emulating directivity patterns of a sound source. In the "real" world, a sound source's directivity pattern not only describes its direction-dependent volume but also its direction-dependent frequency spectrum. Furthermore, most sound sources don't have a "static" directivy pattern. E.g., a human making a vocal like an "a" has a different directivity pattern than a human making a sharp "s".

Usually, audio middlewares restrict themselves to a static directivity pattern, where the sound designer can define a direction-dependent volume attenuation and a direction-dependent filter, with the possibility to alter any parameter of any available effect based on the listener's position around a sound source.

Ah, makes sense - if we're doing both effects and direction dependence, we could combine the two to specify not only volume, but also effect parameters in a direction dependent way.

Random thought: From an audio designer's perspective, would there be any value in skipping the entire effects parameterization and instead having two audio sources that are crossfaded depending on the cone angle? It would allow to apply arbitrarily complex filtering and effects externally (in the audio software of your choice) and maximum artistic freedom, although still limited to an "inside the main direction" / "outside the main direction" distinction.

Schroedingers-Cat commented 6 years ago

We're using a fork of OpenTK and handle this part manually in the platform backend of Duality: The default is the systems native OpenAL, but there's a special case for Windows to check whether a system implementation is available and fall back to OpenAL Soft, if the answer is no. It's there, but mainly as a fallback.

I see. I had another thought about the performance of the audio mixer plus user stackable audio effects. That part could be implemented in C++ without any OS-dependency, just plain objects and buffer calculations. So C# would hand the raw audio buffers into the C++ part where the 3D positioning + source effects + audio mixer + mixer effects are happening. Once the "master" channel buffer is ready, it gets fed into OpenAL. Would still require rewriting a lot of existing OpenAL functionality, but it would be portable and very fast.

I also checked the OpenAL specification to see if it supports feeding the output of a source into another source (which could be used to simulate a mixing engine) but found nothing in that regard. Maybe you can find something like that in there?

It has come to my attention that Unity is using an AOT compiler (they call it "Burst" compiler). Is there something similar available that we could use? It may help reducing the latency when doing the audio mixer + audio effects in C# and thus improving performance considerably.

Random thought: From an audio designer's perspective, would there be any value in [...] having two audio sources that are crossfaded depending on the cone angle?

Yes, but I makes more sense to not restrict that to the cone angle. Layering is a general and broad concept and should not be restricted to a specific parameter. Here is an example of how it works in audio middlewares: untitled_ - event editor 24 09 2018 19_55_54 This can be done without a fancy GUI. Have something like a layer list and define a curve and a parameter for each layer, that's all the data you need.

A crucial part with this technique is timing. Starting multiple sources at once needs to guarantee that they start exactly the same time/buffer/sample position. If not, phase effects will happen, most likely not to the enjoyment of the audio designer.

mfep commented 6 years ago

If #482 was implemented somehow, these per-source effects wouldn't be available anymore.

In which way are these features mutually exclusive?

The OpenAL effects can be applied per-audio-source. If we use the native audio sources for the bus channels, it's only possible to set the effect on a bus, not on a single audio source. (Of course as a workaround every source can be assigned to exactly one channel with the desired effects.)

Audio Areas Define audio areas (like shape of rigidbody ?) so the audio emitter gets the same volume everywhere in this area.

A quite intuitive way of connecting this to the mixer system would be, that an audio area triggered a mixer channel change on the entering and exiting audio sources. However, without using the physics system, the triggering is again far from trivial.

So as I can see, the conversation went quite far from exposing some parameters to implementing an audio workstation to Duality. I think at this point we should also consider switching to a different audio backend, which perhaps provides more functionality that we need. Unfortunately there's not a huge load of open-source options, among them SoLoud seems to be the best. It's a middleware itself, supporing various backends (including OpenAL). Also it has

C# bindings
channels/mixers
3d audio
a couple of synthesizers
a couple of filters and effects
audio file streaming
grouped sound instances (guaranteed to start together)

In my opinion, this project could be a valuable resource, used directly as well as a source code reference.

Schroedingers-Cat commented 6 years ago

The OpenAL effects can be applied per-audio-source. If we use the native audio sources for the bus channels, it's only possible to set the effect on a bus, not on a single audio source. (Of course as a workaround every source can be assigned to exactly one channel with the desired effects.)

I'm unaware of #482 having any involvement with bus mixing structures. The best implementation to #482 won't change a thing in how the current systems interact with each other. It boils down to some averaging between virtual audio listeners affecting the audio source's parameters in Duality.

Audio Areas
Define audio areas (like shape of rigidbody ?) so the audio emitter gets the same volume everywhere in this area.

A quite intuitive way of connecting this to the mixer system would be, that an audio area triggered a mixer channel change on the entering and exiting audio sources. However, without using the physics system, the triggering is again far from trivial.

Why should the audio emitter's volume have any effect/change on the way it is mixed in the mixer system? Wouldn't it be simpler to treat this level-information at the sound source's 3D positioning/attenuationing stage than in the mixer stage?
Maybe I understood point 5 wrong. What functionality should it provide? To me, it sounds like creating an area with a constant volume.

Yes, SoLouds features are impressive. However, their OpenAL backend is probably not what one would expect. From the description:

Very experimental. Very high latency; if this is your only option, you're probably better off using OpenAL directly. No x64.

mfep commented 6 years ago

I'm unaware of #482 having any involvement with bus mixing structures. The best implementation to #482 won't change a thing in how the current systems interact with each other. It boils down to some averaging between virtual audio listeners affecting the audio source's parameters in Duality.

Sure it's possible with that setup. What I meant originally, that if we redirected all listeners to a single native audio source, then we wouldn't get the native effects on each of the emitters.

Maybe I understood point 5 wrong. What functionality should it provide? To me, it sounds like creating an area with a constant volume.

Yes, you are right about that. The idea was that the zones could have their own effect section defined as well - but let's drop this for now.

ilexp commented 6 years ago

That part could be implemented in C++ without any OS-dependency, just plain objects and buffer calculations. So C# would hand the raw audio buffers into the C++ part where the 3D positioning + source effects + audio mixer + mixer effects are happening.

C++ portability is kind of a different beast than C# portability. Duality is 100% managed C# right now, and introducing C++ would make portability a lot more complicated - we should avoid this.

So as I can see, the conversation went quite far from exposing some parameters to implementing an audio workstation to Duality.

[.. other discussion ..]

Okay, I think we need to keep an eye on the scope of these changes blowing up.

Let's scale down this goal a bit - an entire audio workstation would probably be overkill, but we can definitely improve both low-level and high-level API and functionality that Duality provides. So far, we have identified some general topics to look into regarding bigger / mid- or long term changes in the audio system, most of which exist independently of each other:

Improved control over data / signal flow, e.g. mixers and channels
Multiple listeners, and listener contribution scaling, see issue #482
Extended effects and filtering
Extended sound source / instance parameters, such as directionality or source shapes / zones
Improved control over played audio data to allow (user) implementations of layering, seamless music, generated audio, etc.

Some of those items are quite big, so we should try to identify common prerequisites and self-contained features, to find a way to gradually progress in smaller increments.

Schroedingers-Cat commented 6 years ago

What I meant originally, that if we redirected all listeners to a single native audio source, then we wouldn't get the native effects on each of the emitters.

Absolutely! E.g. if the mixing is done in Duality/C# and OpenAL just gets the master sum of that, the OpenAL audio plugins need to be ported to C# or we code the plugins ourselves.

The idea was that the zones could have their own effect section defined as well - but let's drop this for now.

Actually, that's a cool idea. I would avoid live instancing these "area" effects. Instead, the sound designer could setup specific effect parameter settings per zone and upon overlapping or entering/exiting these zones, transitioning between these effect parameter settings begins.

ChristianGreiner commented 6 years ago

Audio Areas Define audio areas (like shape of rigidbody ?) so the audio emitter gets the same volume everywhere in this area.

Like @Schroedingers-Cat said, the base idea was that you can define different zones in which the sound or music has the same volume. (like singing birds or wind noise etc..).

Actually, that's a cool idea. I would avoid live instancing these "area" effects. Instead, the sound designer could setup specific effect parameter settings per zone and upon overlapping or entering/exiting these zones, transitioning between these effect parameter settings begins.

This would be really cool! +1 for that

Schroedingers-Cat commented 6 years ago

So far, we have identified some general topics to look into regarding bigger / mid- or long term changes in the audio system, most of which exist independently of each other:
* Improved control over data / signal flow, e.g. mixers and channels
* Multiple listeners, and listener contribution scaling, see issue #482
* Extended effects and filtering
* Extended sound source / instance parameters, such as directionality or source shapes / zones
* Improved control over played audio data to allow (user) implementations of layering, seamless music, generated audio, etc.
Some of those items are quite big, so we should try to identify common prerequisites and self-contained features, to find a way to gradually progress in smaller increments.

The way I see it, the following features can be implemented without rewriting existing OpenAL features:

Multiple listeners
Audio effects per source instance (existing functionality in OpenAL)
"Audio Areas", source shapes, realtime controlled parameters
Layering features

However, adding more control over the signal flow by implementing an audio mixer means a lot of existing OpenAL functionality needs to be rewritten. This also affects OpenAL audio effects. Also, performance of OpenAL vs the re-implementation in C# will probably be an issue.

Some more thoughts:

Is implementing PureData via libpd possible, concerning the "100% managed C#" status of Duality?
There is this library cscore, a pure .NET audio library. Some work has been done on an OpenAL backend, but I don't know what the current status is. https://github.com/filoe/cscore/pull/93
@ilexp what exactly do you mean by "Improved control over played audio data to allow (user) implementations of layering, seamless music, generated audio, etc."?

ilexp commented 5 years ago

@ilexp what exactly do you mean by "Improved control over played audio data to allow (user) implementations of layering, seamless music, generated audio, etc."?

It boils down to improving the high level audio API, so users can implement advanced features themselves without help from the core side. Use cases that came to mind are:

A SoundInstance with a user-controlled live feed of samples for generated or highly dynamic audio.
A streamed SoundInstance with a user-controlled queue to allow, for example, do a seamless (no cracking, pauses or fading) loop that chooses different audio data for each loop iteration - like dynamic background music that takes different turns depending on game data.
Layering features, maybe using the "custom sample feed" above.

Right now, the high level audio API that Duality provides is somewhat limited to the most common use cases: You play some sound in 3D or 2D, it can move around, change pitch, get a lowpass, it can loop and fade in and out, stream OGG audio data, and other base functionality. However, if you want to access audio data directly, there's no easy way to do it, as it all happens behind the scenes. The closest you can get is by abandoning the high level API and using the low level one directly, but that also means you lose all of the above mentioned functionality unless you rewrite it from scratch.

So what I'd like to do is improve and extend the high level API to allow users to interact directly with played audio data more easily.

However, adding more control over the signal flow by implementing an audio mixer means a lot of existing OpenAL functionality needs to be rewritten. This also affects OpenAL audio effects. Also, performance of OpenAL vs the re-implementation in C# will probably be an issue.

Yep, let's skip that for now. I still like the idea, but it's probably one of the biggest chunks we listed so far and I think it makes sense to take some time to consider our options. Also wouldn't rule out the idea of seeing channels as hierarchical parameter groupings just yet, since that would remove the need for rebuilding what OpenAL already does entirely - which would be a big plus.

Is implementing PureData via libpd possible, concerning the "100% managed C#" status of Duality?

There is this library cscore, a pure .NET audio library. Some work has been done on an OpenAL backend, but I don't know what the current status is. filoe/cscore#93

Unfortunately there's not a huge load of open-source options, among them SoLoud seems to be the best. It's a middleware itself, supporing various backends (including OpenAL).

Generally speaking, introducing any new dependency is something we should only do when absolutely necessary, especially when they contain any native code. Given the required work, maintenance and portability impact vs. what would be to gain, I'm not convinced we should add any of those libraries so far. Adding libraries is something any user of Duality can do, so I'd instead put the focus on making sure that these users can do something useful with those libraries and Duality. That's where an improved audio API becomes a key point.

mfep commented 5 years ago

It seems that we have a general direction and a list of user needs which seem to be feasible to implement. The next step would be to agree on some general architecture. The following diagram is a proposal on this, up to debate: dualityaudio The diagram is editable through draw.io: link This terminology is not following the current class names in Duality.

Audio Source is a basic audio output node. The output stream be generated real-time or streamed from an audio file. User implementations would be possible and welcome.
The Custom Effect Stacks give the user the opportunity to implement their own DSP code in the form of input-output nodes. How the user could interact with the routing of of the effect stack is yet to be decided.
Audio Emitter is a similar concept to the current SoundInstance implementation. It would serve as an interface to the native audio settings, such as 2D/3D sound, position, volume, pan, etc. The signal input of the Audio Emitter is one or more Audio Sources. This would require some DSP code on our side as well, since these signals need to be summed. Every Audio Emitter is connected to exactly one Native Audio Instance. The Native Audio Effects are controlled through this construct as well.
Audio Group is a concept in need of some more evaluation. It'd make possible to modify some parameters of the Audio Emitters in a grouped manner, without any DSP processing. A middleman between the Native Audio Instances and the Audio Emitters.
Multiple Audio Listeners are allowed. An algorithm detailed by @Schroedingers-Cat in issue #482 would update the parameters of the Native Audio Listener accordingly.

ilexp commented 5 years ago

Great overview, the diagram really helps too 👍 Let's refine this design a bit.

Multiple Audio Listeners are allowed. An algorithm detailed by @Schroedingers-Cat in issue #482 would update the parameters of the Native Audio Listener accordingly.

As far as I understand it, the multiple listener example would need some core interaction as well - since the audio backend operates in a single-listener context, and the averaging algorithm needs to access individual playing instances, it would need to work on (core) Audio Emitters, not the Native Audio Listener. The backend can remain "dumb" / simple and thus, easy to port, while the core does the more complex stuff.

Audio Source is a basic audio output node. The output stream be generated real-time or streamed from an audio file. User implementations would be possible and welcome.

The Custom Effect Stacks give the user the opportunity to implement their own DSP code in the form of input-output nodes. How the user could interact with the routing of of the effect stack is yet to be decided.

Audio Emitter is a similar concept to the current SoundInstance implementation. It would serve as an interface to the native audio settings, such as 2D/3D sound, position, volume, pan, etc. The signal input of the Audio Emitter is one or more Audio Sources. This would require some DSP code on our side as well, since these signals need to be summed. Every Audio Emitter is connected to exactly one Native Audio Instance. The Native Audio Effects are controlled through this construct as well.

Audio Group is a concept in need of some more evaluation. It'd make possible to modify some parameters of the Audio Emitters in a grouped manner, without any DSP processing. A middleman between the Native Audio Instances and the Audio Emitters.

There are two points I don't yet see accounted for here:

Native Buffer Handling is currently missing from the design entirely. In OpenAL, and I suspect in other potential backends as well, we'll always need to upload sample data somewhere before we're able to use it, similar to how graphics backends require you to always upload vertex data into a buffer before using it. We need to pin down, where and how interaction with these buffers happens.
The Simple Use Case of playing a sound that's neither streamed nor generated, but just loaded into a buffer to be used as-is when needed. These are the majority of all played sounds, so the design should make sure this case is efficient / fast and easy to use.

One way to address this would be to mimic the rendering upgrade that v3 got and expose native audio buffer handling to the core and plugins using a new AudioBuffer object that wraps a native one in a nice API and does the low-level management stuff - similar to how VertexBuffer does that for vertex data. The Audio Emitters from your diagram could then either use one static buffer to play regular audio, multiple buffers for streaming audio, and user-controlled buffers for generated or otherwise fully controlled audio.

Also, some naming change requests old --> new:

Native Audio Instance --> Native Audio Source
Audio Emitter --> Audio Source
Audio Source --> Audio Generator? Audio Stream? ..? Not sure, but they might be redundant anyway, when designing Audio Emitters / Sources as described above, since any user code that manually controls them would be in that position already.

Adding some links on how the audio stuff works in Duality right now, for reference to anyone who might join in:

Audio Backend Interface: The API between audio backend and core, and the optional low-level way to control audio.
Audio Classes in the Duality core - the high-level way to control audio right now.
SoundEmitter, a convenience component that plays audio using the high-level way.
SoundListener, a component that binds the single audio listener to a Transform component, the high-level way.
OpenAL Audio Backend, the current implementation behind the low-level API.

This is the status quo that we're working on to improve.

I'll have to cut it short for time reasons right now, hopefully more next time 🙂