freezy / VisualPinball.Engine

:video_game: Visual Pinball Engine for Unity
https://docs.visualpinball.org
GNU General Public License v3.0

Audio and Sound Processing #226

Closed · Vroonsh closed this 1 year ago

Vroonsh commented 3 years ago

In terms of audio, we can roughly distinguish between two different types of sounds:

- Game sounds: the music, sound effects and callouts produced by the game itself
- Mechanical sounds: the noises produced by the machine's physical parts

In the following I'll dive into each of those categories and explain how VPE should handle it. This is of course subject to discussion, and I'll update it should there be better approaches.

Game Sounds

For ROM-based games, this is pretty trivial and already implemented: We get a sample buffer from the ROM and write it back to Unity's buffer in OnAudioFilterRead(). We even patched PinMAME so it sends floats so we don't have to convert it anymore.
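A minimal sketch of that bridge (the PinMAME callback name is an assumption; the actual VPE integration differs in detail, this only illustrates the `OnAudioFilterRead()` mechanism):

```csharp
using System.Collections.Concurrent;
using UnityEngine;

// Hypothetical bridge: PinMAME pushes interleaved stereo floats into a queue,
// and Unity drains it on the audio thread.
public class RomAudioBridge : MonoBehaviour
{
    private readonly ConcurrentQueue<float> _samples = new ConcurrentQueue<float>();

    // Called from the PinMAME audio callback (name is an assumption).
    public void OnPinMameSamples(float[] buffer)
    {
        foreach (var s in buffer) _samples.Enqueue(s);
    }

    // Unity calls this on the audio thread; copy what we have, pad with silence.
    private void OnAudioFilterRead(float[] data, int channels)
    {
        for (var i = 0; i < data.Length; i++)
            data[i] = _samples.TryDequeue(out var s) ? s : 0f;
    }
}
```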

For original games, we should probably distinguish between a few different types of sound, such as music, sound effects and shout-outs. This is low-prio at the moment and will be specified in more detail in another issue.

Game sounds are in stereo, although the vast majority of ROMs produce mono sound, which is then sent to both channels.

Mechanical Sounds

We split mechanical sounds into two categories: ball rolling sounds and all other sounds. What makes ball rolling sounds special is that they are not predefined but depend on the ball's velocity and position, and they produce data for as long as the ball rolls. We call the other sounds static sounds, since they are stationary and depend only on discrete data such as a hit event, rather than on a time-dependent data sequence.

Mechanical sounds are sourced in mono and get spatialized through Unity's audio mixer, based on the emitting object's position in space.

Static Mechanical Sounds

These sounds are recordings from a real pinball machine and are played when the respective event occurs.

However, since playing the same recording over and over won't sound very realistic, we should allow authors to provide multiple recordings per sound. Additionally, an author should be able to randomly vary the pitch and speed of the sound.
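A minimal sketch of such a component (hypothetical names, not the actual VPE API), picking a random take and applying a small random pitch variation, which in Unity also changes speed, since `AudioSource.pitch` resamples the clip:

```csharp
using UnityEngine;

// Hypothetical component: plays a random take with slight pitch variation.
[RequireComponent(typeof(AudioSource))]
public class MechSound : MonoBehaviour
{
    public AudioClip[] recordings;  // multiple takes of the same sound
    [Range(0f, 0.2f)] public float pitchVariation = 0.05f;

    public void Play()
    {
        var source = GetComponent<AudioSource>();
        source.clip = recordings[Random.Range(0, recordings.Length)];
        source.pitch = 1f + Random.Range(-pitchVariation, pitchVariation);
        source.Play();
    }
}
```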

We call a single recording a recording, sample or clip, and the collection of multiple recordings a sound.

There is another factor: not all balls are made of steel. Some games like TAF have a ceramic ball, which is lighter. Thus, authors should be able to assign a sound (made of potentially multiple recordings) per ball material.

Triggering

There are two ways of triggering a sound: by a physics event such as a collision, or by a GLE event such as a coil event. Authors should be able to configure sounds that are triggered only to start, or triggered both to start and stop. For the latter, the sound should be configurable to either loop or play only once.
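A sketch of that trigger configuration (all names are assumptions, not the actual VPE API):

```csharp
using UnityEngine;

// Hypothetical trigger configuration for a mechanical sound.
public enum SoundTrigger { StartOnly, StartAndStop }
public enum SoundPlayMode { PlayOnce, Loop }

[RequireComponent(typeof(AudioSource))]
public class MechSoundTrigger : MonoBehaviour
{
    public SoundTrigger trigger = SoundTrigger.StartOnly;
    public SoundPlayMode playMode = SoundPlayMode.PlayOnce; // only used for StartAndStop

    private AudioSource _source;

    private void Awake()
    {
        _source = GetComponent<AudioSource>();
        _source.loop = trigger == SoundTrigger.StartAndStop
                       && playMode == SoundPlayMode.Loop;
    }

    // Wired to a physics event (e.g. a collision) or a GLE event (e.g. coil on).
    public void OnStartEvent() => _source.Play();

    // Only wired when the sound reacts to the stop event as well.
    public void OnStopEvent()
    {
        if (trigger == SoundTrigger.StartAndStop) _source.Stop();
    }
}
```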

Configuration

Since mechanical sounds are positioned in space, they should each be configured on the game object that emits them. This should be done with a component that pulls the relevant physics events from the game item component.

Additionally, it might be worth thinking about a global sound manager so sounds can be quickly set up without having to navigate too much through the hierarchy. As @Pandelii noted, there are probably not so many different sounds, so authors should be able to easily reuse them.

Ball Rolling Sounds

This is still R&D and will be developed as we go.

Pandelii commented 3 years ago

Would it be possible to provide 7.1 surround for cab setups that use the 7.1 channels to drive exciters? http://vpuniverse.com/forums/topic/3232-new-vpx-feature-surround-sound-output/ It seems Unity supports directing the output to a specific speaker: https://thomasmountainborn.com/2016/06/10/sending-audio-directly-to-a-speaker-in-unity/

freezy commented 3 years ago

It really depends on where we're going with handling new mechanisms and how we integrate them into the editor and finally the runtime. For me, as far as scripting is concerned, there are no "new VPE tables"; they are all new. So we should choose the way most convenient for the author (i.e. linking sounds directly in the editor to game mechanics or other elements), and at runtime they should be handled in the most efficient way. I also haven't dived into the SSF stuff yet, which should be part of our planning.

There is also DOTS audio, but I haven't looked into what that is exactly either; it might be mainly centered around audio processing.

simonbethke commented 2 years ago

Let's write a bit about audio for the Visual Pinball Engine. I think the audio system can easily be split into two parts. One is the sounds that are synthesized or sampled on a real pinball machine; these might be triggered by scripts or events and might also originate in VPinMAME. The other is the mechanical noises that usually occur as a side effect of the mechanical nature of pinball machines. The first appears to be rather straightforward (though it should support selecting a dedicated audio output device); the second is the kind of audio I want to focus on.

At this point I want to split this second kind of audio into two natures as well. One is the kind of complex audio that basically always sounds the same, for example the ball rolling into the trough. The other is the audio that occurs when collisions happen in the physics engine. Again, I'd like to focus on the second kind, as the first also seems very straightforward.

Thinking about audio caused by physical collisions, I found that all audio can easily be modelled by an impulse response (IR). Impulse responses are widely known from modelling reverb, which uses infinite impulse responses (IIR). The IR models something I'd call a white-click response. I chose this name because it is what you would hear if you had white noise (a noise like static that covers all frequencies evenly) playing in a room and shut it off: from the moment the source is cut, the room would sound like that. For reverb (and also speaker simulations) these impulse responses are applied infinitely, meaning you play the IR (the white-click response) for every single input sample, multiplied by that sample, including its phase. Since the IR already contains all the spectral and temporal properties of the mass-spring system, including its ability to resonate at certain frequencies, you get the complete audible character of the underlying materials captured in an IR that lasts only a few hundred samples (or millions of samples if you capture the IR of a cathedral). The reason reverb is simulated with infinite IRs, i.e. IRs applied to every input sample, is that every sample collides with the air, which is what causes the reverb.
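To make the mechanism concrete, here is the naive form of that convolution; a real-time implementation would use a partitioned FFT convolver instead, this only illustrates the principle:

```csharp
// Naive convolution of a dry input signal with an impulse response: every
// input sample "plays" the whole IR, scaled by the sample value.
public static class IrConvolver
{
    public static float[] Convolve(float[] input, float[] ir)
    {
        var output = new float[input.Length + ir.Length - 1];
        for (var i = 0; i < input.Length; i++)
            for (var j = 0; j < ir.Length; j++)
                output[i + j] += input[i] * ir[j];
        return output;
    }
}
```

For a single impact, the input degenerates to one impulse scaled by the impact strength, so the output is simply the IR itself at the corresponding volume.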

In the case of a pinball we don't care about reverb (yet); instead we need to cover the dry (reverb-less) sounds of materials hitting each other. That means every combination of two materials requires a distinct IR. In pinball, the ball is the same material 99% of the time, so we need one IR per material the ball can impact. Unfortunately this still only gives a rough picture, because the shape of an object, and the impact position on that shape, change the IR. An incredibly complicated but almost correct implementation would be to record impact IRs at multiple positions on the same object and store them with coordinates. When playing back, the closest IRs can be blended relative to the impact position.

A special case will be rolling noises. The noise of a rolling ball is caused by many micro-impacts on the playfield or a ramp. No physics engine would ever calculate these impacts, as they are far too many. So the physics engine should create rolling events. I could imagine that in every physics frame, the rolling path since the last frame is serialized as an audio event that contains the path between two points, namely the touching points of the ball in the last and the current frame. The rolling event will furthermore require the ball ID (in case balls have different materials) and the component ID, to determine the material the ball is colliding with.

In general, I suggest organizing IRs in a tree structure. The root should be the ball material, then the material of the collision object, then the object itself, and last a certain position on that object (if multiple IRs shall be blended). When a collision happens, the tree can be used to find the most specific IR. A physics frame should then spawn audio events that are handled by an audio event player.
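A minimal sketch of such a lookup, using a flat dictionary with wildcard fallback instead of an actual tree (all names are assumptions):

```csharp
using System.Collections.Generic;

// Resolves the most specific IR for a collision: ball material, then the
// collision object's material, then the object itself. "*" acts as wildcard.
public class IrLibrary
{
    private readonly Dictionary<(string ball, string material, string obj), float[]> _irs
        = new Dictionary<(string, string, string), float[]>();

    public void Add(string ball, string material, string obj, float[] ir)
        => _irs[(ball, material, obj)] = ir;

    public float[] Find(string ball, string material, string obj)
    {
        if (_irs.TryGetValue((ball, material, obj), out var ir)) return ir; // most specific
        if (_irs.TryGetValue((ball, material, "*"), out ir)) return ir;     // any object
        _irs.TryGetValue((ball, "*", "*"), out ir);                         // ball default
        return ir;
    }
}
```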

Pandelii commented 2 years ago

What you are describing has been covered pretty well in various research papers. This one may be of use: http://gamma.cs.unc.edu/AUDIO_MATERIAL/. Extensions to this method can be obtained by baking additional data from analysis into a unique texture whose lookup coordinates can be derived from world position. By storing information about the geometry in a lookup table, the response modeling can easily be augmented based on the impact location. This approach also allows scaling the cost through mip access. It also allows a material ID to be stored for objects that have multiple materials in the same texture, e.g. a single object that presents both metal and wood. The ID has to be supplied by the creator; the rest, however, can be generated by analysis.

simonbethke commented 2 years ago

Coming a bit closer to the actual implementation of audio in Unity: there are AudioSources that take all their audio from audio clip files, and only these pre-existing clips can be played back at a spatial position with Unity's spatialization engine. However, I figured that the procedural audio of a rolling ball should not travel with the ball anyway, because if it did, the Doppler effect would kick in and the result would simply sound wrong. In reality, the rolling noise is not emitted by the ball but by the playfield, which is stationary.

Rolling ball: The playfield should become an AudioSource and get an additional component that creates rolling ball audio in a procedural way. For that, it requires the BallData information that is available in BallMovementSystem and should be passed to the component every time BallMovementSystem calculates it.

Collisions: Any time a collision happens, a GameObject with the collision clip should be spawned at the position of the collision. This temporary object should remove itself after playback has ended. The GameObject needs to utilize spatial audio, and the clip should match the colliding materials.
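Unity already ships a helper that does exactly this: `AudioSource.PlayClipAtPoint()` spawns a temporary GameObject with a spatialized AudioSource at the given position and destroys it after playback. A sketch:

```csharp
using UnityEngine;

public static class CollisionAudio
{
    // clip should be chosen to match the two colliding materials.
    public static void PlayImpact(AudioClip clip, Vector3 position, float volume = 1f)
    {
        // Creates a temporary GameObject at the impact position, plays the
        // clip spatialized, and destroys the object when playback ends.
        AudioSource.PlayClipAtPoint(clip, position, volume);
    }
}
```

One caveat: `PlayClipAtPoint()` doesn't let you route the source through a specific mixer group, so a hand-rolled spawn (or a pool, as discussed further down) may still be preferable.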

Flippers: I have the impression that the audio of flippers can easily be improved compared to VPX. If possible, the sound should be decomposed and remixed as required. If a flipper rotates to its end position, I assume these noises are played:

  1. "flipper-coil-on" The coil bridges the gap of play and causes some kind of initial sound when the flipper starts moving
  2. "flipper-moving" During movement the axis might cause sound that most likely is very silent
  3. "flipper-end-stop" At the end position the stopp causes also a sound

When releasing the flipper the following sounds would play:

  1. "flipper-coil-off" The coil is not holding the Flipper anymore and if there is play, this play is released first
  2. "flipper-moving" During movement the axis might cause sound that most likely is very silent
  3. "flipper-start-stop" At the start position the stopp causes also a sound

With these sounds, one could easily imagine that pressing the flipper button for only a very short time would result in:

  1. "flipper-coil-on"
  2. "flipper-moving"
  3. "flipper-coil-off"
  4. "flipper-moving"
  5. "flipper-start-stop"

I think the moving sound should stop every time another sound is played. Any other sound should continue playing to its end. The volume of the stop sounds (which are the noisiest ones) should relate to the change in the flipper's rotation speed.
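A rough sketch of that sequencing (the clip names follow the lists above; the component and its event wiring are assumptions):

```csharp
using UnityEngine;

public class FlipperSound : MonoBehaviour
{
    public AudioClip coilOn, coilOff, moving, endStop, startStop;

    private AudioSource _oneShot, _motor;

    private void Awake()
    {
        _oneShot = gameObject.AddComponent<AudioSource>();
        _motor = gameObject.AddComponent<AudioSource>();
        _motor.clip = moving;
        _motor.loop = true;
    }

    public void OnButtonDown()
    {
        _oneShot.PlayOneShot(coilOn);   // "flipper-coil-on"
        _motor.Play();                  // "flipper-moving"
    }

    public void OnButtonUp()
    {
        _motor.Stop();                  // moving stops when another sound plays
        _oneShot.PlayOneShot(coilOff);  // "flipper-coil-off"
        _motor.Play();                  // moving back to the rest position
    }

    // Called from physics when the flipper reaches either stop; the volume
    // follows the change in rotation speed, as suggested above.
    public void OnStop(bool atEnd, float speedDelta)
    {
        _motor.Stop();
        _oneShot.PlayOneShot(atEnd ? endStop : startStop, Mathf.Clamp01(speedDelta));
    }
}
```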

freezy commented 2 years ago

Sounds good! However, we already have GameObjects for all elements on the playfield, so I don't think you'll need to spawn anything.

simonbethke commented 2 years ago

Yea, I am not sure about this. Generally you are right, but there is one exception that makes me think this over and come to an unexpected result: if two balls collide with each other, both balls should play the collision at equal volume, and even the velocity of the balls would make the spatial engine pitch the sound. To implement this case not as an exception but as a feature, every collision should play at both colliding objects. I assume there might be parameters like material density which define how the impact energy is distributed: a ball impacting a wooden object would cause the wood to play at, say, 90% of the volume and the ball only 10%.

syllebra commented 2 years ago

Sound comes from every vibrating object and, more specifically, from every surface point of the vibrating object. You can then imagine that it is, in practice, impossible to simulate accurately, in particular in real-time applications like VPE or any pinball simulation. You can of course attach an AudioSource to every object (because rolling can occur not only on the playfield, but also on ramps and every other object), but this would require some initial computation to "transport" the sound to the emitter position. Keep in mind that this is still an approximation.

This is why we usually set the ball as the audio source: collision/rolling sound is almost always generated by the ball or by solenoids, and it is a convenient and efficient way to simulate spatial sound. In UP/UP2, there is a pool of impact sounds and a rolling sound attached to each ball, and one audio source on each solenoid object.

But after some thought, I would externalize the "impact sound" pool into a global one: once an impact sound is produced, the spatial position it originates from must not move with the ball (unlike rolling sounds, which do travel with it). This would answer the two colliding balls problem: one sound for one impact, as an impact is always between two objects, and it stays at the collision location until fully played, then gets pushed back into the pool when available.
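A minimal sketch of such a global pool, assuming plain Unity AudioSources (all names are made up):

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

public class ImpactSourcePool : MonoBehaviour
{
    private readonly Queue<AudioSource> _free = new Queue<AudioSource>();

    public void PlayAt(AudioClip clip, Vector3 position, float volume)
    {
        var source = _free.Count > 0 ? _free.Dequeue() : CreateSource();
        source.transform.position = position; // stays put, even if the ball moves on
        source.PlayOneShot(clip, volume);
        StartCoroutine(ReturnWhenDone(source, clip.length));
    }

    private AudioSource CreateSource()
    {
        var go = new GameObject("ImpactSource");
        go.transform.parent = transform;
        var source = go.AddComponent<AudioSource>();
        source.spatialBlend = 1f; // fully 3D
        return source;
    }

    private IEnumerator ReturnWhenDone(AudioSource source, float seconds)
    {
        yield return new WaitForSeconds(seconds);
        _free.Enqueue(source);
    }
}
```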

syllebra commented 2 years ago

Oh, and in UP2 there are also audio sources at the speaker positions, which play the PinMAME-simulated sound.

simonbethke commented 2 years ago

OK, I don't really know what UP is in this context. I thought VPE had no sounds yet. Simply playing sounds back, in my opinion, means keeping the soundscape at the level of VPX. I am not asking anyone to do all this, I'd like to implement it myself. Let's see if I am capable of doing that, and what kind of implementable features we can come up with.

Pandelii commented 2 years ago

Typically in games we simplify by positioning the sound emitter at the location of the impact. This is a useful approximation because the information is provided with the impact event and no further calculations need to be done. The ball should probably be the emitter, not the playfield, as the ball is not guaranteed to be rolling on the playfield. This is how we handle footsteps in games with characters: the ground is not the sound source, the impact location is. A simple ray check with its origin at the ball location and a downward vector is sufficient to find the contact point and object, from which the physical material ID can be obtained.
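A sketch of that ray check in Unity terms (PhysicMaterial is Unity's built-in physics material type; the helper itself is made up):

```csharp
using UnityEngine;

public static class RollingSurface
{
    // Cast straight down from the ball to find the contact object and its
    // physics material. Returns null if nothing is hit or no material is set.
    public static PhysicMaterial Check(Vector3 ballPosition, float maxDistance = 0.1f)
    {
        return Physics.Raycast(ballPosition, Vector3.down, out var hit, maxDistance)
            ? hit.collider.sharedMaterial
            : null;
    }
}
```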

simonbethke commented 2 years ago

Playing the sound on both colliding objects makes many things simpler:

Rolling sounds are covered in an entirely different way. There is no step sound that can be played every once in a while, and spatial movement of procedural audio seems not to be possible at this point. One hurdle to take is detecting that two subsequent collisions belong to the same rolling sound.

syllebra commented 2 years ago

UP (Unit3D Pinball) is a pinball simulation I made in Unity a while ago, and I was developing version 2. Unfortunately, I was blinded by development and did not see the VPE project until late, so I made a lot of things... doubled. But (@freezy may agree with this, no offense), we went farther than VPE in the player aspect at least (UP was able to play 15 tables fully in Unity: sound, physics, PinMAME, etc.). So I already encountered all those problems a while ago (even in UP1, started in Unity 5!). I'm just giving my humble opinion on how I resolved, or should have resolved, them.

(BTW, I still have to open-source UP, and I think there will be a lot of things in it that can actually help with VPE. I already shared the FP importer, but sadly I really don't have much time to help you guys with VPE right now, just some little things here and there.)

I am glad, though, to see new approaches, for sounds in particular, but for everything else too. Good luck!

Pandelii commented 2 years ago

Rolling sounds are indeed different from stepping sounds. Instead of a one-off play, the sound is always playing while the velocity of the ball is greater than zero. It is still modulated in discrete steps at a fixed frequency, depending on which update function you utilize.

The basic premise is simple:

```csharp
// every audio update:
while (ballVelocity > movementThreshold)
{
    var materialId = MaterialCheckRaycast(Vector3.down); // material check
    if (materialId != previousMaterialId)
        ChangeSoundParameters(materialId);
}
```

Further extensions can be made for the case of multiple constant contact locations, e.g. rolling while hugging a wall. Those are only necessary when the material differs from the base, and can be determined by detecting that a collision has been occurring for longer than a set threshold.

If spatialization of purely procedural audio is not possible, it may be possible to supply a base noise as a static looping clip, then modulate the clip using a filter stack that is procedurally adjusted according to sampled parameters.
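A sketch of that fallback, assuming a looping noise clip plus Unity's built-in low-pass filter (the mapping constants are guesses):

```csharp
using UnityEngine;

[RequireComponent(typeof(AudioSource), typeof(AudioLowPassFilter))]
public class RollingSoundModulator : MonoBehaviour
{
    private AudioSource _source;
    private AudioLowPassFilter _lowPass;

    private void Awake()
    {
        _source = GetComponent<AudioSource>();
        _lowPass = GetComponent<AudioLowPassFilter>();
        _source.loop = true; // static base noise clip, always playing
        _source.Play();
    }

    // Called each update with the sampled ball speed.
    public void SetBallSpeed(float speed)
    {
        _source.volume = Mathf.Clamp01(speed / 10f);    // louder when faster
        _lowPass.cutoffFrequency = 500f + speed * 400f; // brighter when faster
    }
}
```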

I think that volume modulation based on geometry alone is likely to be insufficient. It may be beneficial to experiment with modulating other parameters based on impact velocity and physical properties: a larger object is not just louder, it has a different general tone.

For simplicity's sake, there need only be one ball type. The physical materials of the environment can be generalized as base materials, of which only a handful need description:

- Wood
- Wood (hollow)
- Metal (thin)
- Metal (thick)
- Metal (hollow)
- Plastic
- Plastic (thin)
- Plastic (hollow)
- Rubber
- Glass

Because we only have a metal ball type, which covers over 99% of use cases, the matrix is simple. More variation in the audio landscape will arise as a result of property modulation based on geometry, impact angle, impact location and impact velocity.

While I've not implemented this system for pinball, I've implemented it for other games and the results were good while remaining efficient. Playing two audio sources is probably an unnecessary expense, though worth experimenting with just to see.

syllebra commented 2 years ago

@Pandelii Totally agree. These are almost exactly the classes I used in UP/UP2.

To be more specific, there are two sounds for each pair in the matrix: one for impact, one for rolling. The impact volume is easy: it is modulated by the relative impact velocity multiplied by the dot product of the normalized velocity directions. This prevents a big impact sound when, for example, the ball gently arrives parallel to a wire guide.
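One reading of that formula as code; note that I use the contact normal as the reference direction here, which also covers impacts against static objects whose own velocity is zero:

```csharp
using UnityEngine;

public static class ImpactVolume
{
    // Relative impact speed, scaled by how head-on the impact is: a ball
    // arriving parallel to a wire guide yields a dot product near zero and
    // therefore a quiet sound.
    public static float Compute(Vector3 ballVelocity, Vector3 otherVelocity, Vector3 contactNormal)
    {
        var relative = ballVelocity - otherVelocity;
        if (relative == Vector3.zero) return 0f;
        var headOn = Mathf.Abs(Vector3.Dot(relative.normalized, contactNormal.normalized));
        return relative.magnitude * headOn;
    }
}
```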

Then, for each collision pair, a queue is kept and checked on the next step to see if the contact is lasting. If it is, the rolling sound volume is updated (right now only with the relative velocity, which is almost always the velocity of the ball alone, as rolling sound almost never occurs between two moving objects like two balls). The rolling sound is discarded if the contact has stopped, and then pushed back into a pool.

(Please keep in mind I am not saying this is what needs to be done; it's what I have right now in UP2.)

freezy commented 2 years ago

I've updated the main description.

simonbethke commented 2 years ago

I think Unity doesn't support multiple AudioSources on one GameObject, or multiple clips on one AudioSource. To overcome this limitation, I could imagine that instead of an AudioSource, a list could be implemented as a component that generates temporary GameObjects with an AudioSource every time it is played.

Regarding the mode with starting, stopping and looping: I think authors could have another flag that defines whether multiple play calls may overlap, or whether the second call stops the first one.

simonbethke commented 2 years ago

There is one more thing regarding sounds that could be a thing (maybe not for an MVP, but after that): some kind of reversible sequence. I think there are multiple applications that need a start sound, a motor sound in the middle and an end sound. For example, the slowly rotating bookshelf in The Addams Family: you might need a sound when it starts rotating, then a sound for the motor (that can be looped), and a sound when it reaches its end position. If this is one clip with all samples already mixed, the bookshelf always has to rotate to the end position, audio-wise, before the audio for the rotation back can be played.

There would be other ways to approach this, and I just want to bring it up as a topic. One way would be one sample for a start-to-end rotation forward and another one for the same backward. Then, if the rotation reverses in the middle, the forward sound can be stopped and the backward sound played, starting somewhere in the middle. This approach is less universal: another example would be Dracula in Monster Bash, who moves along a path for a more or less random distance, so the motor sound can play longer or shorter.

As @freezy already mentioned, we are talking about "sounds" and "clips/samples": a sound is a more abstract concept that may play one of multiple clips. Maybe the sound could be some kind of interface that acts somewhat like an event listener.

All these are just ideas that one could keep in mind if starting with a RandomSound class :)

syllebra commented 2 years ago

Wrong! Unity CAN handle multiple AudioSources on one object (btw, even if it couldn't, it would be easy to set up mock children for each AudioSource). The limitation is regarding the AudioListener: there can be only one in the scene, I think, which makes sense for rendering spatial audio, and it is most of the time set up on the camera.

As a matter of fact, Addams was the very first table I made for UP. We did not go so far as start/end sounds for the bookshelf, just the motor one, and to be honest, this was already immersive enough with a ping-pong loop.

simonbethke commented 2 years ago

Perfect, nice to hear that I was wrong about this limitation. I must have misunderstood something...

The motor of the bookshelf was maybe a silly example. I really see the main use in the sound of the flippers, which in my opinion audibly sound wrong in VPX if you reverse the flipper rotation midway.

markran commented 1 year ago

I'd like to suggest implementing a layer of abstraction for mechanical sounds, similar to how musical instrument sampling has evolved in the MIDI / digital audio workstation domain. I'm fairly new to VPin but have experience in the audio sampling world, where a MIDI drum kit seems fairly analogous to mechanical table sounds: Kick Drum, Snare, Tom Tom, etc. are similar to Solenoid, Flipper, Bumper, etc.

There are two ideas in drum kit sampling I see as potentially useful to this use case:

  1. Being able to swap out one "kit" of drum sounds for another easily. Hardware drum samplers offer several different "kits" and software-based samplers can load an infinite variety from disk. Essentially, these are just WAV file-like audio samples but in the case of drums the mapping evolved to be somewhat standardized to a base set of 16-ish notes. Although drum kit sounds can vary widely from soft jazz to hard rock to techno to ethnic flavors, most types of rhythmic percussion have something like a kick drum, a snare and a cymbal in the same way as most pinball tables have something like flippers, bumpers, etc. This approximate mapping allows musicians to easily audition the same MIDI drum track using any kit from the sounds of an 80s electronica 808 drum machine to a set of tribal drums made of animal skins. Since VPin users can have so many different configurations (DOF, SSF, etc) and channels, a basic mechanical sound kit with the most common table sounds mapped to a common abstraction would save a lot of time for users adapting tables to their configs while simplifying authoring. As an example, Mechanical Channel-1 could be for Flippers, Mechanical Channel-2 would be for Solenoids, etc. Authors would then assign a sample to each channel and those assignments can be saved, loaded, edited or shared as a set (or "kit" in drum sample lingo). This would be an abstraction like a POV file is for viewpoint. Authors would still be free to vary from the conventional assignments if the specific table needs something different but the idea is to establish a de facto set of basic assignments with variations possible when necessary.
    Existing VPX tables with samples assigned directly by filename could still work but as authors implement the channel assignment convention, a user could change the sound kit of a table from a set recorded on a 1970s EM machine to a set recorded from a modern table or even something wildly creative we can't imagine yet. With object-based and virtualized audio formats like Dolby Atmos, Apple Spatial Audio and MPEG-H evolving rapidly and the creativity of table authors exploding, the landscape of future output capabilities to accommodate will only grow more diverse.

  2. The other concept from instrumental audio sampling which might be of interest is "round-robin" sample playback. This technique evolved to allow sampled instruments to sound closer to musicians playing analog instruments. When a score repeatedly triggers the same sample in quick succession such as a staccato phrase or drum roll, it can sound artificial because listeners notice the lack of natural acoustic variation. Modern samplers give authors the freedom to supply more than one sample for some or all sounds. If the same sample is played in quick succession the sampler automatically rotates through the list to provide subtle variation. I think this might be useful in the case of certain mechanical components like pop bumpers because they are close to one another and can trigger repeatedly. However, I'm not sure this is an appropriate suggestion because I don't know if some mechanical pinball components can sound different enough to "matter" when triggered in quick succession versus single-shot triggering.
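A minimal round-robin picker might look like this (hypothetical class, not an existing VPE or sampler API):

```csharp
using UnityEngine;

// Cycles through the available takes so that rapid re-triggers, e.g. from a
// pop bumper, never repeat the exact same sample twice in a row.
public class RoundRobinSound
{
    private readonly AudioClip[] _takes;
    private int _next;

    public RoundRobinSound(AudioClip[] takes) => _takes = takes;

    public AudioClip NextClip()
    {
        var clip = _takes[_next];
        _next = (_next + 1) % _takes.Length;
        return clip;
    }
}
```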

If it would be helpful, I can provide representative UI screenshots etc. from various instrumental sampling plug-ins, which have evolved over the past couple of decades to be quite intuitive yet powerful.

freezy commented 1 year ago

Absolutely!

I'm closing this, because the current discourse is happening in #449. There's also a PR with the first part implemented (#453).

To your points:

  1. The "kit" feature is currently still on TODO (see "sound sets" comment in #449). One way to go about this would be to define a set of tags, which can be assigned to each sound. Additionally, each sound could be assigned to a sound set, or "patch"/"kit", which allow switching out sounds of a given given kit with another one.
  2. Yes, round robin is important and is already implemented in the PR mentioned above.

Let me know if you'd be willing to work on this. I'll be back on this project soon, once dmd-extensions' next version is final.

markran commented 1 year ago

Thanks for the link to #449! I randomly found this thread when searching Google, hoping to find a way to avoid manually tweaking a zillion sound files across a hundred tables to get SSF balanced properly on the cab I'm putting together. I'll continue this comment in #449...