Mach1Studios / m1-spatialaudioserver

Backend for serving custom streaming spatial audio players and includes a frontend web client example

Binaural rendering #2

Open Avnerus opened 2 years ago

Avnerus commented 2 years ago

Hi again! I was wondering whether binaural rendering is supported when playing the Mach1 format — can it be added on top of the basic layer, or is it natively supported? Admittedly I do not know much about the sound theory, but I noticed that in Omnitone they use convolvers to achieve that effect. Thank you! /Avner

himwho commented 2 years ago

@Avnerus Yeah, we have everything designed and exposed in our APIs so that anyone is free to add additional processing at any step themselves. We wanted to focus on a framework that promotes openness for handling multichannel audio and teaches a transparent baseline, letting each use case dictate the additional processing (room modeling, HRTF filtering, etc.) itself instead of forcing our proprietary processing on an entire pipeline.

if it can be added on top of the basic layer, or is it natively supported?

Yeah, the easiest and most common design would be to add a binaural HRTF processor as a stereo processor after the stereo output from Mach1Decode, driven by the same orientation input that Mach1Decode gets. However, there's no reason not to explore adding multichannel processing to one of our vector formats directly, or to use Mach1Transcode to convert from other common spatial audio approaches such as ambisonics.
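To illustrate the recommended shape of that chain, here is a minimal sketch in plain JS (outside the Web Audio graph) of applying per-ear HRIRs to the stereo output of Mach1Decode. The `hrirLeft`/`hrirRight` arrays are hypothetical — in practice they would come from an HRTF dataset selected for the current orientation, and the convolution would more likely live in a ConvolverNode or AudioWorklet:

```javascript
// Naive FIR convolution: out[n + k] += signal[n] * ir[k].
function convolve(signal, ir) {
  const out = new Float32Array(signal.length + ir.length - 1);
  for (let n = 0; n < signal.length; n++) {
    for (let k = 0; k < ir.length; k++) {
      out[n + k] += signal[n] * ir[k];
    }
  }
  return out;
}

// Apply per-ear HRIRs to the decoded stereo pair. hrirLeft/hrirRight
// are hypothetical impulse responses chosen (or interpolated) for the
// same head orientation that was fed to Mach1Decode.
function binauralize([left, right], hrirLeft, hrirRight) {
  return [convolve(left, hrirLeft), convolve(right, hrirRight)];
}
```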

Just a reminder though: Omnitone and other approaches that add processing like this ultimately add another layer of abstraction that in some cases can cause issues on the production/creation side. We wanted to show that you can create and cater to high-quality spatial audio soundfields, including head-tracked playback, without these additional spatial audio processing techniques as a baseline framework. So while we don't offer it directly in the Mach1 Spatial SDK, it's designed to support it at any step.

Avnerus commented 2 years ago

Thank you again @himwho for the detailed response. Do you have any recommendation on how a stereo binaural processor might be implemented on top of Mach1 with Web Audio? I was wondering whether it would be possible to connect the merger to another PannerNode working in HRTF mode. There is the option of changing the listener orientation, but I think that might interfere with the existing panners? Otherwise it's possible to set the position of the HRTF panner based on the listener orientation. That seems to create an effect, but I haven't been able to get a satisfying result so far. Thank you!

/Avner
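For the "set the HRTF panner position from the listener orientation" idea above, the position math itself is plain trigonometry. A sketch, assuming yaw/pitch in radians and Web Audio's coordinate convention (+x right, +y up, -z forward); the resulting vector is what one might assign to a PannerNode's `positionX/Y/Z` params (whether to negate yaw to counter-rotate depends on which way your orientation source reads):

```javascript
// Convert listener yaw/pitch into a point on the unit sphere around a
// fixed listener at the origin, so a single HRTF PannerNode can track
// the head. Yaw/pitch in radians; Web Audio axes assumed:
// +x right, +y up, -z forward (so yaw = 0, pitch = 0 is straight ahead).
function orientationToPannerPosition(yaw, pitch) {
  return {
    x: Math.sin(yaw) * Math.cos(pitch),
    y: Math.sin(pitch),
    z: -Math.cos(yaw) * Math.cos(pitch),
  };
}
```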

himwho commented 2 years ago

In abstract terms I would recommend first completing the ingestion of the input spatial soundfield -> Mach1Decode -> stereo, then applying any additional stereo binaural processing effects to that stereo output (driven by the same orientation used for Mach1Decode). This keeps a clear chain in the signal flow and makes it easier to control/adapt any step as needed in the future.

INPUT MULTICHANNEL -> MACH1TRANSCODE -> MACH1DECODE -> Orientation Tracking Stereo -> Additional Processing Effects -> Output Stereo

Of course you can combine this into a shorter-hand implementation on the web, but I would recommend keeping things in steps so you can add flow control to each stage as needed and remain flexible.
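Keeping the stages separate might look something like the following in plain JS, with each stage as a hypothetical, swappable function operating on one frame of samples. All coefficients here are placeholders for illustration, not real Mach1 matrices:

```javascript
// One audio frame per channel; each stage is a plain function so flow
// control or metering can be inserted between any two steps.

// MACH1TRANSCODE stand-in: N input channels -> 8 Mach1 channels
// via a matrix mix (matrix rows are per-output-channel coefficients).
const transcode = (input, matrix) =>
  matrix.map(row => row.reduce((s, c, i) => s + c * input[i], 0));

// MACH1DECODE stand-in: 8 channels + orientation-derived gains -> stereo.
// A real chain would ask Mach1Decode for gainsL/gainsR at the current yaw.
const decodeToStereo = (ch8, gainsL, gainsR) => [
  ch8.reduce((s, x, i) => s + x * gainsL[i], 0),
  ch8.reduce((s, x, i) => s + x * gainsR[i], 0),
];

// Additional stereo processing (e.g. binaural filtering) goes last.
const postProcess = ([l, r]) => [l * 0.9, r * 0.9]; // placeholder effect

function pipeline(input, matrix, gainsL, gainsR) {
  return postProcess(decodeToStereo(transcode(input, matrix), gainsL, gainsR));
}
```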

Let me make sure we comment our example a little more, as it is still premature and maybe not as clear and verbose as it could be about where these steps should go compared to our other examples.

himwho commented 2 years ago

If you use HRTF processing from the PannerNode during the Mach1Decode steps, you will likely color the mixes in very unintended ways; I would not recommend it, though I haven't played with that too much. Instead I would recommend adding more conventional HRTF stereo->stereo processing at the end of the chain that just expects orientation data to drive it — a signal chain that is MUCH safer for post-production.

Opinion disclaimer: there are many HRTF stereo->stereo GitHub repos, but we have yet to do a quality exploration of them to recommend any specific one (though we're happy to be involved in testing any if you are interested). The tricky thing is that they are designed to alter a mix to enhance certain perceptual qualities, but that quality usually depends on the HRTF measurement data being tailored per person. More often than not this will just lessen audio quality in trying to achieve a head-tracking effect we already achieve via Mach1Decode; it also depends on how the input spatial audio mix streaming into it was designed without runtime use of additional processing effects.

Avnerus commented 2 years ago

Thank you @himwho for the response. It's an interesting challenge to figure out how binaural filters could be applied to the decoded Mach1 stereo. In the common web-based HRTF/HRIR filters that I found, such as those from IRCAM and 3DTI, the processors seem designed to work on mono audio buffers. It seems that stereo->stereo HRTF should involve some kind of interpolation between the right HRIR and the left HRIR? I found an example here and also in the implementation of PannerNode. I would appreciate it if you could list some of your sources for those kinds of stereo->stereo filters. If they are not for the web, it might still be possible to expose them via wasm. Thank you!
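On the interpolation point: the simplest common approach is a linear cross-fade between the two nearest measured HRIRs, sample by sample. A sketch, assuming two hypothetical HRIRs measured at adjacent azimuths (time-domain interpolation like this is crude — it smears the interaural time difference — but it shows the idea):

```javascript
// Linear interpolation between two HRIRs measured at adjacent angles.
// t = 0 returns irA, t = 1 returns irB. Assumes equal lengths.
function interpolateHRIR(irA, irB, t) {
  const out = new Float32Array(irA.length);
  for (let i = 0; i < irA.length; i++) {
    out[i] = (1 - t) * irA[i] + t * irB[i];
  }
  return out;
}
```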

himwho commented 2 years ago

@Avnerus After further discussion on our side: you might be correct — the examples we see that showcase HRTF stereo->stereo processing are really just abstracted 2x mono->mono examples targeting ear positions.

Avnerus commented 2 years ago

@himwho Thanks for the update. On that note then, might it make sense to run 8x HRTF functions on the Mach1 channels, providing the processor with the listener orientation as well as the virtual position of each signal in the octahedron?
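For reference, the per-channel geometry for such an 8x approach would just be each virtual source position rotated into head-relative coordinates. A sketch, using placeholder positions at unit-cube corners (substitute the actual Mach1 vector-format layout) and rotation about the vertical axis only:

```javascript
// Placeholder virtual source positions for 8 channels (unit-cube
// corners); substitute the actual Mach1 channel layout.
const CHANNEL_POSITIONS = [
  [-1, 1, -1], [1, 1, -1], [-1, 1, 1], [1, 1, 1],
  [-1, -1, -1], [1, -1, -1], [-1, -1, 1], [1, -1, 1],
];

// Rotate a position by -yaw about the y (up) axis, giving the
// head-relative direction one would hand to a per-channel HRTF stage.
function headRelative([x, y, z], yaw) {
  const c = Math.cos(-yaw), s = Math.sin(-yaw);
  return [c * x + s * z, y, -s * x + c * z];
}
```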

himwho commented 2 years ago

@Avnerus I personally would still recommend a stereo -> stereo approach, as seen in other similar pipelines. The 8x HRTF functions make more sense if you were trying to remodel room acoustics for an 8x loudspeaker array and change the room model for that. In this case, though, people have converted or mixed to the 8x virtual vector format configuration as a mixing pipeline, meaning they have already pre-rendered how the mix sounds in terms of "reverb". Taking a soundfield mix and trying to add HRTF by simulating what it would sound like in a space is therefore not advised in most cases (though still very possible); instead, focusing on adding HRTF to the stereo output — to explore potentially better simulations of additional head-related transfer functions — is maybe a safer way to handle this use case. If the 8x Mach1 channels were part of an automated pipeline without supervised mixing or post-production, then I would say the 8x HRTF functions are more valid and interesting.

All that said, this is relatively unexplored, so if you want to test any concepts out we are happy to give feedback or discuss the concepts and get back to you with more "pros and cons".