WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Mono audio plays on only the left speaker if a ChannelSplitterNode is used #2570

Open Andrews54757 opened 4 months ago

Andrews54757 commented 4 months ago

The Problem

The ChannelSplitterNode uses index 0 (SPEAKER_FRONT_LEFT) for mono audio. This causes the audio to play on the left speaker only. To ensure that mono audio plays on both speakers, the audio should be sent to the channel index corresponding to SPEAKER_FRONT_CENTER instead.


Adjusting this behavior programmatically is difficult because the API doesn't provide information about the number of channels the audio source supplies. The spatial audio format must be known in advance in order to connect the splitter's channel 0 to the merger's channel 2 for mono audio while keeping the usual ordering for other formats.
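A sketch of the conditional routing this would require, assuming the spec's 5.1 channel ordering L, R, C, LFE, SL, SR (the helper name is ours, not part of the API):

```javascript
// Hypothetical helper: pick the ChannelMergerNode input index for a given
// ChannelSplitterNode output, assuming the 5.1 order L, R, C, LFE, SL, SR.
function mergerInputFor(splitterOutput, sourceChannelCount) {
  // Mono: route the single channel to the center speaker (index 2)
  // instead of front-left (index 0).
  if (sourceChannelCount === 1 && splitterOutput === 0) return 2;
  // All other formats: keep the usual one-to-one ordering.
  return splitterOutput;
}

// Browser wiring sketch (not executed here):
// for (let ch = 0; ch < sourceChannelCount; ch++) {
//   splitter.connect(merger, ch, mergerInputFor(ch, sourceChannelCount));
// }
```

The catch is that sourceChannelCount is exactly the piece of information the API does not expose for a media-element source.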

Potential Solutions

  1. Implement a method to access information about the current set of mappings being used by the splitter
  2. Implement a node that upmixes/downmixes channels based on user-supplied conditions.

Workaround

Using the ability of the Panner and StereoPanner nodes to force stereo output regardless of the input, one can construct a network to achieve the following:

  1. If the audio source is mono, upmix to output mono audio on both the Left and Right channels.
  2. If the audio source is stereo, output stereo audio to both the Left and Right channels.
  3. If the audio source contains more than 2 channels, additional channels are passed through unaltered.

[diagram of network]

The resulting output can then be processed using ChannelSplitterNodes with proper behavior for both mono and stereo audio.
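For context, the reason the StereoPanner trick upmixes mono onto both speakers is the spec's equal-power panning equations for a mono input: with pan = 0 the signal reaches both channels at a gain of cos(π/4) ≈ 0.707. A minimal sketch of those gain equations (the function name is ours):

```javascript
// Equal-power pan gains for a mono input, per the Web Audio spec:
// x = (pan + 1) / 2, gainL = cos(x * PI/2), gainR = sin(x * PI/2).
function monoPanGains(pan) {
  const x = (pan + 1) / 2;
  return {
    left: Math.cos(x * Math.PI / 2),
    right: Math.sin(x * Math.PI / 2),
  };
}

const centered = monoPanGains(0); // both gains ~0.707: mono audible on both speakers
```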

Limitations

As the Panner nodes have a maximum of 2 channels, this technique will not work for fixing audio issues caused by the differences between Quad and 5.1 formats.
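Concretely, the spec's "speakers" down-mix to stereo treats the two layouts differently, which is why a node limited to 2 channels cannot substitute for format-aware routing. A sketch of the two down-mix formulas from the spec's mixing rules (the function names are ours):

```javascript
// Web Audio "speakers" down-mix to stereo, per the spec's mixing rules.
// Quad order: L, R, SL, SR.
function quadToStereo([L, R, SL, SR]) {
  return [0.5 * (L + SL), 0.5 * (R + SR)];
}

// 5.1 order: L, R, C, LFE, SL, SR. The LFE channel is dropped.
function fiveOneToStereo([L, R, C, LFE, SL, SR]) {
  const k = Math.sqrt(0.5);
  return [L + k * (C + SL), R + k * (C + SR)];
}

// A signal on the 5.1 center channel leaks equally into both stereo outputs.
const stereo = fiveOneToStereo([0, 0, 1, 0, 0, 0]);
```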

mjwilson-google commented 3 months ago

Teleconference 2024-03-07 notes:

For potential solution 1: the specification currently has channel mapping as a property applied to node inputs. Adding channel mappings for outputs would be a significant change to the specification, probably too big.

For potential solution 2: this is one of the use cases of the ChannelSplitter and ChannelMerger nodes. An AudioWorklet should also be able to do this.

This behavior is due to the ChannelSplitter always having ChannelInterpretation "discrete", and the ChannelMerger always having ChannelInterpretation "speakers" -- this was done intentionally and they were designed to be used together.
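The difference between the two interpretations for a mono signal can be sketched per-sample as follows (the function name is ours; the rules follow the spec's up-mix table):

```javascript
// Mono -> stereo up-mix under the two channelInterpretation modes,
// per the spec's mixing rules (the function name is ours).
function upmixMonoToStereo(mono, interpretation) {
  const left = Float32Array.from(mono); // channel 0 maps to left either way
  const right = interpretation === "speakers"
    ? Float32Array.from(mono)           // "speakers": copy M to both outputs
    : new Float32Array(mono.length);    // "discrete": unmatched channels get silence
  return { left, right };
}
```

With "discrete" the right channel stays silent, which is exactly the left-speaker-only symptom reported above.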

Regarding "Adjusting this behavior programmatically is difficult": is there a specific use case that we're missing where it's not possible to set up the ChannelSplitter and ChannelMerger in advance? Or a situation where the channel mixing rules are not enough?

Andrews54757 commented 3 months ago

I must be able to apply different filter nodes to specific audio channels after mixing occurs. This is not currently possible as there is no way to know how many channels the audio has.

mjwilson-google commented 3 months ago

I think I'm missing something necessary to understand the problem.

If your program is doing the mixing, then it should know how many channels are in the result. You should be able to use ChannelSplitterNode, apply the filters, and then merge back to the mixed format with ChannelMergerNode.

What is the audio source in your program? AudioBuffer, for instance, has a numberOfChannels attribute; is there another audio source in Web Audio that doesn't have any channel information?

Or, is the problem only on the output side? I think the audio playing out of the left speaker only is due to the 'discrete' channelInterpretation from ChannelSplitterNode, so as long as you use a ChannelMergerNode after it, or any other node that can be set to 'speakers', the upmixing rules should apply.

mjwilson-google commented 3 months ago

Also, thank you for the diagrams and detailed explanation. I can see you've spent some time on this, so I want to make sure I understand the issue fully.

Andrews54757 commented 3 months ago

I am making a video player extension. The audio source is an arbitrary <video> or <audio> element. The user provides the media to play. The source is created with a call to createMediaElementSource. There is no attribute on the resulting source to obtain the number of channels in the media. (channelCount is always 2)

mjwilson-google commented 3 months ago

Thank you, I think I understand the problem now.

The spec for MediaElementAudioSourceNode says: "The number of channels of the output corresponds to the number of channels of the media referenced by the HTMLMediaElement. Thus, changes to the media element’s src attribute can change the number of channels output by this node."

and later: "The number of channels of the single output equals the number of channels of the audio referenced by the HTMLMediaElement passed in as the argument to createMediaElementSource(), or is 1 if the HTMLMediaElement has no audio."

However, I don't see a straightforward way to get the number of channels from the HTMLMediaElement either.

channelCount is used for inputs to a node, so not relevant here.

I think this needs further discussion by the working group:

Are there any other nodes or situations you have found where the channel count was unknown? Or would adding information to MediaElementAudioSourceNode resolve everything?

Andrews54757 commented 3 months ago

Thank you for looking into this issue. I believe that adding a numberOfChannels attribute to MediaElementAudioSourceNode would resolve the issue for my use case.
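For illustration, with such an attribute (currently non-existent, purely hypothetical) the graph selection would reduce to a simple branch; the pure decision logic is separated here so the browser wiring can stay a comment:

```javascript
// Hypothetical: decide which graph to build once the channel count is known.
// The mode names are ours, purely for illustration.
function graphFor(numberOfChannels) {
  return numberOfChannels === 1 ? "stereo-panner-upmix" : "splitter-merger";
}

// Browser sketch using the proposed (non-existent) attribute:
// const source = audioContext.createMediaElementSource(audio);
// if (graphFor(source.numberOfChannels) === "splitter-merger") { /* split/merge */ }
// else { /* force stereo via StereoPannerNode */ }
```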

padenot commented 3 months ago

This is possible today, but requires some code. It handles changes in channel count nicely. Here's a stand-alone program that does it (more or less doing in code what @mjwilson-google described in English above):

<button>
  Start
</button>
<input type="file" accept="audio/*" id="audioFilePicker" />
<audio controls id=a></audio>
<span id=result></span>
<script type="worklet">
registerProcessor('channel-counter', class extends AudioWorkletProcessor {
  constructor() {
    super();
    this.inputChannelCount = 0;
  }
  process(inputs, outputs, parameters) {
    // inputs[0] is an array of Float32Arrays, one per channel, so its length
    // is the current channel count of the node's first input.
    if (inputs[0] && inputs[0].length != this.inputChannelCount) {
      this.inputChannelCount = inputs[0].length;
      this.port.postMessage(this.inputChannelCount);
    }
    return true;
  }
});
</script>
<script>
let counter;
const ac = new AudioContext();
const e = document.querySelector("script[type=worklet]");
const text = e.innerText;
const blob = new Blob([text], { type: "application/javascript" });
const url = URL.createObjectURL(blob);

ac.audioWorklet.addModule(url).then(() => {
  counter = new AudioWorkletNode(ac, 'channel-counter');
  counter.port.onmessage = function(e) {
    result.innerHTML = `Audio has ${e.data} audio channels`;
  };
});

const fileInput = document.getElementById("audioFilePicker");

let source;
fileInput.onchange = function() {
  if (fileInput.files && fileInput.files[0]) {
    const reader = new FileReader();
    reader.onload = function(event) {
      a.src = event.target.result;
      a.controls = true;
      a.play();
      // createMediaElementSource may only be called once per element,
      // so only build the graph on the first file selection.
      if (!source) {
        source = ac.createMediaElementSource(a);
        source.connect(counter);
        source.connect(ac.destination);
      }
    };

    reader.readAsDataURL(fileInput.files[0]);
  }
}
document.querySelector("button").onclick = function() {
  ac.resume();
}
</script>
Fiorello commented 3 months ago

Hi, I have exactly the same problem. I have to create a channel selector that works for both mono and stereo files. If I use a StereoPannerNode, I can't separate the two channels; each channel is only heard on one speaker at a time. So I decided to use a ChannelSplitterNode connected to two GainNodes to control the volume of the two channels separately.

For stereo files this works, but for mono files it does not: only one of the two speakers can be heard. So I decided to switch to a different approach based on the number of channels, but it is always 2, so I can't. Some example code:

Current code:

const audio = new Audio(url)
const audioContext = new AudioContext()
const source = audioContext.createMediaElementSource(audio)
const splitter = audioContext.createChannelSplitter()
const merger = audioContext.createChannelMerger()
const leftGainNode = audioContext.createGain()
const rightGainNode = audioContext.createGain()

source.connect(splitter)
splitter.connect(leftGainNode, 0)
splitter.connect(rightGainNode, 1)
leftGainNode.connect(merger, 0, 0)
rightGainNode.connect(merger, 0, 1)
merger.connect(audioContext.destination)

Channel-count-based approach (not working):

const audio = new Audio(url)
const audioContext = new AudioContext()
const source = audioContext.createMediaElementSource(audio)

if (source.channelCount > 1) { // it's always 2
  const splitter = audioContext.createChannelSplitter()
  const merger = audioContext.createChannelMerger()
  const leftGainNode = audioContext.createGain()
  const rightGainNode = audioContext.createGain()

  source.connect(splitter)
  splitter.connect(leftGainNode, 0)
  splitter.connect(rightGainNode, 1)
  leftGainNode.connect(merger, 0, 0)
  rightGainNode.connect(merger, 0, 1)
  merger.connect(audioContext.destination)
} else {
  const stereoPanner = audioContext.createStereoPanner()
  source.connect(stereoPanner)
  stereoPanner.connect(audioContext.destination)
}
hoch commented 2 months ago

We can consider exposing a 'numberOfChannels' property out of MediaStreamTrackAudioSourceNode.

This comes with a caveat: the number of channels of the MediaStreamTrack can change dynamically, so inspecting this value from the main thread will always be (slightly) stale. We can consider adding an event for the change; not ideal, but acceptable for the majority of use cases.

@padenot's approach above is the only sample-accurate way to figure it out. It's rather heavyweight, but it's correct and doesn't need a new API.