One of the things that need to be done is to spec a channel ordering for the Web. It is possible to inspect the internals of an HTMLMediaElement or MediaStream with the Web Audio API.
This channel ordering would probably only be used when using the Web Audio API, and needs to be able to work with all existing codecs. If you only use an HTMLMediaElement without the Web Audio API, another mapping can be used internally.
For now, it looks like some browsers are using the channel mapping of the media stack they are using, without remapping. Depending on the OS, OS version, and browser, results differ.
We need to involve a number of people, including HTMLMediaElement people, Web Audio API people, VR people (because they have unusual requirements, like non-mapped 8-channel files), and probably authors as well, so that we find a proper solution, as multi-channel content is becoming more and more popular.
To match several needs and various kinds of content (multichannel, Ambisonics & HOA, object-based audio...), it may be useful to borrow the codes from the recent ITU-R BS.2076.
This document explains the Audio Definition Model, a free format to describe any kind of audio content, instead of reinventing the wheel. This model has been developed to drive audio rendering engines, which is exactly the job the Web Audio API does too.
To offer a shortcut and help this discussion, I would recommend adopting the audioPackFormat and audioChannelFormat codes to easily describe any multichannel content. These codes are fixed and reflect the most common pack formats (mono, dual mono, stereo, surround, 5.1...) and channel formats (left, right, center, left surround...). The codes can also describe Ambisonics content and a few other formats, such as AmbiX used for Google and Facebook VR.
In the broadcast domain, to exchange media between editors and broadcasters, we usually refer to EBU R123 codes, which specify a few combinations of packs and channels (type and order). But the main problem with EBU R123 is the need to update the recommendation to create new codes; there is no rule for doing so. This is why the Audio Definition Model was born: a flexible schema for audio content description. Hope this helps.
Interesting, thanks.
Quoting the document:
Therefore, the EBU plans to generate a set of standard format descriptions for many of the commonly used formats
(section 4, page 7)
This is what we need here. Something like SMPTE 2036-2-2008 [0] would work. As the document you linked notes, having a full-blown meta-model ready for introspection is very heavy, and is not a goal for us.
Most content has a defined channel mapping (whether it's Dolby, SMPTE or something like WAVEX, Vorbis, etc.). For us, it's just a matter of presenting something coherent to JavaScript.
For example, say you have a 5.1 file. Regardless of the input format, and provided a channel mapping is defined (unlike the situation described in the next paragraph), authors should expect something like: Left, Right, Center, Low Frequency Effects, Surround Left, Surround Right. The User-Agent should re-map the channels accordingly, so that authors don't have to detect which UA the code is running on, special-case every type of file, and re-map the channels in their JavaScript code (which is doable, of course, but is not something that authors should have to do).
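As a purely illustrative sketch of what such a guarantee would give authors, assuming a decoded 5.1 AudioBuffer is always presented in the fixed order above (L, R, C, LFE, SL, SR):

// Sketch only: with a fixed ordering, channel indices are meaningful on their
// own, so no UA detection or per-format special-casing is needed.
function pickCenterChannel(buffer) {
  return buffer.getChannelData(2); // index 2 is Center under the assumed fixed order
}
function pickLfeChannel(buffer) {
  return buffer.getChannelData(3); // index 3 is LFE under the assumed fixed order
}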
For custom things, it would be better to have something like Opus' mapping family 255 [1], where you don't really have a mapping defined, but your application code can do whatever it needs, and you are guaranteed to have the channels in the same order as they are stored in the file.
[0]: The document itself is not freely available, but this is close: https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-libs/html/gst-plugins-base-libs-gstaudiochannels.html#GstAudioChannelPosition
[1]: http://www.opus-codec.org/docs/opusfile_api-0.4/structOpusHead.html, search for "OpusHead::mapping_family"
This is what we need here. Something like SMPTE 2036-2-2008 [0] would work.
SMPTE 2036-2-2008 defines only the 22.2 setup. Unfortunately, there is currently no document available from SMPTE or ITU-R that could be used as a reference for setups with more than 6 channels, such as 9.1 (aka 4+5+0). There could be something from the AES, but I was at least not able to find anything, especially not for all the different setups defined in ITU-R BS.2051. One could use Table 1 from ITU-R BS.2051, but I'm not sure how reliable it is.
For setups up to 6 channels, you should stick with EBU R123, EBU R91 and ITU-R BS.775. However, any reliable order of channels would be very much appreciated.
For custom things, it would be better to have something like Opus' mapping family 255 [1], where you don't really have a mapping defined, but your application code can do whatever it needs, and you are guaranteed to have the channels in the same order as they are stored in the file.
I think this would be really great for Ambisonics, HOA and object-based content!
This can be a gradual process. There is a loose de-facto agreement on SMPTE ordering. I propose that we spec that. This covers up to 7.1. Anything else we can spec later.
Although I agree that the AudioWG's review is needed here, this issue should be upstreamed to the HTMLMediaElement level. If the channel order is nicely defined by the core decoding components (i.e. the video and audio tags), WebAudio can simply follow it.
I believe @jdsmith3000 has been working on this line of work? Any opinion?
There was an effort at some point to layer HTMLMediaElement on top of the Web Audio API. If this is still something we consider important, then it should be specced the other way around. Of course, conceptually, piping an HTMLMediaElement into a Web Audio graph then does kind of a weird loop across various specs, but maybe it's a situation that is tenable until the Web Audio API can be used to properly implement an HTMLMediaElement?
Can you refresh me on the scenario this would support, Paul? Is the point to somehow synchronize audio from multiple media elements?
@jdsmith3000, we have had reports from ISVs that there are inconsistencies between browsers when it comes to channel mapping for a given audio file.
In our reports, authors want to know at which index the channels for, say, Left, Right, Center, or Low Frequency are, so they can be processed appropriately for their use case. The Web Audio API has no concept of channel mapping, and instead assumes what looks like SMPTE ordering for all channels and for up/down-mixing. This is becoming important because, with the Web Audio API, authors can inspect the output of an HTMLMediaElement via a MediaElementSourceNode, or via a MediaStream{Track,}AudioSourceNode if the HTMLMediaElement's output has been captured via captureStream.
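For reference, a minimal sketch of how that output becomes observable through the graph (the element and the channel count are placeholders):

// Minimal sketch: routing an <audio> element into a Web Audio graph exposes
// its individual channels by index.
const audioContext = new AudioContext();
const mediaElement = document.querySelector('audio'); // placeholder element
const source = audioContext.createMediaElementSource(mediaElement);
const splitter = audioContext.createChannelSplitter(6); // assuming 5.1 content
source.connect(splitter);
// Each splitter output now carries whatever the UA placed at that channel
// index, which is exactly where the cross-browser differences show up.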
In practice:
There needs to be some sort of agreement on what to do so that multi-channel audio on the web is viable beyond the basic use-case (simple playback).
Today's WG call: Proposal is to adopt SMPTE ordering. Seeking feedback from other developers and community at large prior to making this change.
Status today: still awaiting feedback from @mdjp's colleagues.
@jdsmith3000 says that the SMPTE ordering appears to be the same as the Windows ordering.
Resolution is to reproduce SMPTE ordering in the spec.
Some more feedback to consider: SMPTE sounds like a sensible approach. The only other document worth considering is BS.2094 (common definitions for ADM). It specifies pack formats for different layouts (from page 6), and the channel ordering for 22.2 thankfully fits with the SMPTE version. However, it also highlights that there's more than one kind of 7.0/7.1, and raises questions such as whether you need a silent fourth channel when there's no LFE for 5.0. Without some signalling, this generally becomes very difficult. Also, what about Ambisonics busses?
I'd like to second the suggestion from @mdjp. SMPTE 2036-2-2008 only specifies the channel ordering for a 22.2 setup, which is nice but certainly not a good basis for other, more realistic channel formats for the Web. ITU-R BS.2094 (http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2094-0-201604-I!!PDF-E.pdf) appears to be the best option for now.
To add a little more color from the last WG call:
We said that we would reproduce the known SMPTE channel ordering in the specification. However, we did not intend to specifically tie the spec to SMPTE (possibly forcing future web developers to buy copies of the spec). What we were actually committing to was to document the known SMPTE orderings in the Web Audio spec as the canonical ordering (without citing SMPTE per se), and leave room for other channel layouts to be included in the future.
So maybe this goes well with the above two suggestions.
This is a little different from my understanding. The SMPTE ordering accommodates up to 22.2 multichannel, but can also accommodate sub-variations. We use a similar ordering scheme for Windows under WaveFormatExtensible. The ordering in it matches SMPTE, but the labels are more explicit. I believe we agreed to both state the ordering and labeling from SMPTE in the spec, and credit the document as the origin.
Like @jdsmith3000, I thought we would reference other documents. I think it makes sense to do so.
Maybe that was my misunderstanding about referencing SMPTE, then -- I defer to the spec editors and experts on this. I am not sure there is actually a problem here.
This is my recollection too (what @padenot and @jdsmith3000 said) from the teleconf. There was some concern expressed about referencing a document that costs a significant amount of money.
I have the SMPTE ST 2036-2-2008 UHDTV Audio Characteristics and Channel Mapping specification and am looking at where to specifically add the content and the SMPTE reference. To me, the logical change is to extend the mapping information in section 6.2 - Channel Ordering to include "22.2". Subsets should work with that. The SMPTE ordering, for example, lines up with the 5.1 channel ordering currently in the spec. Adding the information here also aligns with the original concern about needing an ordering for ChannelSplitterNode and ChannelMergerNode.
If this sounds okay to others, I will proceed with a pull request.
Sorry for bothering you again, but I still wonder how you will define subsets with fewer than 24 channels. For instance, what should the decoder channel order be for an "11.1" (aka "7.1.4" aka "7.1+4") channel setup file? Will there be a definition for this example subset in the spec? What if an unsupported subset is used? Moreover, the increasingly used Ambisonics formats (FOA / HOA) do not even have a direct loudspeaker mapping in the traditional sense. How can that be covered by the spec?
I also sense that many people are interested in 'non-diegetic' audio in FOA/HOA scenarios.
By the way, even if our WG decides to adopt whatever mapping scheme, I am not sure what that means. The Web Audio API does not govern the decoding/streaming of a media file unless it's given to decodeAudioData(). The splitter and the merger don't do anything smart to reorder the channel mapping, and we want them to stay that way. So are we talking specifically about that method? Or are we talking about proposing a channel scheme to the MediaElement folks?
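For context, a small sketch of what "the splitter and the merger don't do anything smart" means in practice; any reordering has to be wired explicitly by the author:

// Sketch: ChannelSplitterNode / ChannelMergerNode route purely by index and
// have no notion of what a channel means. Swapping channels 0 and 1 has to be
// done explicitly.
const ctx = new AudioContext();
const splitter = ctx.createChannelSplitter(6);
const merger = ctx.createChannelMerger(6);
splitter.connect(merger, 0, 1); // output 0 -> input 1
splitter.connect(merger, 1, 0); // output 1 -> input 0
for (let i = 2; i < 6; i++) {
  splitter.connect(merger, i, i); // remaining channels pass through unchanged
}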
I understand there was an attempt to build MediaElement on top of other APIs, but I don't see that happening now or in the near future. IMO, we should focus on confirming the decision from the MediaElement side and making our decodeAudioData() work consistently.
This is about exposing a stable and consistent channel ordering when using either:
- decodeAudioData
- an HTMLMediaElement, via MediaElementAudioSourceNode
This is not the case right now: implementations either remap to a consistent ordering (Gecko, Edge, although I probably haven't checked all cases for Edge), or have the ordering be the ordering of the underlying decoding mechanism (for example, and I could be wrong, but trying a few things, the native OSX API on Safari, whatever ffmpeg does on Chrome).
Since there is no way (short of writing a custom parser) for authors to discover the actual mapping of a file, the proposal is to remap the channels of all files to a well-known mapping: SMPTE/WaveFormatExtensible.
Advanced use-cases can (and do) use something like Opus' mapping family 255, and get from 1 to 255 channels without an explicit mapping, but presumably there is associated code that works with those files, and the knowledge of what to do with which channel is the responsibility of the code (and not of the media file).
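To make that concrete, a hedged sketch of such application-side code; the layout object below is invented for the example (the media itself carries no mapping):

// Sketch: with an unmapped stream (e.g. Opus mapping family 255), the file
// just provides N channels in storage order; the application supplies their
// meaning out-of-band. This particular layout is purely illustrative.
const appDefinedLayout = { W: 0, X: 1, Y: 2, Z: 3 }; // e.g. a first-order Ambisonics convention chosen by the app
function channelFor(buffer, name) {
  return buffer.getChannelData(appDefinedLayout[name]);
}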
Again, this only matters because the actual content of the channels is observable when using the Web Audio API (and only when using the Web Audio API). When simply playing an HTMLMediaElement, other behaviours can be (and are) implemented and are valid, such as shipping the compressed audio directly to a special DSP chip (often done with mp3 on mobile for increased battery life) or to dedicated hardware (say, encoded surround audio that plays on a Hi-Fi home cinema system, which has a specific channel ordering and processing).
for example, and I could be wrong, but trying a few things, the native OSX API on Safari, whatever ffmpeg does on Chrome
I literally have an example for this.
Yes, I personally experienced this and I understand what the problem is. My point was about the meaning of a decision we make here. Without involving the working group for MediaElement, I don't see the point of the discussion. Perhaps I am asking this because I don't have the big picture on how the various working groups operate.
Yes, see my last paragraph here.
This is not an issue that happens when you're not using the Web Audio API. This remapping would happen in MediaElementAudioSourceNode, or at the end of the decoding when using decodeAudioData.
Yes, thanks for the clarification.
So the channel mapping between the MediaElement and the audio system layer (a DSP chip or dedicated hardware) is already decided. It sounds like this invisible channel mapping might be different across platforms, but it's handled automatically. Does that mean the mapping scheme exposed to the Web Audio API also needs to be changed somehow?
Does that mean the mapping scheme exposed to the Web Audio API also needs to be changed somehow?
I think the idea so far is, regardless of the actual channel mapping in the actual media, the Web Audio API would re-shuffle the channels to present a stable order.
For example, consider two files, one AAC file and one Wav file.
In AAC, the ordering is: C, L, R, SL, SR, LFE.
In WAV, the ordering is: L, R, C, LFE, SL, SR.
It's not uncommon for the compressed AAC to be sent directly to another device (for example over an optical cable), which is then responsible for the matrixing or whatever else needs to happen. That other device has knowledge of the speaker setup AND of the channel mapping present in the file, so it's able to make an informed decision about what to do. The other file also has a defined mapping, so the UA can handle the uncompressed audio appropriately.
For the Web Audio API, the information about the channel mapping is lost when using MediaElementAudioSourceNode or decodeAudioData; you only have a channel count, so you can't make an informed decision and re-map the channels yourself. Of course, you can always do that if you provide the info out-of-band, but it's not great to have to do that.
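As a rough sketch of that out-of-band workaround, assuming the author somehow knows the decoder produced the AAC order quoted above and wants the WAV/SMPTE-style order:

// Sketch only: remap a decoded 5.1 AudioBuffer from a known source order
// (here AAC: C, L, R, SL, SR, LFE) to L, R, C, LFE, SL, SR. The source order
// has to come from outside the API, which is the problem being discussed.
function remapChannels(ctx, buffer, sourceOrder) {
  const targetOrder = ['L', 'R', 'C', 'LFE', 'SL', 'SR'];
  const out = ctx.createBuffer(buffer.numberOfChannels, buffer.length, buffer.sampleRate);
  targetOrder.forEach((label, destIndex) => {
    out.copyToChannel(buffer.getChannelData(sourceOrder.indexOf(label)), destIndex);
  });
  return out;
}
// e.g. remapChannels(audioContext, decodedBuffer, ['C', 'L', 'R', 'SL', 'SR', 'LFE']);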
If we lose channel mapping information, is ordering alone sufficient? The 22.2 mapping will define the order of channels beyond our current mono, stereo and 5.1, but will only hold up if all channels in the order are represented in the audio. Correct?
If we lose channel mapping information, is ordering alone sufficient?
If we spec a consistent channel ordering regardless of the input media, then the Web Audio API implementation remaps to this consistent channel ordering, so authors have the guarantee, for example, that on an AudioBuffer that has six channels, getChannelData(2) will be the center channel, so there is no issue about the loss of information.
It appears that ordering and channel count will be sufficient to work with any file, if we have the guarantee that the Web Audio API only presents data in (say) WaveFormatExtensible or SMPTE order.
The 22.2 mapping will define the order of channels beyond our current mono, stereo and 5.1, but will only hold up if all channels in the order are represented in the audio. Correct?
I'm sorry, I don't think I understand what you mean here.
for example, that on an AudioBuffer that has six channels, getChannelData(2) will be the center channel, so there is no issue about the loss of information.
I really like that idea. Perhaps we can go further by allowing getChannelData("FC")?
As we discussed, Media*SourceNode does not know how many active channels it contains. Can we consider adding one more property there?
console.log(mediaElementSourceNode.streamInfo);
>> "[FL, FR, FC, LFE1, BL, BR]"
Once we can agree upon the channel mapping scheme, adding this should not be a problem.
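A possible shape for the label-based lookup, purely hypothetical (nothing like this exists in the API today), assuming the 5.1 labels and order shown above:

// Hypothetical sketch: once a fixed ordering is specced, a label-based lookup
// could be sugar over a label-to-index table like this one.
const CHANNEL_INDEX = { FL: 0, FR: 1, FC: 2, LFE1: 3, BL: 4, BR: 5 };
function getChannelDataByLabel(buffer, label) {
  return buffer.getChannelData(CHANNEL_INDEX[label]);
}
// e.g. getChannelDataByLabel(someBuffer, 'FC') would return the centre channel.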
As we discussed, Media*SourceNode does not know how many active channels it contains.
I don't remember talking about that, do you have a link or something?
I think I heard it in our teleconference, but couldn't find it in the minutes. FWIW, currently the information is not exposed anywhere (i.e. we can't query the node to find out the current active channel configuration).
This kind of introspection is always better for developers, I believe.
The SMPTE channel ordering is currently not supported on Windows. Is there a strong argument favoring SMPTE over the ordering established by longer usage on Windows? If not, I would prefer listing the ordering in WAVEFORMATEXTENSIBLE. It is similar to, but not the same as, SMPTE.
Separately, @hoch suggests extending the use of standard labeling on the orderings. Is the intent to avoid having to fill missing positions with blank channels? I'm not sure implementations support the labeling currently, so this might not be readily supportable today.
An approach that emerged on today's WG call is as follows (note that only @hoch, @rtoy and @svgeesus were present besides the chair):
In https://github.com/WebAudio/web-audio-api/issues/1089#issuecomment-294170564, @hoch suggested that we endow Media*SourceNodes with some descriptive info that describes the mapping from channel indices. This info could be optionally present, where known. I also note that AudioBuffer may benefit from the same idea, since its channel indices presumably reflect the channel ordering of whatever media were decoded, and different media formats use different native channel orderings.
The idea of optional descriptive data (perhaps coupled with some way to easily look up channels by their descriptive meaning, rather than their index) seems in many ways more promising than continuing down the road of forcing a uniform channel ordering on all of these interfaces, which is proving problematic (and might entail incompatible changes to existing WebRTC behavior).
Since this can be added later, and since it will require some more careful design, this suggests we should push this capability off to v.next and allow the channel index assignments to remain as they are for now, still indeterminate in some cases as to what a given channel index really means.
The idea of optional descriptive data (perhaps coupled with some way to easily look up channels by their descriptive meaning, rather than their index) seems in many ways more promising than continuing down the road of forcing a uniform channel ordering on all of these interfaces, which is proving problematic (and might entail incompatible changes to existing WebRTC behavior).
What problems have been encountered?
@jdsmith3000: From the F2F, we've resolved to adopt the ordering from WAVEFORMATEXTENSIBLE and to duplicate this information in the spec. An informative note can observe that the spec ordering is based on WAV.
Like this?
Extended:
0: SPEAKER_FRONT_LEFT
1: SPEAKER_FRONT_RIGHT
2: SPEAKER_FRONT_CENTER
3: SPEAKER_LOW_FREQUENCY
4: SPEAKER_BACK_LEFT
5: SPEAKER_BACK_RIGHT
6: SPEAKER_FRONT_LEFT_OF_CENTER
7: SPEAKER_FRONT_RIGHT_OF_CENTER
8: SPEAKER_BACK_CENTER
9: SPEAKER_SIDE_LEFT
10: SPEAKER_SIDE_RIGHT
11: SPEAKER_TOP_CENTER
12: SPEAKER_TOP_FRONT_LEFT
13: SPEAKER_TOP_FRONT_CENTER
14: SPEAKER_TOP_FRONT_RIGHT
15: SPEAKER_TOP_BACK_LEFT
16: SPEAKER_TOP_BACK_CENTER
17: SPEAKER_TOP_BACK_RIGHT
Yes. We'd make a table in the spec that has this information.
We can use this table as a sort ordering for channels, so that the description copes gracefully with missing channels.
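A sketch of what using the table as a sort ordering could mean in practice, assuming the extended list above:

// Sketch: order whatever channels are actually present by their position in
// the canonical list, so missing channels simply drop out instead of
// requiring silent placeholders.
const CANONICAL_ORDER = [
  'SPEAKER_FRONT_LEFT', 'SPEAKER_FRONT_RIGHT', 'SPEAKER_FRONT_CENTER',
  'SPEAKER_LOW_FREQUENCY', 'SPEAKER_BACK_LEFT', 'SPEAKER_BACK_RIGHT',
  'SPEAKER_FRONT_LEFT_OF_CENTER', 'SPEAKER_FRONT_RIGHT_OF_CENTER',
  'SPEAKER_BACK_CENTER', 'SPEAKER_SIDE_LEFT', 'SPEAKER_SIDE_RIGHT',
  'SPEAKER_TOP_CENTER', 'SPEAKER_TOP_FRONT_LEFT', 'SPEAKER_TOP_FRONT_CENTER',
  'SPEAKER_TOP_FRONT_RIGHT', 'SPEAKER_TOP_BACK_LEFT', 'SPEAKER_TOP_BACK_CENTER',
  'SPEAKER_TOP_BACK_RIGHT',
];
function sortChannels(presentChannels) {
  return [...presentChannels].sort(
    (a, b) => CANONICAL_ORDER.indexOf(a) - CANONICAL_ORDER.indexOf(b));
}
// e.g. sortChannels(['SPEAKER_FRONT_CENTER', 'SPEAKER_FRONT_LEFT'])
//   -> ['SPEAKER_FRONT_LEFT', 'SPEAKER_FRONT_CENTER']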
PR #1271 should resolve this issue.
As discussed here, we need a normalized channel order for channel manipulation in the Web Audio API. It has already been defined for the upmix/downmix algorithms. The only places where I see a lack of specification are ChannelSplitterNode and ChannelMergerNode. I see two ways to specify this:
I'm not an expert user of the WAA, so I'll let others feed this thread.
Vincent