MPEGGroup / FileFormat

MPEG file format discussions

ISOBMFF unknown/dynamic audio channel configuration #13

Closed ihofmann-iis closed 4 years ago

ihofmann-iis commented 4 years ago

In the semantics of section 12.2.3.3, we define channelCount to be

> the number of channels such as 1 (mono) or 2 (stereo) or 0 if inapplicable/unknown;

which is an accurate definition even for modern systems, where the number of channels is dynamic. However, in 12.2.3.1 we are less specific.

Please consider aligning the wording of both subsections to include dynamic channel configurations.

BR, Ingo

dwsinger commented 4 years ago

12.2.3.1 says

> When channelcount is a value greater than zero, it indicates the intended number of loudspeaker channels in the audio stream. A channelcount of 1 indicates mono audio, and 2 indicates stereo (left/right). When values greater than 2 are used, the codec configuration should identify the channel assignment.

I think we should delete 'loudspeaker' from here — it doesn't apply to binaural or encoded ambisonics. Nor do I know what 'intended' means.

I propose that we move the remaining part to 12.2.3.3.

12.2.3.1: When channelcount is a value greater than zero, it indicates the number of channels in the audio stream.

12.2.3.3:

- 0: inapplicable/unknown
- 1: mono
- 2: stereo (left/right)
- all other values: the codec configuration should identify the channel assignment.
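The proposed 12.2.3.3 value table can be sketched as a small lookup. This is only an illustration of the wording above, not text from the specification; the function name `describe_channelcount` is hypothetical.

```python
def describe_channelcount(channelcount: int) -> str:
    """Return the meaning of a channelcount value under the proposed wording.

    Hypothetical helper: it mirrors the four cases listed in the proposal
    and nothing more.
    """
    # channelcount is a 16-bit unsigned field in the sample entry syntax.
    if not 0 <= channelcount <= 0xFFFF:
        raise ValueError("channelcount must fit in an unsigned 16-bit field")
    if channelcount == 0:
        return "inapplicable/unknown"
    if channelcount == 1:
        return "mono"
    if channelcount == 2:
        return "stereo (left/right)"
    # Any other value: the channel assignment comes from the codec configuration.
    return "see codec configuration for channel assignment"
```

A reader would still need codec-specific logic for the last case; the sketch only makes explicit that 0 is a valid value meaning "unknown" rather than an error.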

ihofmann-iis commented 4 years ago

> 12.2.3.1 says
>
> > When channelcount is a value greater than zero, it indicates the intended number of loudspeaker channels in the audio stream. A channelcount of 1 indicates mono audio, and 2 indicates stereo (left/right). When values greater than 2 are used, the codec configuration should identify the channel assignment.
>
> I think we should delete 'loudspeaker' from here; it doesn't apply to binaural or encoded ambisonics. Nor do I know what 'intended' means.

Ok

> I propose that we move the remaining part to 12.2.3.3.
>
> 12.2.3.1: When channelcount is a value greater than zero, it indicates the number of channels in the audio stream.
>
> 12.2.3.3:
>
> - 0: inapplicable/unknown
> - 1: mono
> - 2: stereo (left/right)
> - all other values: the codec configuration should identify the channel assignment.

We should probably still spell out that the value is the total number of audio channels (or any term more accurate). Would we want to be more specific what "the codec configuration" would be? Is it safe to assume that it's always the Decoder Specific Info?

dwsinger commented 4 years ago

> > I propose that we move the remaining part to 12.2.3.3.
> >
> > 12.2.3.1: When channelcount is a value greater than zero, it indicates the number of channels in the audio stream.
> >
> > 12.2.3.3:
> >
> > - 0: inapplicable/unknown
> > - 1: mono
> > - 2: stereo (left/right)
> > - all other values: the codec configuration should identify the channel assignment.
>
> We should probably still spell out that the value is the total number of audio channels (or any term more accurate). Would we want to be more specific what "the codec configuration" would be? Is it safe to assume that it's always the Decoder Specific Info?

Um, doesn't the proposed 12.2.3.1 say that? "it indicates the number of channels in the audio stream." Should we say "total number of channels" (does that add anything?).

Some codecs use MPEG-4 DecoderSpecificInfo, others use sample entry boxes. Some might put the config in-stream at sync points. It's hard to find a more precise phrase... but we can try...

ihofmann-iis commented 4 years ago

> > > I propose that we move the remaining part to 12.2.3.3.
> > >
> > > 12.2.3.1: When channelcount is a value greater than zero, it indicates the number of channels in the audio stream.
> > >
> > > 12.2.3.3:
> > >
> > > - 0: inapplicable/unknown
> > > - 1: mono
> > > - 2: stereo (left/right)
> > > - all other values: the codec configuration should identify the channel assignment.
> >
> > We should probably still spell out that the value is the total number of audio channels (or any term more accurate). Would we want to be more specific what "the codec configuration" would be? Is it safe to assume that it's always the Decoder Specific Info?
>
> Um, doesn't the proposed 12.2.3.1 say that? "it indicates the number of channels in the audio stream." Should we say "total number of channels" (does that add anything?).

It does (overlooked)! Works for me.

> Some codecs use MPEG-4 DecoderSpecificInfo, others use sample entry boxes. Some might put the config in-stream at sync points. It's hard to find a more precise phrase... but we can try...

Add an informal sentence, e.g. "..., such as DecoderSpecificInfo"? Not sure if this helps or makes it worse.

Frank-Ba commented 4 years ago

Related issues in clause 12.2.3.1 are (quotes):

  1. "Similarly, an AudioSampleEntryV1 should be used when channelcount is other than 2 and a ChannelLayout is also present."
  2. In the syntax of AudioSampleEntry: "template unsigned int(16) channelcount = 2"

Recommendation (1) seems no longer necessary if the ‘chnl’ version 1 box is used, because it carries its own base channel count. Further, we think that the use of AudioSampleEntryV1 is a greater interoperability challenge than allowing the correct channel count to be written to AudioSampleEntry. In most legacy cases that channel count will be overridden by the contents of the audio codec configuration info, just as the sample rate is overridden. If interoperability concerns arise for legacy cases, the use of the channel count in AudioSampleEntry as an authoritative channel count can be reserved for more recently specified cases in which there is no other channel count available. In particular this could apply to mono PCM, for which no ‘chnl’ box is required, and also to PCM with a ‘chnl’ version 0 box. There is some confusion about the meaning of the “= 2” in (2). A common interpretation is that 2 is the default value.
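The override behaviour described above can be sketched as a simple precedence rule. This is a minimal illustration under stated assumptions: the function name and both parameters are hypothetical, and the rule shown (codec configuration first, sample entry as fallback, 0 treated as unknown) is one reading of the paragraph, not normative behaviour.

```python
from typing import Optional


def effective_channel_count(
    sample_entry_channelcount: int,
    codec_config_channels: Optional[int] = None,
) -> Optional[int]:
    """Pick the channel count a reader might trust (hypothetical helper).

    codec_config_channels is whatever the codec-specific configuration
    reports, or None when the codec carries no such information
    (for example mono PCM without a 'chnl' box).
    """
    # Assumed legacy behaviour: the codec configuration overrides the
    # sample entry, just as it does for the sample rate.
    if codec_config_channels is not None:
        return codec_config_channels
    # No other source available: the sample entry value is authoritative,
    # unless it is 0, which means inapplicable/unknown.
    if sample_entry_channelcount > 0:
        return sample_entry_channelcount
    return None
```

For example, a sample entry written with channelcount 2 alongside a codec configuration reporting 6 channels would resolve to 6 under this sketch, while mono PCM with no other source would resolve to the sample entry value of 1.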

Proposed change

For clarification, we propose:

• To remove the sentence (1), which appears to be obsolete.
• To remove the “= 2” from (2), which results in: "template unsigned int(16) channelcount"
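To make the role of the channelcount field concrete, here is a rough sketch of reading the fixed fields of a version 0 AudioSampleEntry. It assumes the commonly documented layout (6 reserved bytes, data_reference_index, 8 reserved bytes, channelcount, samplesize, pre_defined, reserved, samplerate as 16.16 fixed point) and that `payload` starts right after the 8-byte box header; the names `AudioSampleEntryFields` and `parse_audio_sample_entry` are hypothetical.

```python
import struct
from typing import NamedTuple


class AudioSampleEntryFields(NamedTuple):
    data_reference_index: int
    channelcount: int
    samplesize: int
    samplerate_hz: float


# Big-endian: 6 reserved bytes, uint16 data_reference_index,
# 8 reserved bytes, uint16 channelcount, uint16 samplesize,
# 4 bytes (pre_defined + reserved), uint32 samplerate (16.16 fixed point).
_FIXED_FIELDS = ">6xH8xHH4xI"


def parse_audio_sample_entry(payload: bytes) -> AudioSampleEntryFields:
    """Read the fixed fields of a version 0 AudioSampleEntry (sketch only)."""
    if len(payload) < struct.calcsize(_FIXED_FIELDS):
        raise ValueError("payload too short for an AudioSampleEntry")
    dref_index, channelcount, samplesize, samplerate_fixed = struct.unpack_from(
        _FIXED_FIELDS, payload
    )
    # The value is read as written in the file; the "= 2" in the syntax is
    # only a template default and is not enforced here.
    return AudioSampleEntryFields(
        dref_index, channelcount, samplesize, samplerate_fixed / 65536.0
    )
```

The parser returns whatever channelcount the writer stored, which is why the meaning of the "= 2" default matters mainly to writers and to derived specifications rather than to a reader like this one.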

Regards, Frank and Kevin

cconcolato commented 4 years ago

If "=2" is removed, the word "template" should be removed as well.

I think it's also worth clarifying some sentences:

> The samplerate, samplesize and channelcount fields document the default audio output playback format for this media.

What does "default" mean in this context?

Also the paragraph starting with

> When it is desired to indicate an audio sampling rate greater than the value that can be represented in the samplerate field, ...

is quite confusing. I'm not sure I understand what case 1) means.

Frank-Ba commented 4 years ago

> I'm not sure I understand what case 1) means.

Case (1) recommends not using AudioSampleEntry in the described situation and using AudioSampleEntryV1 instead.

dwsinger commented 4 years ago

Mostly or completely addressed as editorial actions in the revised 6th edition.