Closed cconcolato closed 1 year ago
It would be better to take this action after most of the spec text has been settled.
I want to leave this open: although most of this has been fixed by #95, I think the introduction is a bit abrupt now and could use one more pass.
Here are some high-level suggestions to improve the introduction:
My main concern is that the introduction explains the "how" (model, syntax) without giving any indication of the "why". It should contain examples of how to instantiate the model for typical IAMF use cases, from the simplest (1 Mix Presentation, 1 Audio Element, 1 or N substreams to achieve speaker layouts that are badly addressed today) to the most complex.
The specification should start with the introduction rather than with the conventions. The reader is more interested in understanding what the spec is about than in which conventions were used. I would move the conventions section to the end of the spec, maybe with a sentence in the current section 3 saying that conventions are described in section X, with a hyperlink to it.
Some more detailed comments below:
Before Section 2.1
[ ] "The immersive audio means" This seems to be missing the word "term": -> "The term immersive audio means"
[ ] The term "3D audio signals" should be defined. Also, is it plural or singular? A sentence below uses "For a given input 3D audio".
[ ] "based on coded audio substreams" -> "based on coded audio bitstreams, called 'substreams'"
[ ] Clarify in the paragraph above Figure 1 whether the 2 sets of "physical loudspeakers" have the same configuration. Maybe update the figure to say "... loudspeakers (configuration 1)" and "... loudspeakers (configuration 2)"?
[ ] The bulleted list below Figure 2 should define its terms or link to where they are defined: "Pre-Processor", "Pre-Processed Audio", "Codec Agnostic Metadata", "Audio Codec Enc", "Codec-Dependent Bitstream", "coded substreams", "Bitstream Packager", "IA sequence", "File Packager", "IAMF File", "File Parser", "Bitstream Parser", "Audio Codec Dec", "Post-Processor", "Pre-Processed Audio"
[ ] Figure 1 is really about illustrating the model which is defined in section 2.1. Consider moving Figure 1 to section 2.1.
Section 2.1
[ ] "The IA sequence is a bitstream" -> "An IA sequence is a bitstream".
[ ] "An IA sequence is a bitstream": this is confusing. Is an "IA Sequence" a concrete object or an abstraction/logical concept?
[ ] "dynamic streaming" is confusing. What is "dynamic"? Delete "dynamic" or define it.
[ ] "The bitstream comprises a number of coded audio substreams": The use of the term "bitstream" is confusing. Does it refer to the "Codec-Dependent Bitstream"?
[ ] "The bitstream format itself is codec-agnostic": This is confusing, as a "bitstream" usually has a concrete syntax and is not codec-agnostic.
[ ] "Audio substream is the actual audio signal" -> "An audio substream is an actual audio signal"
[ ] "Audio element is the 3D representation of the audio signals": It is confusing because the "IA Sequence" is already "the combination of 3D audio signals". Maybe use "signal" singular?
[ ] "Parameters" is a very generic term. Could you prefix it? Maybe "IA Parameter"?
Section 2.2.1
Section 2.2.2
@cconcolato, thank you for your wonderful suggestions to improve readability. Let me take this action after resolving the technical issues related to the reference software implementation. In any case, the shape would be like this.
The introduction could be improved:
I suggest restructuring as follows:
I don't think we need to list the rest of the specification. This is error-prone, and the specification already contains a table of contents on the side.
As a result, sections 1 and 2 should be merged into a single "Introduction" section.