WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Describe "Planar versus interleaved buffers" #2225

Closed guest271314 closed 4 years ago

guest271314 commented 4 years ago

Describe the issue

The specification does not include the term "planar". MDN does include the term "planar" relevant to Web Audio API https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Basic_concepts_behind_Web_Audio_API#Planar_versus_interleaved_buffers

Planar versus interleaved buffers The Web Audio API uses a planar buffer format. The left and right channels are stored like this:

LLLLLLLLLLLLLLLLRRRRRRRRRRRRRRRR (for a buffer of 16 frames) This is very common in audio processing: it makes it easy to process each channel independently.

The alternative is to use an interleaved buffer format:

LRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLR (for a buffer of 16 frames) This format is very common for storing and playing back audio without much processing, for example a decoded MP3 stream.

The Web Audio API exposes only planar buffers, because it's made for processing. It works with planar, but converts the audio to interleaved when it is sent to the sound card for playback. Conversely, when an MP3 is decoded, it starts off in interleaved format, but is converted to planar for processing.
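
For concreteness, a minimal sketch (not part of the MDN text; the sample values are arbitrary) of the same four-frame stereo signal in both layouts:

// Planar: one Float32Array per channel, as the Web Audio API exposes it.
const left  = Float32Array.from([0.1, 0.2, 0.3, 0.4]); // L L L L
const right = Float32Array.from([0.5, 0.6, 0.7, 0.8]); // R R R R
const planar = [left, right];

// Interleaved: a single Float32Array, frame by frame -> L R L R L R L R
const interleaved = new Float32Array(left.length * 2);
for (let frame = 0; frame < left.length; frame++) {
  interleaved[frame * 2] = left[frame];
  interleaved[frame * 2 + 1] = right[frame];
}
// interleaved is now [0.1, 0.5, 0.2, 0.6, 0.3, 0.7, 0.4, 0.8]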

Where Is It

Missing from the specification.

Additional Information

Does encoding raw PCM one way or the other - "planar versus interleaved buffers" - impact AudioWorkletProcessor output?

rtoy commented 4 years ago

Why is this needed? There is no requirement that the browser stores data in this planar (or interleaved) format. The existing APIs explain how you get and/or store audio data in, say, an AudioBuffer, or the data from an AnalyserNode. How it's actually stored is an internal implementation detail.
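
As an illustration of that API surface, a minimal sketch (variable names are only for this example):

// An AudioBuffer hands out each channel as its own Float32Array,
// independent of how the samples are stored internally.
const ctx = new AudioContext();
const buffer = ctx.createBuffer(2, 16, ctx.sampleRate); // 2 channels, 16 frames

const left = buffer.getChannelData(0);  // 16 samples for channel 0
const right = buffer.getChannelData(1); // 16 samples for channel 1
left.fill(0.25);   // each channel can be written independently
right.fill(-0.25);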

guest271314 commented 4 years ago

Why is this needed?

Thorough technical details as to what is actually occurring with regard to the Web Audio API specification.

If we construct two Float32Arrays by hand (not an AudioBuffer) and set the L channel and R channel as floating-point numbers in each typed array

LLLLLLLLLLLLLLLLRRRRRRRRRRRRRRRR

and use process() of an AudioWorklet, will the output be the same as setting the values

LRLRLRLRLRLRLRLRLRLRLRLRLRLRLRLR (for a buffer of 16 frames)

instead?

How it's actually stored is an internal implementation detail.

Does that mean that either option above can be used with any arbitrary implementation and yield the "same" audio output result?

The specific code that gave rise to this question is https://stackoverflow.com/a/35248852


// This is passed in an unsigned 16-bit integer array. It is converted to a 32-bit float array.
// The first startIndex items are skipped, and only 'length' number of items is converted.
function int16ToFloat32(inputArray, startIndex, length) {
    var output = new Float32Array(inputArray.length-startIndex);
    for (var i = startIndex; i < length; i++) {
        var int = inputArray[i];
        // If the high bit is on, then it is a negative number, and actually counts backwards.
        var float = (int >= 0x8000) ? -(0x10000 - int) / 0x8000 : int / 0x7FFF;
        output[i] = float;
    }
    return output;
}

where in the code there is only one channel of output.

When converting a WAV file (streaming from fetch() with Content-Length, which crashes decodeAudioData() - not that decodeAudioData() is necessary or useful in this case) to Float32Arrays using that code, when more than one channel is encoded, the current code that I am using, based on the above, matches the description of "interleaved":

function int16ToFloat32(inputArray) {
    let ch0 = [];
    let ch1 = [];
    for (let i = 0; i < inputArray.length; i++) {
      const int = inputArray[i];
      // If the high bit is on, then it is a negative number, and actually counts backwards.
      const float = (int >= 0x8000) ? -(0x10000 - int) / 0x8000 : int / 0x7FFF;
      // toggle setting data to channels 0, 1
      if (i % 2 === 0) {
        ch0.push(float);
      } else {
        ch1.push(float);
      }
    };
    return {
      ch0, ch1
    };
}

Is the result of that code consistent with a Web Audio API AudioBuffer from decodeAudioData() for a two-channel WAV file? Or should the code produce two Float32Arrays for such a WAV

LLLLLLLLLLLLLLLLRRRRRRRRRRRRRRRR

as described by the MDN article?
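
A minimal sketch of one way to compare the two, assuming a canonical 44-byte WAV header, 16-bit PCM and a known sample rate; wavBytes and compareManualToDecode are hypothetical names, and int16ToFloat32 is the two-channel function above:

async function compareManualToDecode(wavBytes, sampleRate) {
  // wavBytes: hypothetical ArrayBuffer holding the fetched WAV file.
  const ctx = new AudioContext();

  // Reference: the browser's own decoder (planar by definition of the API).
  const decoded = await ctx.decodeAudioData(wavBytes.slice(0));

  // Manual path: skip the 44-byte header, then deinterleave with the code above.
  const pcm = new Uint16Array(wavBytes, 44);
  const { ch0, ch1 } = int16ToFloat32(pcm);

  const manual = ctx.createBuffer(2, ch0.length, sampleRate);
  manual.copyToChannel(Float32Array.from(ch0), 0);
  manual.copyToChannel(Float32Array.from(ch1), 1);

  // Spot-check channel 0; small differences can come from scaling
  // (0x7FFF vs 0x8000) or from resampling by decodeAudioData().
  const a = decoded.getChannelData(0);
  const b = manual.getChannelData(0);
  for (let i = 0; i < 8; i++) console.log(a[i], b[i]);
}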

Is the information accurate for either ordering, and based on

How it's actually stored is an internal implementation detail.

does it follow that it does not matter - that either option can be used in an implementation-agnostic manner and the same result will be output?

rtoy commented 4 years ago

I think AudioBuffer.getChannelData pretty much explains it, at least for an AudioBuffer. The data are accessed in a planar fashion.

And process() for an AudioWorklet uses an API with planar data too. Each Float32Array is one channel. You can't put LLL...LLRRR...RRR into the array and expect an AudioWorklet to produce the right results. You have to use 2 arrays, one containing L values and the other containing R values. I think that's pretty clear from the description of the input and output parameters for the process method.
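
As an illustration, a minimal sketch of that planar shape in an AudioWorkletProcessor (the processor name here is hypothetical):

// Runs in the AudioWorkletGlobalScope after audioWorklet.addModule(...).
class PlanarNoiseProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const output = outputs[0];     // first output
    const left = output[0];        // channel 0: a Float32Array of 128 samples
    const right = output[1];       // channel 1: another Float32Array (if stereo)
    for (let i = 0; i < left.length; i++) {
      left[i] = Math.random() * 2 - 1;
      if (right) right[i] = Math.random() * 2 - 1;
    }
    return true; // keep the processor alive
  }
}
registerProcessor('planar-noise-processor', PlanarNoiseProcessor);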

How the data is stored or processed in any other place is not visible so implementations are free to do whatever they want.

guest271314 commented 4 years ago

@rtoy If I gather your post correctly, even if the data is stored interleaved in discrete Float32Arrays, the read of that data to produce audio output is internally converted to planar? How would that work for 4 channels from a WAV file manually parsed into Float32Arrays?

What section of the specification describes the conversion of interleaved to planar?

rtoy commented 4 years ago

There is no section to describe that because it's all internal. However, Chrome does, in fact, use planar for everything. Technically it doesn't have to and there's no way for you to tell.

As an example, let's say you've created an AudioBuffer with four channels. This is, of course, planar. Now create an AudioBufferSourceNode with that buffer. Internally, this could copy out the data and convert it to interleaved in some hidden internal buffers. Connect this to a bunch of downstream nodes and the output. There's no way to know that this was done. And placing an AudioWorklet in the graph just means the interleaved data is deinterleaved to planar for you and vice versa.
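
For concreteness, a minimal sketch of the JS-visible side of that example (names illustrative; only the planar API surface appears, and any internal interleaving stays hidden):

const ctx = new AudioContext();
const buffer = ctx.createBuffer(4, 128, ctx.sampleRate); // four channels
for (let c = 0; c < buffer.numberOfChannels; c++) {
  buffer.getChannelData(c).fill(0.1 * (c + 1)); // per-channel (planar) writes
}
const source = new AudioBufferSourceNode(ctx, { buffer });
source.connect(ctx.destination); // downstream nodes could go in between
source.start();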

Yes, this is all rather wasteful in memory and CPU, but an implementation could do that if desired.

I don't see any need to describe this. The API that exposes planar data is properly described. Internals are internal, and you can't see it so it doesn't need to be described in the spec.

guest271314 commented 4 years ago

It is interesting that "Planar versus interleaved buffers" is described at MDN, yet not in the specification.

Your general point relevant to the specification appears to be that what I am requesting technical clarification for is moot because, from your perspective, the implementation of the Web Audio API is a black box that is not observable.

More information is available about the Web Audio API with regard to planar and interleaved at a source other than the specification itself.

If

Yes, this is all rather wasteful in memory and CPU

is true and correct, then technically

There's no way to know that this was done.

cannot be true and correct at the same time. If a different method were used by the internal implementation, then the user could observe the difference; less waste of memory and CPU is certainly observable in the performance of the device itself. So that means you can "see it" right now, if you actually look and ask. However, if the question is never asked, or is cloaked behind an "internal implementation" veil that users in the field are simply expected to accept without question, then yes, you "can't see it".

Here, do not simply accept the surface layer of explanations for anything.

As an example, I would never have reached the point of isolating why Chromium consistently crashed when variable width and height frames were played back if I had just stopped experimenting, testing and asking questions because the video decoder/encoder is an "internal implementation". The same is true for other disciplines that I have been and am engaged in, particularly history and science.

The "internal implementation" is "not observable" would-be barrier to further analysis does not stop research here. If anything, when such assertions of non-observability are made the first step in the process here is to verify if that is true.

Yes, if conversion to and from planar and interleaved is expensive in Chrome, as you indicated, then by necessity such an internal implementation is observable: your post describing "wasteful" practices is an observation of the internal implementation that you must be able to "see" in order to describe in such detail. That detail needs to be explained for all who use the Web Audio API.

rtoy commented 4 years ago

Yes, memory and CPU usage is noticeable. But not the audio produced. By "observable" I meant you can't use webaudio and JS to tell how things are implemented internally.

You can have a really wasteful ConvolverNode that uses lots of memory and CPU. Or perhaps one that uses little memory and lots of CPU or more memory with less CPU. The spec doesn't care. It's up to the implementation to do the tradeoff that is appropriate for the implementation. In any case, the output is the same (ignoring floating-point round-off issues).

If a browser wants to be wasteful, more power to them. But I think the API is clear: planar for AudioBuffer, AudioWorklet, ScriptProcessor. We don't need to describe whether planar or interleaved or a mix is used anywhere else.

The spec is a description of what things do, not how they're implemented internally except as constrained by what is supposed to be produced.

Crashes are bugs in the implementation, not necessarily in the spec. (Although sometimes they are because the spec was incorrect.)

guest271314 commented 4 years ago

By "observable" I meant you can't use webaudio and JS to tell how things are implemented internally.

Yes, you can https://bugs.chromium.org/p/chromium/issues/detail?id=1001948, et al. This entire branch https://github.com/guest271314/MediaFragmentRecorder/tree/chromium_crashes is dedicated to doing just that.

Consider https://github.com/padenot/ringbuf.js/blob/master/js/audioqueue.js

// Interleaved -> Planar audio buffer conversion
//
// `input` is an array of n*128 frames arrays, interleaved, where n is the
// channel count.
// output is an array of 128-frames arrays.
//
// This is useful to get data from a codec, the network, or anything that is
// interleaved, into planar format, for example a Web Audio API AudioBuffer or
// the output parameter of an AudioWorkletProcessor.
export function deinterleave(input, output) {
  // n channels of 128 interleaved frames each, so input.length / 128 channels.
  var channel_count = input.length / 128;
  if (output.length != channel_count) {
    throw "not enough space in output arrays";
  }
  for (var i = 0; i < channel_count; i++) {
    let out_channel = output[i];
    let interleaved_idx = i;
    for (var j = 0; j < 128; ++j) {
      out_channel[j] = input[interleaved_idx];
      interleaved_idx += channel_count;
    }
  }
}
// Planar -> Interleaved audio buffer conversion
//
// Input is an array of `n` 128 frames Float32Array that hold the audio data.
// output is a Float32Array that is n*128 elements long. This function is useful
// to get data from the Web Audio API (that does planar audio), into something
// that codec or network streaming library expect.
export function interleave(input, output) {
  if (input.length * 128 != output.length) {
    throw "input and output of incompatible sizes";
  }
  var out_idx = 0;
  for (var i = 0; i < 128; i++) {
    for (var channel = 0; channel < input.length; channel++) {
      output[out_idx] = input[channel][i];
      out_idx++;
    }
  }
}
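
A hedged usage sketch of those two routines for a stereo, 128-frame block (buffer sizes follow the comments in the quoted code):

const interleaved = new Float32Array(2 * 128);                 // L R L R ... from a decoder
const planar = [new Float32Array(128), new Float32Array(128)]; // one array per channel

deinterleave(interleaved, planar); // planar[0] = L samples, planar[1] = R samples
interleave(planar, interleaved);   // back to L R L R ... for a codec or the network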

Conversion routines like these should be in the specification. At the very least, describe the difference, particularly the cost in CPU and memory. If the Web Audio API specification is the authoritative source for Web Audio implementations, then the impact of using a specific approach to implement the specification should be described.

Crashes are bugs in the implementation, not necessarily in the spec. (Although sometimes they are because the spec was incorrect.)

Again, in the case of variable width and height recording and playback, if I had simply stopped asking questions I would never have isolated why this code https://github.com/guest271314/MediaFragmentRecorder/blob/master/MediaFragmentRecorder.html consistently crashed Chromium for years.

In the case of Picture-In-Picture window specification https://github.com/w3c/picture-in-picture/pull/186, the mere recommendation to restrict PiP window size, that Chromium implements https://bugs.chromium.org/p/chromium/issues/detail?id=937859, ironically provides a vector for "fingerprinting" the user screen - while never actually stating why the recommendation is in the specification in the first place.

When Web Audio API specification contributors write code to convert between planar and interleaved in their own repositories, the subject matter is not insignificant. Yet, to even be abreast of that subject matter, a user in the field cannot learn about the technical difference from the controlling specification document; as it stands, they learn from MDN and from reading code at large. That is an obvious omission that can be fixed by simply explaining the difference between the two. The term "interleaved" does not appear once in the specification, yet it is undeniably a consideration: conversion to and from planar and interleaved, as you indicated, can be expensive, and that is observable at the device itself.

Why should users have to rely on MDN to describe what Web Audio API actually does - instead of the primary source document?

You and your colleagues who contribute to the specification are the experts in this domain. I am asking the experts for clarification and a detailed description of the difference between the two data structures - to be included in the specification, as that document is the primary source. I am not sure why there is any objection to that. If the subject matter were insignificant, specification authors would not be writing code to perform the conversion; MDN would not have included the description in their article; you would not have mentioned that such conversion could be wasteful.

This is a reasonable request for more information: at least a brief write-up, or a non-normative note describing the difference between the two, noting that implementers are naturally free to store the data in any manner they see fit. To omit the technology involved entirely is an omission that results in this very question. If users in the field cannot get an answer from you, the expert in the field, then users are forced to rely on secondary sources. Primary sources are always preferred to secondary sources, whether the field be journalism, science, or any other human activity where a source of information is necessary to understand the full scope of the subject matter and its event horizon (astrophysics).

Where primary sources are not available, conjecture and confusion, ignorance and, frankly, folklore ensue - rather than actual primary source data. One example in the domain of history is the folklore that "Betsy Ross sewed the first American flag", which is historically inaccurate. When further researching the origin of the first U.S. national flag, it is inevitable that any researcher will encounter the Grand Union Flag or Continental Colors; from there it is inevitable that the researcher will find that the Grand Union Flag is, save for a diagonal stripe on the Union Jack, identical to the pre-existing East India Company (E.I.C.) flag. However, there is no primary source explanation for why the stripes on the U.S. national flag - the same stripes that appear on the pre-existing East India Company flag and on the flag of Goes, a municipality in the Netherlands - were evidently copied from the pre-existing East India Company flag. Heraldry is not willy-nilly. An entire book was written just on that topic, and the answer is still not clear - based on primary sources - because that decision was never explained, at least not in any primary source documents that I have been able to locate. The complete history of the Confederate States of America national flag is far more detailed than the history and origin of the U.S. national flag. We cannot ask the primary sources why they decided to use the identical design as the pre-existing East India Company flag. We can only follow the threads of actual historical evidence to theorize reasons. We find that the historical event known as the Boston Tea Party took place on an E.I.C. ship. The seed funding, or articles of value, for what would become Yale University was donated by Elihu Yale, a sacked president of the E.I.C., along with other antecedent historical data connecting the E.I.C. and the Colonies which would eventually form the U.S. But we still do not have a primary source document unequivocally detailing why the stripes appear on the U.S. national flag - the same stripes that appear on the U.S. national flag today. We could attribute the stripes to being derived from Goes, but if we have not researched the original design of the Great Seal of the United States we would be at a deficit, because on one side of the Seal are symbols representing six European nations, including a Belgic Lion, hence a potential reference to the stripes on the flag of Goes. The E.I.C. was doing business in the Colonies, so that gives us a direct connection. Still, we have no primary source stating exactly why the stripe design of the U.S. national flag is not original.

Here, we have the opportunity to avoid ambiguity. I am asking the Web Audio API authors - the primary source in this domain - to describe the difference between planar and interleaved audio data structures. The term "planar" is in the specification, the term "interleaved" is not - yet clearly "interleaved" is technically relevant to the implementation of the specification, and thus cannot be non-observable. When a primary source is available, that source must be relied on instead of third-party information, which amounts to hearsay.

rtoy commented 4 years ago

The spec is a document for those "skilled in the art" as the saying goes. It is specifically not a tutorial. MDN is more a teaching/tutorial resource.

We can add a definition of planar to the spec. I'm not opposed to that. I do not see any reason to define interleaved.

AFAIK, no browser uses interleaved audio internally, except maybe when getting decoded audio. (Can't remember how ffmpeg works here.)

rtoy commented 4 years ago

Hmm. I searched the spec for the word "planar". I can't find it. Can you point out where you saw this in the spec?

guest271314 commented 4 years ago

Get the "skilled in the art" part. Am not asking for a tutorial. Am asking for the primary source document to not omit critical antecedent information. That is why specifications include a bibiliography.

You are correct re "planar" not being in the specification. What I am stating is that neither "planar" nor "interleaved" is described with even a modicum of detail.

What we have is

1.4. The AudioBuffer Interface This interface represents a memory-resident audio asset. Its format is non-interleaved 32-bit floating-point linear PCM values with a normal range of [−1,1], but values are not limited to this range. It can contain one or more channels. Typically, it would be expected that the length of the PCM data would be fairly short (usually somewhat less than a minute). For longer sounds, such as music soundtracks, streaming should be used with the audio element and MediaElementAudioSourceNode.

guest271314 commented 4 years ago

Here we have the term "non-interleaved", yet "interleaved" is not defined. "Non-interleaved" leaves open any option that is not "interleaved", while "interleaved" itself is not defined.

guest271314 commented 4 years ago

If this

AFAIK, no browser uses interleaved audio internally, except maybe when getting decoded audio. (Can't remember how ffmpeg works here.)

is true, then both "interleaved" and "planar" should be referenced by primary source citations to the controlling definition of the terms, as-applied, in the specification.

As it stands, even one "skilled in the art" cannot rely on the specification for so much as a non-normative reference to exactly what is meant by "non-interleaved", and certainly not what "planar" means, yet browsers are using "planar" to implement the specification - or maybe they are not?

rtoy commented 4 years ago

OK. I'm not opposed to clarifying non-interleaved. But I think those "skilled in the art" know what that means.

guest271314 commented 4 years ago

OK. I'm not opposed to clarifying non-interleaved. But I think those "skilled in the art" know what that means.

How can you verify that assessment without a controlling definition of the term in the specification, leaving no room for another person "skilled in the art" to disagree? Is there only a single definition of "non-interleaved" in this field? If there is only one possible definition or interpretation of that term, then there should not be any issue including that singular, controlling definition in the specification. At a bare minimum, a non-normative Note specifically citing the definitions relied on in the specification will avoid the potential for ambiguity. If you know exactly what the term means, then print that in the specification. From a historical, scientific, and research perspective, even if the document is for individuals or institutions "skilled in the art", a table of definitions to refer to is always advantageous.

rtoy commented 4 years ago

As mentioned above, I'm not opposed to clarifying "non-interleaved". A glossary can be added. But it's really hard to know what to add there. Something obvious to you may not be to me so should it be added? Hard to say.

guest271314 commented 4 years ago

It is hazardous to merely assume that any individual or institution that claims to be, or is presumed to be, "skilled in the art" will infer the same meaning of a term where none is clearly defined.

A far more egregious example, in law and history, is "race" theory. Individuals and institutions, particularly in the U.S., promulgate "race" theory, and in general continue to promulgate the folklore that a so-called "black" "race" or "white" "race" exists. Yet of the thousands of individuals that have been asked the very basic questions "What primary source definition of the term 'black' 'race' or 'black' 'people' are you relying on?" and "What primary source definition of the term 'white' 'race' or 'white' 'people' are you relying on?", not a single individual or institution has answered by referring to the controlling administrative definition of "race", "black" and "white" in the U.S. And of course, "black" "race" and "white" "race" do not officially exist in either France or Germany, so "skilled in the art" means different things depending on the environment and on the determination of the individual to get to the truth instead of relying on mere conjecture; there is no such thing as "black racial groups of Africa" or "Middle East" or "North Africa" (see “The theory of ‘Black’ ‘race’ in the United States: ‘black racial groups of Africa’ do not exist” and “The theory of ‘White’ ‘race’ in the United States: ‘North Africa’ and ‘Middle East’ do not exist”). Thus it is very easy for intellectually lazy individuals to simply rely on what the American Association of Anthropologists deems "European folk taxonomy", instead of doing the work to get to the source of what is a grand fraud.

As mentioned above, I'm not opposed to clarifying "non-interleaved". A glossary can be added. But it's really hard to know what to add there. Something obvious to you may not be to me so should it be added? Hard to say.

Well, that is precisely what this issue is about. Why is it "really hard" to include a definition of a term about which you just today claimed:

But I think the API is clear: planar for AudioBuffer, AudioWorklet, ScriptProcessor. We don't need to describe whether planar or interleaved or a mix is used anywhere else.

OK. I'm not opposed to clarifying non-interleaved. But I think those "skilled in the art" know what that means.

The term must not be clear, then. If you find isolating a definition for that term "really hard", then solve the problem by determining the precise primary source definitions that the specification is going to rely on - for "interleaved", "non-interleaved" and "planar".

rtoy commented 4 years ago

We don't use planar or interleaved anywhere. No need to define these. We could just delete the one use of "non-interleaved" for AudioBuffer because that's an implementation detail. The API implies that to be efficient you should do certain things.

In fact, internally, the buffer doesn't have to be 32-bit floats either. Firefox can use 16-bit integers in certain circumstances, but you can't tell from the audio output or the API that this is done.

guest271314 commented 4 years ago

"implies" is essentially the same as inferring that those "skilled in the art" all agree on the same definition, which is an unverified theory. The scientific method requires reproduction of a theory, preferably by someone other than the claimaint, to verify the theory.

At one point you mentioned impact on memory and CPU, which is observable. Perhaps now you have qualms about stating that, which necessarily means the impact on audio output is observable, due to the load on memory and CPU. If audio output is not affected, yet other processes are affected, that is still an observation.

The issue is not about what an implementation does internally; the issue is about defining the technical terms corresponding to an actual technical implementation of the specification. Since "planar", "interleaved" or "non-interleaved" are the apparent options for implementation, those terms should be defined to scope the possible implementations.

"non-interleaved" necessarily requires a definition of "interleaved".

Since you are "skilled in the art" and still find defining "non-interleaved" "really hard" the term should not be in the specification if you do not want to clearly define the term.

It would be beneficial to include a glossary of the terms used. Even among those "skilled in the art", in any discipline, there could still be disagreement as to what terms mean.

Either define the term "non-interleaved" or remove the term from the specification, to avoid conjecture.

rtoy commented 4 years ago

You have to assume some basic level for "skilled in the art". If you don't, you end up having to define everything. And to be facetious, this includes defining "audio" or even "the". My basic level is that "non-interleaved" is understood as basic knowledge.

As for observable memory and CPU, that's outside the scope of the specification. The spec says you have this node and when given this set of inputs and parameters you get some output. How you get that is up to you. You can be wasteful of memory and CPU if you want. Or be clever. This is all outside the scope.

guest271314 commented 4 years ago

Yes, defining everything is work. It provides certainty. This issue is specific to the term "non-interleaved", which you already stated is "really hard" to define. That is not reliance on "basic knowledge" for an individual "skilled in the art"; thus you inserted the pre-condition "basic level" even within the scope of "skilled in the art". Since "non-interleaved" is in the current iteration of the specification, and as yet has not been clearly defined, I am asking that this specific term be defined in the specification. A term cannot be "basic level" and "really hard" to define for one "skilled in the art" at the same time. That defies logic.

rtoy commented 4 years ago

I did not say "non-interleaved" is hard to define. It's not. It's hard to know what to put in a glossary.

At this point, I just want to remove it from the AudioBuffer section and maybe just say that it nominally contains linear PCM samples. Nothing about interleaving, floating point or even the range.

guest271314 commented 4 years ago

I did not say "non-interleaved" is hard to define. It's not. It's hard to know what to put in a glossary.

The definition you are relying on.

Searching the internet for clarity leads to individuals essentially quoting MDN and to other individuals pointing out that the language needs clarification.

Consider a sampling,

Interleaved / Non-Interleaved Decoded Audio #59 https://github.com/WICG/web-codecs/issues/59

The web platform only use planar buffers for audio, but that's probably because there was no interaction with codecs or IO, where interleaved audio is often preferred.

A interleaving/deinterleaving routine is probably very very fast in wasm, but I don't know how fast.

What is the difference between AV_SAMPLE_FMT_S16P and AV_SAMPLE_FMT_S16? https://stackoverflow.com/questions/18888986/what-is-the-difference-between-av-sample-fmt-s16p-and-av-sample-fmt-s16

AV_SAMPLE_FMT_S16P is planar signed 16 bit audio, i.e. 2 bytes for each sample which is same for AV_SAMPLE_FMT_S16.

The only difference is in AV_SAMPLE_FMT_S16 samples of each channel are interleaved i.e. if you have two channel audio then the samples buffer will look like

c1 c2 c1 c2 c1 c2 c1 c2...

where c1 is a sample for channel1 and c2 is sample for channel2.

while for one frame of planar audio you will have something like

c1 c1 c1 c1 .... c2 c2 c2 c2 ..

now how is it stored in AVFrame:

for planar audio: data[i] will contain the data of channel i (assuming channel 0 is first channel).

however if you have more channels than 8, then data for rest of the channels can be found in extended_data attribute of AVFrame.

for non-planar audio data[0] will contain the data for all channels in an interleaved manner.

followed by comment

I assume c1 c1 c2 c2 must refer to the bytes in the buffer, not the samples. Should either change it to c1 c2 c1 c2 for samples, or update the text to say bytes. – DuBistKomisch Sep 20 '16 at 10:55

Planar/interleaved option? #3 https://github.com/raymond-h/pcm-format/issues/3

When it does matter, pretty much all other modules use interleaved format, because that works nicely with streams, since you typically don't know when exactly a stream will end.

What's the interleaved audio ? [closed] https://stackoverflow.com/questions/17879933/whats-the-interleaved-audio

Generally speaking, if you have 2 channels, let's call them L for left and R for right, and you want to transmit or store 20 samples, then:

Interleaved = LRLRLRLRLRLRLRLRLRLR

Non-Interleaved = LLLLLLLLLLRRRRRRRRR

comment following

good answer, although non-interleaved generally means that you would actually have two buffers for your example, one containing only left samples, and one containing only right samples. – Mark Heath Jul 26 '13 at 15:53

Re

At this point, I just want to remove it from the AudioBuffer section and maybe just say that it nominally contains linear PCM samples. Nothing about interleaving, floating point or even the range.

That is one option. Doing nothing is not a viable option; now that the problem is recognized, doing nothing would be malfeasance at this point. Include the definition or remove the term, as suggested above.

One way to view this issue is as an opportunity for the Web Audio API to actually be an authority on web platform audio, since this is precisely the subject matter of the specification.

rtoy commented 4 years ago

Teleconf: Remove "non-interleaved".

SDRDesk commented 10 months ago

Looks like you have done the exact opposite of what the user wanted! Way to go ...