WebAudio / web-audio-api

The Web Audio API v1.0, developed by the W3C Audio WG
https://webaudio.github.io/web-audio-api/

Need to provide hints to increase buffering for power consumption reasons #348

Closed cwilso closed 8 years ago

cwilso commented 10 years ago

It's been requested that we enable a batching mechanism to let users of the API tell us to process a batch of frames at a time (rather than the lowest latency we can support on a given system), trading off higher latency for applications that would rather minimize power (e.g. a simple background-music-playback app).

padenot commented 10 years ago

I've also had people request something like this.

I was thinking it could be part of some sort of Audio Channel API [0], but that's still not specced properly (although I remember talking to Google people about it, and Microsoft has something similar).

In general, I'd prefer to spec this in terms of use case or developer intent rather than explicit latency.

[0] http://dxr.mozilla.org/mozilla-central/source/dom/interfaces/html/nsIDOMHTMLMediaElement.idl#98

jernoble commented 10 years ago

Why is this not up to the UA? For example, the WebAudio API on iOS will already switch to a larger buffer size when the system goes into a low-power mode.

padenot:

In general, I'd prefer to spec this in terms of use case or developer intent rather than explicit latency.

I agree, and this could tie into other API feature requests as well. An author could mark an AudioSession as "media playback", which would both trigger a lower buffer size, and would interrupt other media playback on the device.

cwilso commented 10 years ago

Why is this not up to the UA?

It should be, ultimately, and I wouldn't want to take away the ability of a system to up the buffer size because it decides to go into a low-power mode.

However, apps should specifically be able to say "I'm cool with having a larger buffer size". I don't think this is just media playback, per se - it's any less-interactive audio scenario. It's finer-grained than (or perhaps just orthogonal to) the audio channel specification types.

jernoble commented 10 years ago

I didn't see a specific API proposal here, so here's a strawman:

enum AudioContextPlaybackCategory { "interactive", "non-interactive", "media" };

partial interface AudioContext {
    // defaults to "interactive"
    attribute AudioContextPlaybackCategory playbackCategory;
};

A UA could decide to set the hardware buffer size to 128 for "interactive" playback, to 4096 for "non-interactive", and to 44,100 for "media" playback.
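
A minimal sketch (TypeScript, purely illustrative) of how a UA might turn the proposed category into a concrete buffer size, using the example frame counts above; neither the enum nor the mapping is shipped API:

type AudioContextPlaybackCategory = "interactive" | "non-interactive" | "media";

// Hypothetical UA-side mapping from category to hardware buffer size (in frames).
function bufferSizeForCategory(category: AudioContextPlaybackCategory): number {
  if (category === "interactive") return 128;       // lowest practical latency
  if (category === "non-interactive") return 4096;  // relaxed latency, fewer wakeups
  return 44100;                                     // "media": ~1 s at 44.1 kHz, lowest power
}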

padenot commented 10 years ago

In Gecko, we put that in the AudioContext ctor to make it clear that it can't be changed.

jernoble commented 10 years ago

padenot:

In Gecko, we put that in the AudioContext ctor to make it clear that it can't be changed.

Is there some reason why it can't be changed? Maybe that's reasonable, but for iOS and OS X it's certainly possible to change the hardware buffer size dynamically.

padenot commented 10 years ago

Sorry, I was unclear.

We implemented a whatwg proposal where you give an audio channel (roughly corresponding to use cases, see the link to the interface in the second comment) to a context, or media element, etc, and this gets decided on construction. It is possible to change the latency while everything is running, but that's not exposed yet to the web (and in fact, AudioChannel stuff is only available to system apps, iirc, it's not really exposed to the web).

cwilso commented 10 years ago

ROC: Sorry, the specific proposal I'd made was here: http://cwilso.github.io/web-audio-api/proposals.html#widl-AudioContext-AudioContext-AudioContext-float-buffering.

In short, the constructor for AudioContext becomes:

AudioContext AudioContext (optional float buffering = 0);

A user may choose to ask the system to process blocks of audio in batches, in order to reduce power consumption (while increasing latency). This parameter is specified as the number of seconds of desired buffering latency. This is a hint to the system about the minimum requested buffering; underlying constraints may force larger buffers (higher latency) than requested for system reasons. This buffer request will also be increased to the nearest larger duration that equals an integral number of 128-sample processing blocks.
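
A sketch of the rounding rule described above (TypeScript; the helper name is made up, and treating a request of 0 as "one block" is an assumption):

// Round a requested buffering duration (in seconds) up to a whole number of
// 128-frame processing blocks, as the proposal describes.
function roundBufferingHint(requestedSeconds: number, sampleRate: number): number {
  const blockSeconds = 128 / sampleRate;
  const blocks = Math.max(1, Math.ceil(requestedSeconds / blockSeconds));
  return blocks * blockSeconds;
}

// e.g. at 44.1 kHz, a request for 10 ms becomes 4 blocks (512 frames), about 11.6 ms.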

jernoble commented 10 years ago

cwilso:

AudioContext AudioContext (optional float buffering = 0);

There are a couple of problems with this proposal:

1) It is extreme overkill for the use case in question. Instead of indicating to the UA that the author would like the UA to prioritize power consumption over latency, this API requires the author to pick a specific buffer size in order to do so. The control (preferred buffer size) is two levels removed from its intended goal (power consumption, by way of increasing CPU idle periods, presumably). And setting the buffer size to a high value could conceivably be counterproductive (to power consumption) on certain UAs.

2) It takes a very specific parameter, which is almost always ignored. The buffering parameter will be rounded to the nearest multiple of 128 / sampleRate. It will be ignored if the hardware minimum buffer size exceeds it, or basically whenever the UA decides to. The parameter feels like an inherently broken promise.

If the UA is going to generally ignore this parameter, and if the UA knows how to best tune the buffer size (among other things) to achieve low power consumption at the expense of latency, it would be better to have the hint be something both less precise and more descriptive: e.g., "low", "medium", and "high". (To be clear, I prefer the "interactive", "non-interactive", "media" proposal, but for reasons unrelated to power use.)

cwilso commented 10 years ago

1) I'd argue it's not overkill. It's not a binary "prioritize power over latency", it's a "here's the level of latency I'm happy with."

2) Happy to change the semantic there - number of blocks? And of course, the hardware can up it - the same has always been true of the buffer size in ScriptProcessor.

The UA doesn't know best how to tune the buffer size without being told what level of latency a particular app is comfortable with. I think this goes beyond low/medium/high: "interactive" is different for a live audio mixer vs. a DJ scratching, for example.

jernoble commented 10 years ago

@cwilso

1) I'd argue it's not overkill. It's not a binary "prioritize power over latency", it's a "here's the level of latency I'm happy with."

But that’s not the use case you’re trying to solve. You specifically brought up “power consumption” as the driving use case for this change, which is not what this API actually fixes. If you want a generic latency control mechanism, this use case doesn't justify it.

cwilso commented 10 years ago

Sorry, I should have expanded. "I want to minimize power. Here's a level of latency I'm willing to trade for it."

jernoble commented 10 years ago

Even so, the script is not equipped with enough information to make an informed decision. This API seems to suggest that "moar latency == moar efficiency", when that relationship is definitely not linear and may not even be true.

Here's what I bet would happen if this API was standardized: your average WebAudio-using page author would tune the latency value for his favorite browser and device, regardless of what effect that value has on the performance of other browsers and other devices.

Instead, with a more declarative API, each UA could pick a latency value which fits the local maximum for performance while meeting the general requirements for the selected "class" of playback.

cwilso commented 10 years ago

Yes, that's true. And the only problem is having a sufficiently descriptive set of classes of interaction. I suspect three classes doesn't quite capture it.

padenot commented 10 years ago

That's part of the reason why we have more in our model.

jernoble commented 10 years ago

@cwilso

Yes, that's true. And the only problem is having a sufficiently descriptive set of classes of interaction. I suspect three classes doesn't quite capture it.

@padenot

That's part of the reason why we have more in our model.

For reference, this is the document I found listing the stream types to which @padenot is referring: [https://wiki.mozilla.org/WebAPI/AudioChannels]. They're roughly analogous to iOS AudioSession categories.

I think there are good reasons not to expose this exact set to the web (because no one wants in-page advertising to have access to the "alarm" channel), but a more exhaustive set of appropriate types seems like a good thing.

padenot commented 10 years ago

As Jer points out, normal web applications don't have access to all of those. This is used most of the time for system applications on Firefox OS (obviously there are exceptions: applications that are music players can output sound when in the background, for example, but iirc they have to be installed for this to work).

Paul.

cwilso commented 10 years ago

I'm concerned about conflating this with the other behaviors (pause while on a call, etc) but let's discuss.

dalecurtis commented 10 years ago

Chrome platform audio developer here. I'm a fan of the AudioContextPlaybackCategory proposal. Supported buffer sizes are inherently different across platforms: OSX has a 128-frame minimum, Linux 512, Windows ~10 ms, and not all of these work well at the highest sample rates. The enumeration proposal allows us to do different things under the hood that work best on each platform.

Ideally one of these buffering types (or a new one) would essentially mean "pass through": when connected to an audio tag or a WebRTC stream, WebAudio would use the underlying buffer size that each component would use without WebAudio in place.

joeberkovitz commented 10 years ago

TPAC resolution: we should do something similar to Jer's comment of August 28, but place the AudioContextPlaybackCategory value inside a property-bag Object passed to the AudioContext constructor conveying a set of options. The current playback category is accessible via a readonly attribute of the context.

joeberkovitz commented 10 years ago

@cwilso please propose specific categories and semantics.

bjornm commented 9 years ago

Hello! Just stumbled across this item, it looks really good.

I'm just wondering what the status is. The latency in Chrome on Mac has gone up from 25 ms to 35 ms; see this commit:

https://chromium.googlesource.com/chromium/src/+/ca469461bbd554ecddc82ea91d2cef7f3df5502e

It seems it is awaiting a spec proposal before a fix to AudioContext, and thereby the underlying audio pipeline, can be made. So if you add this to the spec, then maybe the Chrome devs can implement that change and bring the latency back down to a lower figure. That would be awesome.

Thanks, Björn

joeberkovitz commented 9 years ago

Note that anything we do with the AudioContext constructor should be considered in light of the need to obtain AudioContexts for specific output devices in the future (e.g. a device ID might need to be passed in).

joeberkovitz commented 9 years ago

TPAC 2015:

AudioContext will take a new property bag argument allowing specification of this category as an option.

enum AudioContextPlaybackCategory { "balanced", "interactive", "playback" };

"balanced": balance latency and stability/power consumption.
"interactive": lowest latency possible without glitching.
"playback": latency not important; sustained playback without interruption is the priority. Lowest power consumption.

The default category is "interactive".

NOTE: the AudioContext property-bag argument will now absorb other proposed initialization data for AudioContext (e.g. output device).
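
A rough usage sketch (TypeScript; the option name follows the resolution above and was not shipped API at the time, hence the casts):

// Illustrative only: constructing a context with the proposed options bag.
// "playback" asks for sustained, low-power output at the cost of latency.
const ctx = new (AudioContext as any)({ playbackCategory: "playback" }) as AudioContext;

// Per the earlier TPAC resolution, the chosen category would be readable back
// from a read-only attribute on the context (attribute name hypothetical):
// console.log((ctx as any).playbackCategory);  // "playback"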

billhofmann commented 8 years ago

Folks: working on this, what I'm thinking is the interface looks something like this:

[Constructor(optional AudioContextInit contextInit)]
interface AudioContext : EventTarget {
    ...
};

where AudioContextInit is a dictionary that has the buffering hint, but can also have the init id.

Whaddya think?

billhofmann commented 8 years ago

ok. my pull has that approach.

rtoy commented 8 years ago

I think I would prefer the name AudioContextOption(s) instead of AudioContextInit.

hoch commented 8 years ago

I also like AudioContextOptions slightly better - I've been thinking something like this. (Note that sinkId and sampleRate are just shown as an example.)

dictionary AudioContextOptions {
  DOMString? sinkId;  
  AudioContextPlaybackCategory? playbackCategory = 'interactive';
  long? sampleRate;
}
billhofmann commented 8 years ago

Options works fine for me. For now, unless someone objects, I'm going to leave out sinkId and sampleRate. From a WebIDL PoV, does saying = 'interactive' flag the default value?

hoch commented 8 years ago

I believe so - I simply followed the W3C WebIDL spec. Also, I absolutely agree with you; I put sinkId and sampleRate there just as an example.

foolip commented 8 years ago

There's an Intent to Implement and Ship, "WebAudio: Add buffering/latency hint via playbackCategory", on blink-dev, where I've questioned the proposed hint-based API. Let me summarize here.

Audio output APIs on native platforms are not restricted to a similar 3-value playbackCategory when deciding on the trade-offs around latency, so the enum must internally be mapped to some numeric latency for each device. This would make the playbackCategory API less powerful than native platforms, and this does not seem necessary.

If the API were to use a numerical requested latency, and the actual latency used were always in the range of latencies allowed by the playbackCategory API, then this would be strictly more powerful, with no apparent downside. In addition, I would propose:

Please help me understand if there's any fundamental reason why this, or something similar, would not be workable, or if there's reason to think that the outcome in practice would be worse.

dalecurtis commented 8 years ago

Practically it's a lot of overhead for a value we'll likely ignore 90% of the time, which was my original objection to this. It's a pain trying to find a buffer size which avoids glitching -- especially in Chrome's case, where we have to transfer the audio data across processes in response to an OS callback.

I don't want Chrome (or its users) to have to sign up for mucking about with buffer sizes in a multi-platform world. I don't even like doing it myself :) It seems far more practical to map a set of enums to loosely-defined categories which can vary across platforms.

Largely my desired outcome for this API is better music playback and real-time communication interactions. Today these are hampered by the less-resilient-to-glitching, higher-power buffer size always chosen by WebAudio. Ideally (in my eyes of course) instead WebAudio should be slaved to the buffer size required for these applications.

For playback that's 20 ms rounded to a power-of-two on POSIX variants and a flat 20 ms on Windows. For RTC it can vary depending on the microphone sample rate and platform.

I also don't want us to have to sign up for ensuring all buffer sizes in a range play out smoothly, because as mentioned it's a tweaking game which has taken years to get right for the majority of users.
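
For concreteness, a sketch of the playback figures above (TypeScript). It assumes "rounded to a power-of-two" means rounding the 20 ms frame count up to the next power of two; the exact rounding is a Chrome implementation detail, so treat this as illustrative:

// Hypothetical reconstruction of the media-playback buffer sizes described above.
function playbackBufferFrames(sampleRate: number, isWindows: boolean): number {
  const frames20ms = Math.round(sampleRate * 0.02);          // 20 ms worth of frames
  if (isWindows) return frames20ms;                          // flat 20 ms on Windows
  return Math.pow(2, Math.ceil(Math.log2(frames20ms)));      // POSIX: e.g. 48 kHz: 960 -> 1024
}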

foolip commented 8 years ago

Thanks for elaborating, @dalecurtis!

Ideally (in my eyes of course) instead WebAudio should be slaved to the buffer size required for these applications.

That makes a lot of sense. Does the playbackCategory API achieve that? I assume that a single system can have multiple microphones with different sampling rates, and that the buffer size for RTC is therefore not a constant even on a single device. If there are multiple audio output devices with different sampling rates, it sounds like media elements would also not have a fixed buffer size.

In order to slave the AudioContext to the buffer size required by the input, perhaps the API should be of the type "this is the MediaStream/HTMLMediaElement that I will use as the primary input, please pick a matching latency"? Something like new AudioContext({ optimizeForThis: mediaStream }) or new AudioContext({ optimizeForThese: [mediaElement, mediaStream] })?

I also don't want us to have to sign up for ensuring all buffer sizes in a range play out smoothly, because as mentioned it's a tweaking game which has taken years to get right for the majority of users.

That wouldn't have to be the case even with a numeric latency, it could be restricted to only those values that have been tweaked to perfection. (If there are less than 3 perfectly tweaked latencies, then it'd be silly of course.)

rtoy commented 8 years ago

Can't speak for other items, but for webaudio only, any latency above the minimum we support today should work fine for audio output. Once you throw in audio input, media streams, rtc, etc., it is probably quite messy.

dalecurtis commented 8 years ago

The category API does resolve this, but not as well as your optimizeForThis idea, which sounds very appealing. The category-based API prevents users from shooting themselves in the foot by, say, requesting real-time latency for basic playback.

I believe tommi@ is lurking on this thread, so he can chime in if I'm wrong, but WebRTC will (soon) mix and resample as necessary to facilitate a single output stream (required for echo cancellation) - so this shouldn't be a problem in the future (if it is even now).

As far as the numeric value approach goes, I don't see the value of presenting a false choice. I.e., it seems only valuable if we allow users to choose any buffer size (within reason). Based on bug reports from pro audio users, I'd wager most (the majority?) of users who would want to tweak their sizes want a lower value than we allow today (i.e. unrestricted numeric) and wouldn't care much for the optimizeForThis or category APIs.

tomasgunn commented 8 years ago

While I've taken a look at this thread occasionally, I'm not fully up to speed, so apologies if I have no clue what I'm talking about :)

So, yes, we're working on mixing WebRTC audio in Chrome along with other audio, resampling as appropriate, grouping by buffer sizes, etc. We believe there are problems, especially on Mac, with how we do audio I/O, and one of our goals is to reduce IPC and the chance of running into glitches.

My $.02 on categories and buffer sizes:

As a recording musician, I would like to be able to tweak buffer sizes directly. That's what I do in Cubase, Pro Tools, Tracktion, etc. If Chrome is to be more than a toy in this regard, I would expect the same sort of power to be available. I'm also a believer in not taking power away from app developers or doing their thinking for them. If that means more footguns will be available, that's acceptable :)

For general playback of audio/video and configuring WebRTC sessions, I think it makes sense to leave it up to the browser to decide, and to have categories such as 'rtc', something for non-interactive playback, something for low power, etc.

foolip commented 8 years ago

OK, so from @rtoy in https://github.com/WebAudio/web-audio-api/issues/348#issuecomment-176264372 it sounds like for Web Audio that doesn't interact with HTMLMediaElement or MediaStream, a large range of latencies would be possible.

And for HTMLMediaElement or MediaStream, it's not quite clear to me if using the playbackCategory enum would allow for the "WebAudio should be slaved to the buffer size required" that @dalecurtis wishes for in https://github.com/WebAudio/web-audio-api/issues/348#issuecomment-175780999, or if it would be necessary to provide the source at context creation time as in https://github.com/WebAudio/web-audio-api/issues/348#issuecomment-175948191.

@tomasgunn, I agree that the web platform should match the powers of native platforms to the fullest extent possible, and the challenge here is to understand to which extent it is possible.

@rtoy, this is blocking the blink-dev intent; do you feel like you have any way of moving this forward, or does everything suggested so far seem impractical?

mdjp commented 8 years ago

Reopened after call - needs further discussion

rtoy commented 8 years ago

@foolip Sorry for the delay. It looks like we'll need to discuss this further. I sympathize with @tomasgunn's desires for fine control, but I also defer to @dalecurtis's expertise in making Chrome audio glitch-free. Chrome's webaudio implementation depends ultimately on the work that @dalecurtis and others have done.

tomasgunn commented 8 years ago

I'm also a part of that work :-)

joeberkovitz commented 8 years ago

@rtoy to develop proposal

rtoy commented 8 years ago

Possibly allowing both the category string and a specific number.

rtoy commented 8 years ago

How about this:

dictionary AudioContextOptions {
   (AudioContextPlaybackCategory or double) playbackCategory = "interactive";
};

When playbackCategory (do we need a new name?) is a double, it specifies the desired buffer size in seconds. This is a hint. The browser should try to accommodate the request as best it can, but there are no guarantees on whether it honored your request, or even whether it sounds OK.

Do we also need an attribute to specify the actual size that the browser used? That might be useful to adjust the graph in some appropriate way?

We probably need some rules, like: a category of 0 is the same as "interactive"; if the category is > x sec, it's the same as "playback". (What is the value of x? 100 ms? 1 sec?) I don't know how to specify "balanced" as a number; I suspect we can't, really.

So many issues to sort out....
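
A sketch of the numeric-to-category mapping floated above (TypeScript; the 1-second threshold is a placeholder for the open question about "x", and the middle band is exactly the "balanced" case that is hard to pin down):

type AudioContextPlaybackCategory = "interactive" | "balanced" | "playback";

// Hypothetical: interpret a numeric hint (seconds) in terms of the categories.
function categoryForHint(seconds: number, playbackThreshold = 1.0): AudioContextPlaybackCategory {
  if (seconds <= 0) return "interactive";              // 0 means "lowest latency"
  if (seconds > playbackThreshold) return "playback";  // "> x sec" per the comment above
  return "balanced";                                   // the fuzzy in-between band
}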

foolip commented 8 years ago

Do we also need an attribute to specify the actual size that the browser used? That might be useful to adjust the graph in some appropriate way?

I'm not sure how to use it, but that sounds sensible, not too dissimilar from OfflineAudioContext.prototype.length.

Don't know how to specify "balanced" as a number. I suspect we can't really.

I suppose the spec doesn't need to say what the number is for any of the categories, only that interactive ≤ balanced ≤ playback. But if "balanced" is primarily intended for WebRTC, wouldn't it internally simply be a delay that is imperceptible for voice (~50ms?) and then just clamped to what the platform can handle? If so, then presumably the web app could just handle this by asking for 50ms directly.

Then there is of course also the question about syncing clocks, if the new AudioContext({ optimizeForThis: mediaStream }) idea makes any sense or not.

rtoy commented 8 years ago

I'm kind of guessing on its usefulness, but I like introspection if what I ask for might not have actually happened.

I'm mostly thinking of cases like requesting 1 microsec (silly, yes), which isn't supported. It might be useful to know that instead of 1 microsec, 3 ms is used.

Or maybe the browser will round the time to be a multiple of 128 frames to reduce CPU peak processing requirements. That might be nice to know too.

foolip:

But if "balanced" is primarily intended for WebRTC, wouldn't it internally simply be a delay that is imperceptible for voice (~50ms?) and then just clamped to what the platform can handle? If so, then presumably the web app could just handle this by asking for 50ms directly.

I think if the user said 50 ms, we give 50 ms if possible. If you want to interact with WebRTC, use "balanced". Anything else would probably be wrong unless you looked at the actual implementation of the browser.

foolip:

Then there is of course also the question about syncing clocks, if the new AudioContext({ optimizeForThis: mediaStream }) idea makes any sense or not.

A different but closely related issue. I don't know what the best approach would be.

rtoy commented 8 years ago

As discussed in the last teleconference, here is a proposal.

dictionary AudioContextOptions {
   (AudioContextLatencyCategory or double) latencyHint = "interactive";
};

AudioContextLatencyCategory is just a new name for AudioContextPlaybackCategory. The category is the preferred method for specifying the latency. For fine control, latencyHint can be a number specifying the latency in seconds. The browser will adjust its internal processing to satisfy the hint as best as possible.

AudioContext gains a new attribute:

readonly attribute double processingLatency;

The value of processingLatency is the actual latency (in seconds) used by the browser.

For example, at 44.1 kHz, if a browser implements double buffering for the internal audio processing and can process audio at 128 frames, processingLatency would be 2*128/44100 ≈ 5.805e-3.
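
A small worked example (TypeScript) against the proposal above; processingLatency is only the proposed attribute name, not something the sketch reads off the context, so it is computed by hand:

const sampleRate = 44100;
// Ask for roughly 20 ms of latency via the proposed numeric latencyHint.
const ctx = new AudioContext({ latencyHint: 0.02 });

// Double buffering of 128-frame quanta, as in the example above:
const processingLatency = (2 * 128) / sampleRate;   // ~0.0058 s
console.log(ctx.sampleRate, processingLatency);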

hoch commented 8 years ago

This looks good to me, but:

  1. Not sure (AudioContextLatencyCategory or double) is possible as a type.
  2. We might want to clarify that AudioContext.processingLatency merely represents the latency caused by the Web Audio API, not the underlying hardware or the subsequent processing outside of the browser.
rtoy commented 8 years ago

I tried it: (AudioContextLatencyCategory or double) is allowed in WebIDL and it seems to work in Chrome.

Yes, we need to mention that the latency here does not include the rest of the audio system.

foolip commented 8 years ago

The latencyHint proposal makes sense to me. A union type like this is used in https://w3c.github.io/webvtt/#the-vttcue-interface so it does work.

The "pass through" semantics first mentioned in https://github.com/WebAudio/web-audio-api/issues/348#issuecomment-57837507 still seem like an unsolved problem, but it seems quite likely that an API for that would be (syntactically) orthogonal to latencyHint / processingLatency and even with that problem solved there would be purse Web Audio cases where latency control is useful.

rtoy commented 8 years ago

I think the category "balanced" is meant for the pass through case. There's a problem if the appropriate value for, say, an audio tag vs WebRTC is different, though.