Open: juj opened this issue 10 years ago
My opinion in this matter is pretty much in sync with KG...
The thing here is that we have a lot of different scenarios with different needs. For a game it might be OK to let the UA decide the quality/performance tradeoff, but for a DAW degrading the quality is a dealbreaker. I don't want us to encourage the UA to do this sort of stuff. The difference between letting the developer decide and letting the UA decide is that, especially on mobile, the developer can update the decision-making logic much faster, whereas UA behavior, once shipped, may stay around for a much longer period. The developer knows what the application does and will do, whereas the UA has to resort to heuristics that can easily go wrong (as I pointed out, something that's an optimization for one application is a bug for another). In my experience, hiding performance characteristics as implementation details is a terrible idea.
Another thing to consider is that both use cases I mentioned here (games and DAWs) traditionally don't use any heuristics, but instead let users choose the performance options. This makes sense because the user is ultimately in the best position to make the decision: they can see its impact. If we let the UA decide this sort of thing, those applications lose the ability to let the user decide.
I think all in all, letting the UA decide is more or less a useless feature that would add a lot of implementation complexity, and people would try to work around it anyway.
As for the subject of compressed assets, when using <audio> there's the unresolved problem of time-syncing it with the rest of the API. One option would be to allow creating an AudioBuffer from an <audio> element. This would, of course, throw if the asset hasn't finished loading yet.
I still think it is a good idea to let the UA decide what representation to use internally, but it should not degrade the quality of samples (unless there is some hint in the API that lets a developer tell the UA what quality level it can accept).
Here's a fairly non-intrusive way to support integer formats that I think would solve most problems (comments welcome):
Furthermore, I suggest that:
In general, I don't see how we could both have integer formats internally AND resample the data. To me it seems that we need to go to float32 when resampling, right?
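To illustrate the resampling point: interpolating between integer samples generally lands on values that are not integers, so a resampler ends up computing in floating point, and any int16 result has to be re-quantized (and lose precision). A minimal sketch, using plain linear interpolation only for brevity (real resamplers use better filters); the helper names are hypothetical:

```javascript
// Convert one int16 sample to a float in [-1, 1).
function int16ToFloat(s) {
  return s / 32768;
}

// Convert a float sample back to int16, rounding and clamping.
function floatToInt16(x) {
  const v = Math.round(x * 32768);
  return Math.max(-32768, Math.min(32767, v));
}

// Hypothetical linear-interpolation resampler: reads int16 input,
// computes in float, and returns a Float32Array.
function resampleLinear(input, inRate, outRate) {
  const outLength = Math.floor((input.length * outRate) / inRate);
  const out = new Float32Array(outLength);
  const step = inRate / outRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;                        // fractional input position
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    const a = int16ToFloat(input[i0]);
    const b = int16ToFloat(input[i1]);
    out[i] = a + (b - a) * frac;                 // generally not an int16 step
  }
  return out;
}

// Upsampling 2x: the new in-between samples fall halfway between integers.
const src = new Int16Array([0, 1, 2, 3]);
const up = resampleLinear(src, 22050, 44100);
// up[1] is half of one int16 step; floatToInt16(up[1]) has to round it.
```

Keeping the output in float preserves the half-step value; storing it back as int16 forces a rounding decision, which is the quality tradeoff being discussed.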
I still think it is a good idea to let the UA decide what representation to use internally
What's the value in that? The cost is really high if you expect the UAs to actually deliver heuristics that help in most cases and at least don't hurt in others.
At most, to me the UA deciding would be better as a nice-to-have (NTH) feature that is either opt-in or opt-out. In the end it's the user who has access to the most relevant information, so I'm against anything that prevents the application from offering the user this choice. Don't get me wrong, I'm all for reasonable defaults, but where the performance impact is this high, it should be possible to override the defaults.
I think the hinting should actually work the other way around: the UA could give the application hints about what's best to do, and the application can use that as a default unless the user overrides it, or unless there is a known limitation with a given device, etc.
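A minimal sketch of that precedence, assuming a user choice beats a known device limitation, which beats the UA's hint, which beats the application's own default. `chooseSampleFormat` and its fields are hypothetical, not an existing or proposed API:

```javascript
// Hypothetical: the UA only supplies a hint; the application decides,
// and the user gets the final say.
function chooseSampleFormat({ userChoice, deviceLimit, uaHint, appDefault }) {
  if (userChoice) return userChoice;   // explicit user setting always wins
  if (deviceLimit) return deviceLimit; // known limitation of this device
  if (uaHint) return uaHint;           // UA suggestion, used as the default
  return appDefault;                   // application's built-in fallback
}

chooseSampleFormat({ uaHint: 'int16', appDefault: 'float32' });
chooseSampleFormat({ userChoice: 'float32', uaHint: 'int16', appDefault: 'float32' });
```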
At the moment implementations can choose (without normative spec changes) to store decoded buffers in some compressed format that is expanded to floats when read or played back. That might or might not perform well relative to other implementations that don't do this. The playback quality relative to other decoding treatments might vary. It's not for the spec to say.
That said, we plan to revisit in-memory compression more thoroughly in the next version of the spec.
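As a rough illustration of keeping the storage format an implementation detail while the API surface stays float, here's a minimal sketch of a hypothetical buffer that keeps samples in an Int16Array and expands them to a Float32Array only when read. The class and method names are made up (this is not the AudioBuffer interface), and quantizing float input to 16 bits is lossy unless the data was 16-bit to begin with:

```javascript
// Hypothetical buffer: int16 storage, float32 at the interface.
class Int16BackedBuffer {
  constructor(numberOfChannels, length) {
    this.length = length;
    this.channels = [];
    for (let c = 0; c < numberOfChannels; c++) {
      this.channels.push(new Int16Array(length)); // 2 bytes per sample
    }
  }

  // Store float samples, quantizing to 16 bits (lossy for non-16-bit data).
  writeChannel(channel, floats) {
    const dst = this.channels[channel];
    for (let i = 0; i < floats.length && i < this.length; i++) {
      const v = Math.round(floats[i] * 32768);
      dst[i] = Math.max(-32768, Math.min(32767, v));
    }
  }

  // Expand to float32 on demand, e.g. when read or played back.
  readChannel(channel) {
    const src = this.channels[channel];
    const out = new Float32Array(src.length); // temporary 4-byte copy
    for (let i = 0; i < src.length; i++) out[i] = src[i] / 32768;
    return out;
  }

  // Resident size of the stored samples.
  get byteLength() {
    return this.channels.reduce((sum, ch) => sum + ch.byteLength, 0);
  }
}

const store = new Int16BackedBuffer(2, 4);
store.writeChannel(0, new Float32Array([0, 0.5, -0.5, 1.0]));
const left = store.readChannel(0);
// 1.0 is clamped to the int16 maximum, so left[3] is 32767 / 32768.
```

The saving only holds while the expanded copy is short-lived; an implementation that cached every expanded channel would be back at float32 cost.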
I wonder if there have been any thoughts or communication about this recently? I'm currently getting bitten by this again when porting a game from Android with Emscripten to run in a web browser on a mobile phone, and am facing considerable memory pressure trying to get it to run. Being able to store audio as the original 16-bit data instead of expanding it to 32-bit would save around 50MB of RAM at runtime for the application, which would be a huge saving when trying to run on phones with 256MB/512MB of RAM.
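For a sense of scale, uncompressed PCM costs frames × channels × bytes per sample, so going from 4-byte float32 to 2-byte int16 halves it. A small sketch of that arithmetic; the clip parameters are illustrative, not taken from the game above:

```javascript
// Bytes needed to hold uncompressed PCM in memory.
function pcmBytes(seconds, sampleRate, channels, bytesPerSample) {
  const frames = Math.round(seconds * sampleRate);
  return frames * channels * bytesPerSample;
}

// One minute of 44.1 kHz stereo audio.
const float32Bytes = pcmBytes(60, 44100, 2, Float32Array.BYTES_PER_ELEMENT);
const int16Bytes = pcmBytes(60, 44100, 2, Int16Array.BYTES_PER_ELEMENT);

console.log(float32Bytes); // 21168000
console.log(int16Bytes);   // 10584000
```

By the same arithmetic, a 50MB saving means roughly 50MB of original 16-bit data, which occupies about 100MB once expanded to float32.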
We have not been talking about this recently, but we understand it is still an issue.
Joe, did you mean to push that back to v.next on June 3rd?
I did mean to push it back, because I thought the sense of the group was that implementations could store audio any way they want internally inside an AudioBuffer even if the externally visible data is represented as floats.
That doesn't mean I'm dismissing it as an issue, I understand it's a big deal, but we hadn't agreed on a straightforward solution in the spec and there seems to be some room for implementations to make this better without changing the spec.
By the way, I wonder if the AudioContextOptions being floated in WebAudio/web-audio-api#348 would be useful to request a lower AudioBuffer bit depth... if this were to be an opt-in.
If an option lived in AudioContextOptions, that would force all AudioBuffers to have the same bitness? It sounds odd to require an application to have all its buffers in the same bitness; shouldn't it be a property of the buffer rather than the context?
Have there been any recent advances on this, or thoughts on if/when support for 16-bit audio buffers might realistically be introduced? We've been working with a major game company partner on an HTML5 title to be deployed on Facebook, and out-of-memory crashes account for more than 30% of the failures in the initial QA tests. Profiling shows that support for 16-bit audio would allow optimizing the game to use 10-20% less memory, which would definitely help with the OOM crashes. Games often use a lot of different sound effects, and these are preloaded up front since they need to be played back in real time in response to a game logic event, so games typically keep large banks of audio in memory. The native version of the game uses 16-bit audio buffers, so needing to expand them to 32-bit on the web causes a big discrepancy between the native and HTML5 memory footprints.
Hi, I have worked on porting a mobile game to WebGL, which you can check out at www.topeleven.com. In our case, using 16-bit audio would decrease memory usage by about 10%, so this is an optimization worth considering.
This is still showing up in most Unreal Engine 4 and Unity3D titles ported to the web, and being able to use 16-bit integer formats for audio effects would be a big size saving for these demos. I wonder what the latest thinking on this is? This bug was given a "V1" label earlier, but that was then removed by @mdjp. What does that mean, and what does its removal mean? Has there been any thought about adding this feature in the future? Thanks!
Based on skimming over the issue and the labels set here, this will not be in v1, but in the next version.
I think there are a couple of issues that need to be worked out. First, what does decodeAudioData do? And how do we specify what it should do? Second, how does AudioBuffer indicate the format, and how does the user specify it?
I think specifying a new AudioBuffer feature would be relatively simple with the new AudioBufferOptions. Some work is needed to specify how it behaves. Presumably, it gets converted to float internally when used.
decodeAudioData is a bit of a mess, and it's really hard to add something here while keeping everything backward-compatible.
I have created a test suite of different audio files and effects that currently are problematic. You can visit https://github.com/juj/audio_test_suite to find it, or http://clb.demon.fi/audio_test_suite/ to check it out live.
First, what does decodeAudioData do?
While creating the above set of tests, I noticed that decodeAudioData is currently hardcoded to decode to the sampling rate of the AudioContext, at least on Firefox. I wish that was not the case, and that decodeAudioData would not perform resampling conversion on the input. Similarly, it would be best if decodeAudioData did not do any sample type conversion on the input either. Overall, though, I'd vote for removing decodeAudioData altogether (WebAudio/web-audio-api#1305) and replacing it with better APIs that make users memory-aware, because the usage of that function in the wild is heartbreakingly lax at the moment. This probably jibes well with you mentioning decodeAudioData being a mess.
Overall, I'd like to see the manipulation of compressed and uncompressed audio be much more symmetric in the API, so that all features are available for both formats. At that point, different uncompressed formats would probably also become easier to express. Though I only know a small subset of the Web Audio API overall, so I'm not sure how easy or hard that would be to achieve.
I would imagine that playback is the 90% case, with an extremely shallow effect graph. For something like ConvolverNode, mandating float inputs/outputs is fine. For AudioBufferSourceNode (and perhaps ScriptProcessorNode), I'd really like to see 16-bit depth support.
It would also be nice to have better control over the sample rate: the fact that decodeAudioData always resamples to the output rate is unfortunate, as it's lossy.
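A rough sketch of why 16-bit storage need not conflict with a float graph for simple playback: a source can keep int16 samples and convert only the current 128-frame render quantum to float32 as it's rendered. The function is hypothetical (not how any browser implements AudioBufferSourceNode) and ignores playbackRate, looping, interpolation and multiple channels:

```javascript
const RENDER_QUANTUM = 128; // Web Audio's render quantum size in frames

// Fill one float32 output block from int16 source data starting at `position`.
// Frames past the end of the source are rendered as silence.
// Returns the next read position.
function renderQuantum(source, position, output) {
  for (let i = 0; i < output.length; i++) {
    const idx = position + i;
    output[i] = idx < source.length ? source[idx] / 32768 : 0;
  }
  return position + output.length;
}

const source = new Int16Array(200).fill(16384); // 200 frames at half scale
const block = new Float32Array(RENDER_QUANTUM);
let pos = 0;
pos = renderQuantum(source, pos, block); // frames 0..127
pos = renderQuantum(source, pos, block); // frames 128..199, then silence
```

Only one quantum-sized float buffer exists at a time, so the resident cost stays at 2 bytes per stored sample.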
Although perhaps not the API you'd choose, control over the sample rate is available through OfflineAudioContext.
Hmm, interesting, since AudioBuffers can (IIRC) be shared across contexts...
@hoch had opened a discussion about the AudioDeviceClient API, which led to a conversation about efficient compressed audio sample playback. That prompted an illustration/sketch of an API to play back compressed audio clips, something like the following:
```javascript
// Sketch of a possible future API: AudioDevice, MediaInstance and this
// MediaSource constructor signature do not exist in browsers today.
var audioFeatures = AudioDevice.enumerateAudioSupport(); // Returns a list of e.g. { sampleRate: 44100, channels: 'stereo' }, { sampleRate: 48000, channels: '5.1' }
var device = new AudioDevice({ sampleRate: 44100, channels: 'stereo' });

// Compressed audio playback:
var mediaSource = new MediaSource(myTypedArray, /*offset=*/43242, /*length=*/5325, 'audio/ogg'); // weak reference to typed array bits, no deep copy of byte data
// or: mediaSource = new MediaSource(fetch('foo.ogg'));
mediaSource.downloadHint = 'download on first play'; // or 'download up front' / 'decode up front'
// mediaSource.onloaded / .readyState etc. to provide loading information

var mediaInstance = new MediaInstance(mediaSource);
mediaInstance.start = 0;
mediaInstance.loopStart = 2342;
mediaInstance.loopEnd = 53114;
mediaInstance.loopTimes = 3; // default = Infinity
mediaInstance.end = 350000;
// mediaInstance.pitch / .volume / .worldPosition = ...;
mediaInstance.onended = function () {}; // likewise .onloopended

var playbackGroup = device.createAudioPlaybackGroup();
var playbackInstance1 = playbackGroup.play(mediaInstance, /*timeFromNow=*/0);
var playbackInstance2 = playbackGroup.play(mediaInstance, /*timeFromNow=*/2);
var playbackInstance3 = playbackGroup.play(mediaInstance, /*timeFromNow=*/4);
// playbackInstance2.pitch = ...; // animate the playback pitch
// playbackGroup.volume / .pitch = ...; // animate clips in a group
playbackGroup.stop(); // stops all audio files playing in this group

// Soft real-time push-mode synthesis:
var playbackGroup = device.createAudioPlaybackGroup();
var mediaInstance1 = new MediaInstance(myTypedArray, /*offset=*/2000, /*length=*/400000);
// write synthesized audio frames into myTypedArray[2000 .. 402000)
playbackGroup.appendQueue(mediaInstance1);
var mediaInstance2 = new MediaInstance(myTypedArray, /*offset=*/402000, /*length=*/400000);
// write more synthesized audio frames into myTypedArray[402000 .. 802000)
playbackGroup.appendQueue(mediaInstance2); // queued to play back after the buffer above
```
@hoch asked to drop it to the issue tracker for reference. Not sure how to tie in to Web Audio, but hopefully it gives ideas of the use cases.
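One way to read the loop fields in that sketch, offered only as an assumption: play from `start` to `loopEnd`, jump back to `loopStart` `loopTimes` times, then continue on to `end`. Unlike AudioBufferSourceNode, whose `loop` attribute repeats until the node is stopped, this gives a finite loop count. `sourceFrameAt` is a hypothetical helper that maps an output frame to a source frame:

```javascript
// Hypothetical semantics for the sketched MediaInstance loop fields.
// Returns the source frame played at output frame n, or -1 when finished.
function sourceFrameAt(n, { start, loopStart, loopEnd, loopTimes, end }) {
  const intro = loopEnd - start;              // frames before the first jump back
  if (n < intro) return start + n;
  const loopLength = loopEnd - loopStart;
  const looped = n - intro;
  if (looped < loopTimes * loopLength) {      // still inside a repeat
    return loopStart + (looped % loopLength);
  }
  const frame = loopEnd + (looped - loopTimes * loopLength); // tail after the loops
  return frame < end ? frame : -1;
}

const params = { start: 0, loopStart: 2, loopEnd: 5, loopTimes: 2, end: 7 };
const frames = [];
for (let n = 0; n < 14; n++) frames.push(sourceFrameAt(n, params));
// frames: [0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5, 6, -1]
```

With `loopTimes = Infinity` (the sketch's default), the tail is never reached and playback loops indefinitely.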
Thanks @juj!
In the last F2F, the WG/CG briefly chatted about WebCodecs and the Stream/WebAudio integration. This proposal might be relevant to both directions.
Virtual F2F:
- AudioBuffer would work, but interaction with the rest of the API needs to be defined
- WebCodecs goes a long way to help, but does not help if the decoded audio assets need to be present in memory at all times, for example in a sampler or any other advanced music app that needs to be able to seek arbitrarily in real time. As a data point, just halving the memory by not inflating to float32 on mobile in Firefox makes a nice difference, since there are so many assets.
Teleconf: This is useful. @padenot mentioned that Firefox already does this internally and transparently. The question is whether it should be exposed to the developer and what the API should look like. Proposals welcome.
TPAC 2020:
- There was nobody from the game industry in the call, so we couldn't make much concrete progress
- https://www.bitsnbites.eu/hiqh-quality-dpcm-attempts/ / https://github.com/mbitsnbites/libsac was discussed as a way for authors to opt in to in-memory compressed audio assets, so that a rather large number of audio assets can be kept in memory while reducing the footprint by about 8x (f32 -> 4 bits). This compression scheme is designed (amongst other things) for random access and real-time safety. Additionally, a flag allowing the engine to store a file in 16 bits could be useful, giving a reduced footprint with lossless in-memory audio samples
- Web Codecs is happening. The audio decoding part is available in Chrome pre-release, behind a flag. This solves the problem of playing long audio streams without having the whole file resident in memory, while still allowing rather precise (=custom) playback
The last two points are complementary and don't serve the same use case; I believe both would have their uses.
cc @juj, @pmlt
Hey, thanks for the ping! I was not aware of TPAC, and missed out on that - but would love to join in a call if that would help the progress.
My take on raw 8-bit/16-bit vs 4-bit DPCM is that neither can obviate the need for the other. Both types of formats are used in native game projects, so I would vote to see support for both in Web Audio (with a preference towards raw if only one had to be chosen).
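To make the 8x figure concrete (32-bit floats down to 4 bits per sample), here's a deliberately simple 4-bit DPCM codec. It is not libsac's scheme and has none of its quality or random-access properties; it only shows the basic idea of storing, per sample, a 4-bit code for the quantized difference from the previous reconstructed sample. The fixed `STEP` size is an arbitrary assumption:

```javascript
const STEP = 1 / 64; // fixed quantizer step (hypothetical, no adaptation)

// Encode float samples into 4-bit codes, two codes per byte.
// The predictor is the previous *reconstructed* sample, so error doesn't accumulate.
function dpcmEncode(samples) {
  const out = new Uint8Array(Math.ceil(samples.length / 2));
  let prediction = 0;
  for (let i = 0; i < samples.length; i++) {
    // 16 levels: (code - 7.5) * STEP, for code = 0..15.
    let code = Math.round((samples[i] - prediction) / STEP + 7.5);
    code = Math.max(0, Math.min(15, code));
    prediction += (code - 7.5) * STEP;
    out[i >> 1] |= (i & 1) ? code << 4 : code; // low nibble first
  }
  return out;
}

// Decode by replaying the same prediction steps.
function dpcmDecode(bytes, length) {
  const out = new Float32Array(length);
  let value = 0;
  for (let i = 0; i < length; i++) {
    const code = (i & 1) ? bytes[i >> 1] >> 4 : bytes[i >> 1] & 0x0f;
    value += (code - 7.5) * STEP;
    out[i] = value;
  }
  return out;
}

// A 440 Hz sine at half scale, 44.1 kHz: 1000 float32 samples = 4000 bytes.
const N = 1000;
const input = new Float32Array(N);
for (let i = 0; i < N; i++) input[i] = 0.5 * Math.sin((2 * Math.PI * 440 * i) / 44100);
const packed = dpcmEncode(input); // 500 bytes, i.e. 8x smaller
const decoded = dpcmDecode(packed, N);
```

A real in-memory scheme would also need things like an adaptive step and periodic resynchronization points so playback can start at arbitrary positions; this toy decodes only from the beginning.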
To reference your second bullet point: being able to choose the overall bit depth of the audio context would significantly help my application's memory footprint. Being forced to use 32-bit floating point maxes out my memory when I have 16+ long-form audio files loaded.
Virtual F2F 2021: Increase this to priority-1. We will support additional depths for linear PCM (i.e. not 4-bit DPCM). Lots of details need to be worked out, but probably decodeAudioData will return an AudioBuffer with int16 for mp3/aac files. We want a way to be able to say new AudioBuffer(<options>) to allow specifying the bit depth of the buffer and be able to get the bits out.
...but probably decodeAudioData will return an AudioBuffer with int16 for mp3/aac files...
@rtoy It's rare, but folks may want to decode these to 24-bit.
We want a way to be able to say new AudioBuffer(<options>) to allow specifying the bit depth of the buffer and be able to get the bits out.
Are you saying that the web app can specify the desired target bit depth, in cases where the original isn't known? If so, that sounds great. (For example, an MP3 encoder may take 24-bit PCM samples, and the decoder may be able to output 24-bit PCM samples, but as far as I understand it there is no inherent bit depth while in MP3-land. The web application could request 24-bit PCM if it wanted.)
Sorry. I really meant that for an encoded file, decodeAudioData can return a buffer of whatever the appropriate bit depth is if there is one. So a 24-bit wav file gets a 24-bit buffer. Well, I guess there isn't really a 24-bit array type, so it would probably be a 32-bit array type.
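Since there's no 24-bit typed array, a 24-bit sample would indeed live in a wider type. A minimal sketch, assuming little-endian packed 24-bit PCM as stored in WAV data chunks, of unpacking it into an Int32Array with sign extension (the helper names are hypothetical):

```javascript
// Unpack little-endian, packed 24-bit signed PCM into an Int32Array.
function unpack24(bytes) {
  const count = Math.floor(bytes.length / 3);
  const out = new Int32Array(count);
  for (let i = 0; i < count; i++) {
    const b = i * 3;
    const u = bytes[b] | (bytes[b + 1] << 8) | (bytes[b + 2] << 16);
    out[i] = (u << 8) >> 8; // sign-extend bit 23 into the upper byte
  }
  return out;
}

// Scale to a float in [-1, 1) by dividing by 2^23.
function int24ToFloat(v) {
  return v / 8388608;
}

const samples = unpack24(new Uint8Array([0xff, 0xff, 0x7f, 0x00, 0x00, 0x80, 0xff, 0xff, 0xff]));
// Int32Array [8388607, -8388608, -1]
```

Note that Int32Array costs the same 4 bytes per sample as float32, and float32's 24-bit significand already represents every 24-bit sample exactly after scaling by 2^-23, so for 24-bit data the benefit would be bit-exact integer access rather than memory.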
Are you saying that the web app can specify the desired target bit depth, in cases where the original isn't known? If so, that sounds great. (For example, an MP3 encoder may take 24-bit PCM samples, and the decoder may be able to output 24-bit PCM samples, but as far as I understand it there is no inherent bit depth while in MP3-land. The web application could request 24-bit PCM if it wanted.)
Ah, I'm not sure about that. I think we want to minimize the changes to decodeAudioData since WebCodecs can probably do everything better. So, I'm not sure about what you can specify for decodeAudioData. But certainly as a user, I want to be able to create an AudioBuffer manually with a specified bit depth. If nothing else, this is useful for testing that AudioBuffers behave correctly.
AudioWG call:
- copyTo audio samples to memory, soon with conversion
- AudioBuffer that takes a format and a buffer; this can be quite flexible. The AudioBuffer is then used as usual, and inflated to float32 as needed
Next step - straw man and draft spec text required.
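A minimal sketch of the "format plus buffer" idea: a hypothetical wrapper that takes an interleaved int16 or float32 ArrayBuffer (the kind of layout a decoder might copy out) and inflates a channel to float32 only when asked. The `FormattedBuffer` name and the option names are illustrative assumptions, not draft spec text; the format strings borrow WebCodecs' interleaved "s16" and "f32" sample formats:

```javascript
// Supported interleaved formats and their storage types.
const SAMPLE_TYPES = new Map([
  ['s16', Int16Array],
  ['f32', Float32Array],
]);

// Hypothetical AudioBuffer-like wrapper built from a format and raw bytes.
class FormattedBuffer {
  constructor({ format, numberOfChannels, data }) {
    const Type = SAMPLE_TYPES.get(format);
    if (!Type) throw new TypeError(`unsupported format: ${format}`);
    this.format = format;
    this.numberOfChannels = numberOfChannels;
    this.samples = new Type(data); // view over the caller's bytes, no copy
    this.length = Math.floor(this.samples.length / numberOfChannels);
  }

  // De-interleave one channel and inflate it to float32 on demand.
  inflateChannel(channel) {
    const out = new Float32Array(this.length);
    const scale = this.format === 's16' ? 1 / 32768 : 1;
    for (let i = 0; i < this.length; i++) {
      out[i] = this.samples[i * this.numberOfChannels + channel] * scale;
    }
    return out;
  }
}

const interleaved = new Int16Array([16384, -16384, 8192, -8192]); // L, R, L, R
const stereo = new FormattedBuffer({ format: 's16', numberOfChannels: 2, data: interleaved.buffer });
const leftChannel = stereo.inflateChannel(0);  // [0.5, 0.25]
const rightChannel = stereo.inflateChannel(1); // [-0.5, -0.25]
```

Keeping the caller's bytes as the backing store is what preserves the 2-byte footprint; the float32 copies are meant to be transient.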
Using an integer 16-bit sample type instead of float32 would allow saving half of the memory on audio data when it's resident in RAM.
Consider adding support for users to utilize audio data in such formats.
Discussion thread about this is at http://lists.w3.org/Archives/Public/public-audio/2013OctDec/0294.html