audiocogs / aurora.js

JavaScript audio decoding framework
http://audiocogs.org/codecs

Modernizing Aurora #170

Open · devongovett opened this issue 8 years ago

devongovett commented 8 years ago

Aurora.js has been around since 2011, and since then the JS and web audio ecosystems have improved quite a bit: we got ES6, much better build tools, the Web Audio API was implemented cross-browser, Node.js streams were invented, etc. Given these changes, I think it's time to modernize Aurora a bit. This is a proposal for the changes I'd like to make. Many of them would be breaking changes, so we would need to bump the major version.

Here's an overview:

I'd like to switch the codebase away from CoffeeScript and convert it to ES6. CoffeeScript served us well and paved the way for many of the features in ES6, but it hasn't been updated in a while and ES6 has largely superseded it. With tools like Babel, we can use ES6 everywhere without compatibility problems. Also, many more people are familiar with ES6/plain JS, so it should encourage more outside contribution. I think it's time to move on from CoffeeScript.

Streams

When we started, Node.js was in its infancy and real streams hadn't been invented yet; we basically had event emitters. So Aurora ended up building its own sort-of streams, which have problems: there is no back-pressure, so the source just reads as fast as possible. For large files, this means we buffer a lot of data into memory before it's needed.

New Node.js streams (introduced in node v0.10, around 2013) have back-pressure support built in, so when you pipe one stream into another, the source automatically slows down or speeds up depending on how fast the downstream readers consume data. They are also a standardized interface that many projects have adopted, so you can compose streams from different authors very easily.

I'd like Aurora.js to adopt Node streams across the board. This should be transparent in the browser as well, thanks to browserify. It also means that many of our custom source classes (e.g. file and http) can be removed, since equivalents already exist in other projects (e.g. fs.createReadStream in node).

The one problem with Node streams is that they do not support seeking out of the box: once you start a stream, you cannot easily jump to another part of it. For our purposes, I think an extension to readable streams that lets sources support seeking would work. When seeking, we would flush the internal buffers of the demuxers and decoders, and then seek the source.
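
To sketch the idea (illustrative only; SeekableFileStream and the shape of seek() are hypothetical names, not a settled API):

const { Readable } = require('stream');
const fs = require('fs');

// Hypothetical sketch: a readable file source that supports seeking.
// Downstream demuxers/decoders would flush their buffers before seek() is called.
class SeekableFileStream extends Readable {
  constructor(path) {
    super();
    this.fd = fs.openSync(path, 'r');
    this.position = 0;
  }

  _read(size) {
    const buf = Buffer.alloc(size);
    fs.read(this.fd, buf, 0, size, this.position, (err, bytesRead) => {
      if (err) return this.emit('error', err);
      this.position += bytesRead;
      this.push(bytesRead > 0 ? buf.slice(0, bytesRead) : null);
    });
  }

  seek(offset) {
    this.position = offset; // subsequent reads continue from here
  }
}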

Multitrack

The npm module is called av because aurora was taken (maybe we should consider changing the GitHub project name to match?), but av is perhaps the better name anyway, since we may want to support video one day. In preparation for that, I think the Demuxer classes should be refactored to support multi-track media: video, audio, subtitles, etc.

Here's an example of how you might use the proposed interface to play an audio track:

fs.createReadStream('my.mp4')
  .pipe(new MP4Demuxer())
  .on('track', function(track) {
    if (track.type === 'audio') {
      track.pipe(new AACDecoder(track.format))
           .pipe(new WebAudioSink());
    } else {
      track.discard(); // throw away the data (don't buffer it)
    }
  });

Modularize

Aurora.js core is already pretty small, but it could be smaller, and pieces could be made reusable by other projects. Here is what I'm proposing:

So what would be left in Aurora core?

Currently, Aurora.js uses the Web Audio API for playback in browsers, but it creates its own AudioContext, so it's hard to integrate with more complex setups where you want to do further audio processing on Aurora's decoded output. I'd like to make it possible to use Aurora as just another node in a Web Audio graph. I propose splitting the current WebAudioDevice into two pieces:

Here's an example showing how you might connect a decoder to a web audio graph and do some further processing:

var context = new AudioContext();
var stream = new WebAudioStream(track.format);

var panNode = context.createStereoPanner();
panNode.pan.value = -0.5; // pan halfway to the left

stream.connect(panNode);
panNode.connect(context.destination);

decoder.pipe(stream);

WebAudioStream could live in a separate module, since it might be useful to other projects.
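
For illustration, here's a very simplified sketch of how such a WebAudioStream might work internally (mono only, no back-pressure; a real design would need to handle channel counts, sample-rate conversion, and pausing the decoder when the queue grows):

const { Writable } = require('stream');

// Simplified sketch: a Writable that queues decoded Float32Array chunks
// and plays them via a ScriptProcessorNode created in whatever context
// it gets connected to.
class WebAudioStream extends Writable {
  constructor(format) {
    super({ objectMode: true });
    this.format = format;
    this.queue = [];
  }

  connect(destination) {
    this.node = destination.context.createScriptProcessor(4096, 0, 1);
    this.node.onaudioprocess = (e) => {
      const out = e.outputBuffer.getChannelData(0);
      let filled = 0;
      while (filled < out.length && this.queue.length > 0) {
        const chunk = this.queue[0];
        const n = Math.min(chunk.length, out.length - filled);
        out.set(chunk.subarray(0, n), filled);
        filled += n;
        if (n === chunk.length) this.queue.shift();
        else this.queue[0] = chunk.subarray(n);
      }
      out.fill(0, filled); // underrun: pad with silence
    };
    this.node.connect(destination);
  }

  _write(samples, encoding, callback) {
    this.queue.push(samples); // a real version would delay callback() when
    callback();               // the queue grows, to apply back-pressure
  }
}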

Backward Compatibility

I'd like to cause as few changes as possible for the existing demuxers and decoders out there in the world. The interface for stream/bitstream reading would remain the same, and that's the biggest surface area used by plugins. It's pretty easy to switch from emitting data to writing to a track in the demuxers, and the decoders should work exactly the same way. We would get rid of the AV.Base class, which was our class abstraction for plain JS, so rather than AV.Demuxer.extend, plugins would either switch to ES6 classes (preferred) or use prototypes directly.
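
For example (MyDemuxer is a placeholder), a plugin's class definition would change roughly like this:

// Before: AV.Base-style extension
var MyDemuxer = AV.Demuxer.extend({
  readChunk: function() { /* ... */ }
});

// After: plain ES6 classes
class MyDemuxer extends Demuxer {
  readChunk() { /* ... */ }
}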

Conclusion

Overall, I think the changes described above would modernize the framework quite a bit and make it easier to use and to contribute to. They would also make the core considerably smaller and our code more easily reusable by other projects. This is obviously a large project and it won't be done overnight, but I think it's a good direction to go in. Please let me know what you think!

jussi-kalliokoski commented 8 years ago

Sounds good! Too bad whatwg/streams is not a very safe bet yet, but it might be a good idea to keep its design constraints in mind anyway, even as we target Node.js streams. WDYT about using Flow for aurora.js (/av.js?)? What about distribution, i.e. do we distribute a separate package for untranspiled JS? A UMD package?

Where should we start? Maybe sketch out a draft of what the public API would look like and nail that down, then start implementing? Regarding that, maybe WebAudioDevice -> WebAudioSink instead, b/c 'device' is somewhat misleading terminology we could get rid of while we're at it. ;)

devongovett commented 8 years ago

Yeah I thought about whatwg streams, but I don't think they're ready yet, and node has a much larger ecosystem of compatible streams already available. My guess is that once whatwg streams are done, there will be compatibility layers bridging the two anyway since they aren't that different (at least in my brief reading).

I don't have a strong opinion on flow or another type system for that matter (e.g. typescript). Willing to be persuaded.

For distribution, maybe use browserify (or rollup?) for the default build, and require('av/es6') for the ES6 source files? Not sure. I'd rather not have two npm packages.

Agree on WebAudioSink instead of WebAudioDevice. Updated proposal.

I've started playing around with some of this already, actually :smile:. I'm currently working on extracting/updating the binary stream reading pieces, and I have a proof of concept using node streams. Will publish to a branch soon. But yeah, let's work on a spec for the public API. Hopefully the high-level stuff (e.g. the player) doesn't have to change much.

jussi-kalliokoski commented 8 years ago

Yeah I thought about whatwg streams, but I don't think they're ready yet, and node has a much larger ecosystem of compatible streams already available.

Agreed.

I don't have a strong opinion on flow or another type system for that matter (e.g. typescript). Willing to be persuaded.

So far, every time I've used Flow there's been a crucial (for me) feature missing, so I haven't used the type checker much; but I've found the annotations are a great form of documentation, both for reading the code and for generating documentation from it. With all that in mind, I think they're a low-cost (especially given how simple it is to configure Babel to strip them), medium-value addition. If we were to actually use the type checker as well, I'd consider them high value, and that might be doable if we otherwise constrain ourselves to pure ES2015, although I'm not sure how well Flow plays with emitting events and such.
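
As a purely illustrative example of that documentation value (PCMDecoder is a made-up name):

// @flow
// Illustrative only: annotations like these document the API shape,
// whether or not we run the type checker.
class PCMDecoder {
  sampleRate: number;
  channels: number;

  decode(chunk: Uint8Array): ?Float32Array {
    return null; // null means "need more data"
  }
}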

For distribution, maybe use browserify (or rollup?) for the default build, and require('av/es6') for the ES6 source files? Not sure. I'd rather not have two npm packages.

I personally like lodash's distribution model: there's the lodash package that contains everything, and its default entry point is one big module with all the things, but you can also import individual pieces directly, e.g. lodash/trim, or even as their own packages, e.g. lodash.trim. I can help with building the tooling so we can do the same thing, if we want to. I can also help with other tooling, like generating documentation, etc. :)
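
Concretely (the av module names here are hypothetical), that model would allow any of these:

var AV = require('av');              // everything, like require('lodash')
var Demuxer = require('av/demuxer'); // one piece, like require('lodash/trim')
// or the same piece as its own standalone package, like require('lodash.trim'):
var Demuxer = require('av.demuxer');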

As for the separate package, it's a tradeoff I don't feel strongly about. However, it's worth noting that from the aurora/AV users' point of view, including the ES6 sources in the same npm package as the transpiled sources offers no benefit to users of either while increasing the package size for both.

I've started playing around with some of this already, actually 😄

💃

devongovett commented 8 years ago

Extracted the binary stream reading stuff into stream-reader. It's mostly the same code as in aurora, just converted to ES6 using decaffeinate. Docs etc. coming.

The main change is that I dropped the AV.Buffer wrapper, so BufferList is now just a linked list of Uint8Arrays. This works because the previous and next pointers are stored as ES6 symbols, which lets a buffer belong to more than one BufferList at a time.
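
Roughly (a simplified sketch, not the exact stream-reader code):

// Each BufferList uses its own Symbols for prev/next pointers, so one
// Uint8Array can be linked into several lists without the keys clashing.
class BufferList {
  constructor() {
    this.prev = Symbol('prev');
    this.next = Symbol('next');
    this.head = this.tail = null;
  }

  append(buffer) {
    buffer[this.prev] = this.tail;
    buffer[this.next] = null;
    if (this.tail) this.tail[this.next] = buffer;
    this.tail = buffer;
    if (!this.head) this.head = buffer;
  }
}

const buf = new Uint8Array(16);
const a = new BufferList();
const b = new BufferList();
a.append(buf); // buf is in both lists; the Symbol keys don't conflict
b.append(buf);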

A couple features to propose:

devongovett commented 8 years ago

Looks like the stream-reader name is already taken on npm, though it's a pretty old, unmaintained library. So we'll either need a different name, or we'll have to convince the guy to give us the package. :/

jussi-kalliokoski commented 8 years ago

We could also try to use a namespace (it might already be reserved): @av/stream-reader. That might be even better, as stream-reader is quite a generic name.

devongovett commented 8 years ago

Well, it is a pretty generic module. That's why we're breaking it out. :smile:

jussi-kalliokoski commented 8 years ago

Generic, yes, but the name could be more specific: it doesn't deal (directly) with the same Streams that node users would expect, and it handles a very specific type of stream (raw binary data).

devongovett commented 8 years ago

Pushed an initial implementation of the streams stuff to the v2 branch in b0c69cd69a97ffda85b42867b4f02e47096b7e86. There's still lots to do, but please feel free to leave comments. A few notes:

devongovett commented 8 years ago

Committed some more things. See here for a good comparison of everything so far (without the noise caused by removing all the existing code).

The main thing was refactoring Demuxer to move the common logic for dealing with streamed data into the base class, rather than the individual demuxers. Most of the demuxers had a big while (stream.available(1)) loop, which is silly: most media formats are structured as a series of discrete chunks, so it makes sense for the Demuxer#readChunk implementation to read a single chunk, rather than having to consume all of the data available in the stream at once. Demuxer#readChunk is called as necessary by the base class to consume the data in the stream. If an underflow occurs, the base class seeks back to the last good offset and tries again when more data is available. This is the same behavior the decoders already have.
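
In sketch form (my paraphrase of the described behavior, with a hypothetical UnderflowError and buffered-reader API, not the exact v2 source):

const { Writable } = require('stream');

class UnderflowError extends Error {}

// Sketch of the base-class behavior: readChunk() reads one chunk; on
// underflow, rewind to the last good offset and wait for more data.
class Demuxer extends Writable {
  _write(data, encoding, callback) {
    this.stream.append(data); // internal buffered reader
    while (this.stream.available(1)) {
      const offset = this.stream.offset; // last good position
      try {
        this.readChunk(); // implemented by each demuxer subclass
      } catch (err) {
        if (err instanceof UnderflowError) {
          this.stream.seek(offset); // rewind; retry on the next write
          break;
        }
        throw err;
      }
    }
    callback();
  }
}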

devongovett commented 8 years ago

Moved the mp4 demuxer here: https://github.com/audiocogs/mp4.js.

lukebarlow commented 8 years ago

Hi, I just came across this thread and wanted to say it fits very closely with what I'm trying to do: a flexible multi-track audio component in the browser. As well as playing single files that contain multiple tracks, like the mp4 test in v2, I also want to handle the situation where we have one file per track, and be able to load and play them in a synchronised way. Do you have any plans for supporting this scenario?

I'm trying to decide whether to base my code on the master aurora branch or on this v2 one. I have some basic things working from the master branch, but the changes in v2 sound good. Has there been any progress since July, or are there plans to work on it?

Am I correct in understanding that at this time, the v2 test only works in the node environment, primarily because you haven't settled on a stream implementation to use in the browser?

devongovett commented 8 years ago

@lukebarlow yeah I want to finish this, but I'm super busy and have a lot of projects I'm working on at the moment. In the browser, we'll use node streams as provided by browserify. I had started on a WebAudioSink in 40d1c9299e3ef72a60013c335df8a83f67462d04.

As for synchronizing multiple files (or even multiple tracks), I haven't started on anything yet. That would probably be done by the Player class, or perhaps an intermediary. It would need to do things like handle tracks with varying sample rates and media types (e.g. sync video with audio).

lukebarlow commented 8 years ago

Okay, no problem. Thanks for the speedy reply. Do you have any kind of test code which shows the WebAudioSink in action?

filerun commented 7 years ago

+1 for ES6. I would contribute to this project if it weren't for the CoffeeStuff.

altaywtf commented 7 years ago

do you guys have any timeline for this?

chrisbenincasa commented 7 years ago

👍 Been following this project for a while and would be happy to contribute in any capacity to this effort.

MatthewCallis commented 7 years ago

I've ported almost everything to ES6, but I also made several changes to the structure, so I'm not sure how useful it would be moving forward. Three tests are still failing, but I'm working on those and on getting it cleaned up and back to working order.

https://github.com/MatthewCallis/aurora.js/tree/ES6

lukebarlow commented 7 years ago

Some other decaffeinated aurora efforts here - https://github.com/alexanderwallin/multitrack-audio-element/tree/master/.idea/decaffeinated-aurora

MatthewCallis commented 7 years ago

Everything is passing now (except one odd M4A chapter test), and I've begun expanding the code coverage.

devongovett commented 7 years ago

@MatthewCallis interesting. I'd like to do more of a refactor here rather than a straight port to ES6, but that's good for now. Not sure when I'll have time.

MatthewCallis commented 7 years ago

@devongovett cool! I did it to learn more about the code base; my own codec (tracked music in proprietary formats) was unlike anything I had seen tackled yet, and I wanted to figure out what was needed to make that easier. Here to help if you need it!