audiocogs / aurora.js

JavaScript audio decoding framework
http://audiocogs.org/codecs

Modernizing Aurora #170

Open · devongovett opened this issue 8 years ago

devongovett commented 8 years ago

Aurora.js has been around since 2011, and since then the JS and web audio ecosystems have improved quite a bit: we got ES6, much better build tools, the Web Audio API was implemented cross-browser, Node.js streams were invented, etc. Given these changes, I think it's time to modernize Aurora a bit. This is a proposal for the changes I'd like to make. Many of them would be breaking changes, so we would need to bump the major version.

Here's an overview:

I'd like to switch the codebase away from CoffeeScript and convert it to ES6. CoffeeScript served us well and paved the way for many of the features in ES6, but it hasn't been updated in a while and ES6 has largely superseded it. With tools like Babel, we can use ES6 everywhere without compatibility problems. Also, many more people are familiar with ES6/plain JS, so it should encourage more outside contribution. I think it's time to move on from CoffeeScript.

Streams

When we started, Node.js was in its infancy and real streams hadn't been invented yet; we basically had event emitters. So Aurora ended up building its own sort-of streams, which have problems: there is no back-pressure, so the source just reads as fast as possible. For large files, this means we buffer a lot of data into memory before it's needed.

New Node.js streams (introduced in node v0.10, around 2013) have back-pressure support built in, so when you pipe one stream into another, the source automatically slows down or speeds up depending on how fast the downstream readers consume data. They are also a standardized interface that many projects have adopted, so you can compose streams from different authors very easily.

I'd like Aurora.js to adopt Node streams across the board. This should be transparent in the browser as well, thanks to browserify. It also means that many of our custom source classes (e.g. file and http) can be removed, since equivalents already exist in other projects (e.g. fs.createReadStream in node).

The one problem with Node streams is that they do not support seeking out of the box: once you start a stream, you cannot easily jump to another part of it. For our purposes, I think an extension to readable streams that lets sources support seeking would work. When seeking, we would flush the internal buffers of the demuxers and decoders, and then seek the source.
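
To sketch the idea (illustrative only; SeekableFileStream and the shape of seek() are hypothetical names, not a settled API):

const { Readable } = require('stream');
const fs = require('fs');

// Hypothetical sketch: a readable file source that supports seeking.
// Downstream demuxers/decoders would flush their buffers before seek() is called.
class SeekableFileStream extends Readable {
  constructor(path) {
    super();
    this.fd = fs.openSync(path, 'r');
    this.position = 0;
  }

  _read(size) {
    const buf = Buffer.alloc(size);
    fs.read(this.fd, buf, 0, size, this.position, (err, bytesRead) => {
      if (err) return this.emit('error', err);
      this.position += bytesRead;
      this.push(bytesRead > 0 ? buf.slice(0, bytesRead) : null);
    });
  }

  seek(offset) {
    this.position = offset; // subsequent reads continue from here
  }
}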

Multitrack

The npm module is called av because aurora was taken (maybe we should consider changing the GitHub project name to match?), but av is perhaps the better name anyway, since we may want to support video one day. In preparation for that, I think the Demuxer classes should be refactored to support multi-track media: video, audio, subtitles, etc.

Here's an example of how you might use the proposed interface to play an audio track:

fs.createReadStream('my.mp4')
  .pipe(new MP4Demuxer())
  .on('track', function(track) {
    if (track.type === 'audio') {
      track.pipe(new AACDecoder(track.format))
           .pipe(new WebAudioSink());
    } else {
      track.discard(); // throw away the data (don't buffer it)
    }
  });

Modularize

Aurora.js core is already pretty small, but it could be smaller, and pieces could be made reusable by other projects. Here is what I'm proposing:

So what would be left in Aurora core?

Currently, Aurora.js uses the Web Audio API for playback in browsers, but it creates its own AudioContext, so it's hard to integrate with more complex setups where you want to do further audio processing on Aurora's decoded output. I'd like to make it possible to use Aurora as just another node in a Web Audio graph. I propose splitting the current WebAudioDevice into two pieces:

Here's an example showing how you might connect a decoder to a web audio graph and do some further processing:

var context = new AudioContext();
var stream = new WebAudioStream(track.format);

var panNode = context.createStereoPanner();
panNode.pan.value = -0.5; // pan halfway to the left

stream.connect(panNode);
panNode.connect(context.destination);

decoder.pipe(stream);

WebAudioStream could live in a separate module, since it might be useful to other projects.
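
For illustration, here's a very simplified sketch of how such a WebAudioStream might work internally (mono only, no back-pressure; a real design would need to handle channel counts, sample-rate conversion, and pausing the decoder when the queue grows):

const { Writable } = require('stream');

// Simplified sketch: a Writable that queues decoded Float32Array chunks
// and plays them via a ScriptProcessorNode created in whatever context
// it gets connected to.
class WebAudioStream extends Writable {
  constructor(format) {
    super({ objectMode: true });
    this.format = format;
    this.queue = [];
  }

  connect(destination) {
    this.node = destination.context.createScriptProcessor(4096, 0, 1);
    this.node.onaudioprocess = (e) => {
      const out = e.outputBuffer.getChannelData(0);
      let filled = 0;
      while (filled < out.length && this.queue.length > 0) {
        const chunk = this.queue[0];
        const n = Math.min(chunk.length, out.length - filled);
        out.set(chunk.subarray(0, n), filled);
        filled += n;
        if (n === chunk.length) this.queue.shift();
        else this.queue[0] = chunk.subarray(n);
      }
      out.fill(0, filled); // underrun: pad with silence
    };
    this.node.connect(destination);
  }

  _write(samples, encoding, callback) {
    this.queue.push(samples); // a real version would delay callback() when
    callback();               // the queue grows, to apply back-pressure
  }
}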

Backward Compatibility

I'd like to cause as few changes as possible for the existing demuxers and decoders out there in the world. The interface for stream/bitstream reading would remain the same, and that's the biggest surface area used by plugins. It's pretty easy to switch from emitting data to writing to a track in the demuxers, and the decoders should work exactly the same way. We would get rid of the AV.Base class, which was our class abstraction for plain JS, so rather than AV.Demuxer.extend, plugins would either switch to ES6 classes (preferred) or use prototypes directly.
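
For example (MyDemuxer is a placeholder), a plugin's class definition would change roughly like this:

// Before: AV.Base-style extension
var MyDemuxer = AV.Demuxer.extend({
  readChunk: function() { /* ... */ }
});

// After: plain ES6 classes
class MyDemuxer extends Demuxer {
  readChunk() { /* ... */ }
}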

Conclusion

Overall, I think the changes described above would modernize the framework quite a bit and make it easier to use and to contribute to. They would also make the core considerably smaller and our code more easily reusable by other projects. This is obviously a large project and it won't be done overnight, but I think it's a good direction to go in. Please let me know what you think!

jussi-kalliokoski commented 8 years ago

Sounds good! Too bad whatwg/streams is not a very safe bet yet, but it might be a good idea to keep its design constraints in mind anyway, even as we target Node.js streams. WDYT about using Flow for aurora.js (/av.js?)? What about distribution, i.e. do we distribute a separate package for untranspiled JS? A UMD package?

Where should we start? Maybe sketch out a draft of what the public API would look like and nail that down, then start implementing? Regarding that, maybe WebAudioDevice -> WebAudioSink instead, b/c 'device' is somewhat misleading terminology we could get rid of while we're at it. ;)

devongovett commented 8 years ago

Yeah I thought about whatwg streams, but I don't think they're ready yet, and node has a much larger ecosystem of compatible streams already available. My guess is that once whatwg streams are done, there will be compatibility layers bridging the two anyway since they aren't that different (at least in my brief reading).

I don't have a strong opinion on flow or another type system for that matter (e.g. typescript). Willing to be persuaded.

For distribution, maybe use browserify (or rollup?) for the default build, and require('av/es6') for the ES6 source files? Not sure. I'd rather not have two npm packages.

Agree on WebAudioSink instead of WebAudioDevice. Updated proposal.

I've started playing around with some of this already, actually :smile:. I'm currently working on extracting/updating the binary stream reading pieces, and I have a proof of concept using node streams. Will publish to a branch soon. But yeah, let's work on a spec for the public API. Hopefully the high-level stuff (e.g. the player) doesn't have to change much.

jussi-kalliokoski commented 8 years ago

Yeah I thought about whatwg streams, but I don't think they're ready yet, and node has a much larger ecosystem of compatible streams already available.

Agreed.

I don't have a strong opinion on flow or another type system for that matter (e.g. typescript). Willing to be persuaded.

So far, every time I've used Flow there's been a crucial (for me) feature missing, so I haven't used the type checker much; but I've found the annotations are a great form of documentation, both for reading the code and for generating documentation from it. With all that in mind, I think they're a low-cost (especially given how simple it is to configure Babel to strip them), medium-value addition. If we were to actually use the type checker as well, I'd consider them high value, and that might be doable if we otherwise constrain ourselves to pure ES2015, although I'm not sure how well Flow plays with emitting events and such.
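
As a purely illustrative example of that documentation value (PCMDecoder is a made-up name):

// @flow
// Illustrative only: annotations like these document the API shape,
// whether or not we run the type checker.
class PCMDecoder {
  sampleRate: number;
  channels: number;

  decode(chunk: Uint8Array): ?Float32Array {
    return null; // null means "need more data"
  }
}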

For distribution, maybe use browserify (or rollup?) for the default build, and require('av/es6') for the ES6 source files? Not sure. I'd rather not have two npm packages.

I personally like lodash's distribution model: there's the lodash package that contains everything, and its default entry point is one big module with all the things, but you can also import individual pieces directly, e.g. lodash/trim, or even as their own packages, e.g. lodash.trim. I can help with building the tooling so we can do the same thing, if we want to. I can also help with other tooling, like generating documentation, etc. :)
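
Concretely (the av module names here are hypothetical), that model would allow any of these:

var AV = require('av');              // everything, like require('lodash')
var Demuxer = require('av/demuxer'); // one piece, like require('lodash/trim')
// or the same piece as its own standalone package, like require('lodash.trim'):
var Demuxer = require('av.demuxer');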

As for the separate package, it's a tradeoff I don't feel strongly about. However, it's worth noting that from the aurora/AV users' point of view, including the ES6 sources in the same npm package as the transpiled sources offers no benefit to users of either while increasing the package size for both.

I've started playing around with some of this already, actually 😄

💃

devongovett commented 8 years ago

Extracted the binary stream reading stuff into stream-reader. It's mostly the same code as in aurora, just converted to ES6 using decaffeinate. Docs etc. coming.

The main change is that I dropped the AV.Buffer wrapper, so BufferList is now just a linked list of Uint8Arrays. This works because the previous and next pointers are stored as ES6 symbols, which lets a buffer belong to more than one BufferList at a time.
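
Roughly (a simplified sketch, not the exact stream-reader code):

// Each BufferList uses its own Symbols for prev/next pointers, so one
// Uint8Array can be linked into several lists without the keys clashing.
class BufferList {
  constructor() {
    this.prev = Symbol('prev');
    this.next = Symbol('next');
    this.head = this.tail = null;
  }

  append(buffer) {
    buffer[this.prev] = this.tail;
    buffer[this.next] = null;
    if (this.tail) this.tail[this.next] = buffer;
    this.tail = buffer;
    if (!this.head) this.head = buffer;
  }
}

const buf = new Uint8Array(16);
const a = new BufferList();
const b = new BufferList();
a.append(buf); // buf is in both lists; the Symbol keys don't conflict
b.append(buf);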

A couple features to propose:

devongovett commented 8 years ago

Looks like the stream-reader name is already taken on npm, though it's a pretty old, unmaintained library. So we'll either need a different name, or we'll have to convince the guy to give us the package. :/

jussi-kalliokoski commented 8 years ago

We could also try to use a namespace (it might already be reserved): @av/stream-reader. That might be even better, as stream-reader is quite a generic name.

devongovett commented 8 years ago

Well, it is a pretty generic module. That's why we're breaking it out. :smile:

jussi-kalliokoski commented 8 years ago

Generic, yes, but the name could be more specific: it doesn't deal (directly) with the same Streams that node users would expect, and it handles a very specific type of stream (raw binary data).

devongovett commented 8 years ago

Pushed an initial implementation of the streams stuff to the v2 branch in b0c69cd69a97ffda85b42867b4f02e47096b7e86. There's still lots to do, but please feel free to leave comments. A few notes:

devongovett commented 8 years ago

Committed some more things. See here for a good comparison of everything so far (without the noise caused by removing all the existing code).

The main thing was refactoring Demuxer to move the common logic for dealing with streamed data into the base class, rather than the individual demuxers. Most of the demuxers had a big while (stream.available(1)) loop, which is silly: most media formats are structured as a series of discrete chunks, so it makes sense for the Demuxer#readChunk implementation to read a single chunk, rather than having to consume all of the data available in the stream at once. Demuxer#readChunk is called as necessary by the base class to consume the data in the stream. If an underflow occurs, the base class seeks back to the last good offset and tries again when more data is available. This is the same behavior the decoders already have.
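
In sketch form (my paraphrase of the described behavior, with a hypothetical UnderflowError and buffered-reader API, not the exact v2 source):

const { Writable } = require('stream');

class UnderflowError extends Error {}

// Sketch of the base-class behavior: readChunk() reads one chunk; on
// underflow, rewind to the last good offset and wait for more data.
class Demuxer extends Writable {
  _write(data, encoding, callback) {
    this.stream.append(data); // internal buffered reader
    while (this.stream.available(1)) {
      const offset = this.stream.offset; // last good position
      try {
        this.readChunk(); // implemented by each demuxer subclass
      } catch (err) {
        if (err instanceof UnderflowError) {
          this.stream.seek(offset); // rewind; retry on the next write
          break;
        }
        throw err;
      }
    }
    callback();
  }
}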

devongovett commented 8 years ago

Moved the mp4 demuxer here: https://github.com/audiocogs/mp4.js.

lukebarlow commented 8 years ago

Hi, I just came across this thread and wanted to say it fits very closely with what I'm trying to do: a flexible multi-track audio component in the browser. As well as playing single files that contain multiple tracks, like the mp4 test in v2, I also want to handle the situation where we have one file per track, and be able to load and play them in a synchronised way. Do you have any plans for supporting this scenario?

I'm trying to decide whether to base my code on the master aurora branch or on this v2 one. I have some basic things working from the master branch, but the changes in v2 sound good. Has there been any progress since July, or are there plans to work on it?

Am I correct in understanding that at this time, the v2 test only works in the node environment, primarily because you haven't settled on a stream implementation to use in the browser?

devongovett commented 8 years ago

@lukebarlow yeah I want to finish this, but I'm super busy and have a lot of projects I'm working on at the moment. In the browser, we'll use node streams as provided by browserify. I had started on a WebAudioSink in 40d1c9299e3ef72a60013c335df8a83f67462d04.

As for synchronizing multiple files (or even multiple tracks), I haven't started on anything yet. That would probably be done by the Player class, or perhaps an intermediary. It would need to do things like handle tracks with varying sample rates and media types (e.g. sync video with audio).

lukebarlow commented 8 years ago

Okay, no problem. Thanks for the speedy reply. Do you have any kind of test code which shows the WebAudioSink in action?

filerun commented 7 years ago

+1 for ES6. I would contribute to this project if it weren't for the CoffeeStuff.

altaywtf commented 7 years ago

do you guys have any timeline for this?

chrisbenincasa commented 7 years ago

👍 Been following this project for a while and would be happy to contribute in any capacity to this effort.

MatthewCallis commented 7 years ago

I've ported almost everything to ES6, but I also made several changes to the structure, so I'm not sure how useful it would be moving forward. Three tests are still failing, but I'm working on those and on getting it cleaned up and back to working order.

https://github.com/MatthewCallis/aurora.js/tree/ES6

lukebarlow commented 7 years ago

Some other decaffeinated aurora efforts here - https://github.com/alexanderwallin/multitrack-audio-element/tree/master/.idea/decaffeinated-aurora

MatthewCallis commented 7 years ago

Everything is passing now (except one odd M4A chapter test), and I've begun expanding the code coverage.

devongovett commented 7 years ago

@MatthewCallis interesting. I'd like to do more of a refactor here rather than a straight port to ES6, but that's good for now. Not sure when I'll have time.

MatthewCallis commented 7 years ago

@devongovett cool! I did it to learn more about the code base; my own codec (tracked music in proprietary formats) was unlike anything I had seen tackled yet, and I wanted to figure out what was needed to make that easier. Here to help if you need it!