Review / Direction / Thoughts

RustAudio / audrey

A crate to simplify reading, writing and converting between a variety of audio formats.

Other

132 stars 16 forks source link

Review / Direction / Thoughts #1

Open mitchmindtree opened 7 years ago

mitchmindtree commented 7 years ago

@est31 ping! These are humble beginnings, still lots to go. I thought it could be good to upload it sooner rather than later to get your thoughts and feedback on the direction of the API etc. I'll start posting some issues for todo-items I have in mind soon.

@ruuda so far this crate is 50% simply wrapping your awesome work! I thought I'd ping you in case you'd like to follow along or in case you'd be interested in contributing at any point. The API design is largely inspired by your design in hound and claxon - it would be great to get your thoughts if you get a chance. Would be more than happy to invite you as an admin if you felt inclined!

@tomaka I thought you might be interested in using this for rodio, perhaps as a replacement the decoder module? I think you are also still an admin of rustaudio, so feel free to open issues/PRs or give your thoughts on the direction of the crate if you are interested at all :+1:

est31 commented 7 years ago

@mitchmindtree

some thoughts on/ideas for the API so far:

audio::read::Description could also include info on the format used (vorbis, flac, wav)
I think instead of focusing on files, the API of the read module should focus on implementors of the Read trait from stdlib instead. This allows users to read from simple in memory buffers via io::Cursor and provide their own plumbing for things like files that are in the process of being downloaded via http(s). Note, it doesn't have to take references, as AsRef<Read> automatically impl's Read (an error I did originally with lewton, highly complicating the API, but its now fixed). Also, you still can have a convenience function that takes a path instead of a Read. From looking at the three dependencies, it seems possible to implement this.
Additionally it would be cool if we could expose some per-frame or per-packet decoders for each of the formats. Then people can use our crate together with custom container formats, e.g. flac inside mp4, or vorbis inside webm. Maybe this should be a responsibility of the upstream crates instead though, so dunno.

est31 commented 7 years ago

Also further ideas:

provide some zero cost abstraction over interleaved / non interleaved channel info. e.g. for 2 channel audio you can either read tuples of two or one sample for the right and one for the left channel.
magic based format recognition. atm we only recognize formats by file extension. This is not optimal. E.g. .ogg files can be both opus, and vorbis (although its used more rarely for opus).
seek support. however, i think seeking is non trivial :/. Also it requires cooperation of the upstream crates.

ruuda commented 7 years ago

Thanks for including me in the discussion, I’d be happy to help. A few thoughts:

There is a tradeoff between exposing a unified API for every format, and achieving maximum performance. For example, Flac stores channel data separately whereas wav interleaves samples. In the worst case this means you need to shuffle data around twice. We saw a similar thing with the image crate. If I understand correctly, the purpose of this crate is to provide a convenient API for the “I don’t care how you decode this file, please just give me the samples / I don’t care how you encode these samples, but please just write me a file in format X” case. Performance must come second then.
I recently optimised Claxon and the writer in Hound to make Flac to wav decoding as fast as I possible. One of the things I found out is that the API can have a huge impact on performance.
Being able to decode into all integer types might not be worth the trouble, especially if a simple API is more important than performance. When you consume such an API, in the end you end up with a match statement to match on the sample type, that pretty much does the same thing in every arm. You might as well read everything into u32 or f32 from the start. I have never encountered an audio file with more than 32 bits per sample in practice. (It would very likely be overkill anyway, though there might be valid reasons to have such a file.)
I made some pretty invasive changes to the Claxon API which is on master. I recommend to depend on that one for now; I intend to release a 0.3.0 soon-ish.

I think instead of focusing on files, the API of the read module should focus on implementors of the Read trait from stdlib instead.

I agree. Also, don’t wrap it in a BufReader internally: if the Read happens to be buffered already, that is only doing wasteful extra copies. What I did in Hound and Claxon was to expose a new constructor that takes a Read, and a convenience open constructor that takes an AsRef<Path>. The latter also wraps the file in a BufReader.

seek support. however, i think seeking is non trivial :/. Also it requires cooperation of the upstream crates.

I want to add seek support to Claxon and Hound. I was thinking of adding it to Hound first because for wav it is easy, and then try to experiment there what kind of API works well.

mitchmindtree commented 7 years ago

If I understand correctly, the purpose of this crate is to provide a convenient API for the “I don’t care how you decode this file, please just give me the samples / I don’t care how you encode these samples, but please just write me a file in format X” case. Performance must come second then.

@ruuda yes this is the general direction I had in mind for the Reader::samples API. That said, the user can still access the underlying reader if they want as-good-as-possible performance for each format, e.g.

match try!(audio::Reader::new(reader)) {
    audio::Reader::Wav(wav_reader) => // Efficiently decode WAV
    audio::Reader::Flac(flac_reader) => // Efficiently decode FLAC
    // etc
}

On a related note, I was thinking that it might be worth adding a fn buffered(buffer_len: usize) -> BufferedSamples<R> method to the read::Samples iterator to offer the ability to improve performance by reducing the amount of branching that occurs (e.g. only branch on the format once per buffer_len samples rather than every sample). Constructing it might look something like this reader.samples().buffered(512). It would probably be worth benchmarking this first, as the branching may get optimised away in most cases anyway.

I recently optimised Claxon and the writer in Hound to make Flac to wav decoding as fast as I possible. One of the things I found out is that the API can have a huge impact on performance.

Interesting, I'd be keen to hear you elaborate on this a little more if you get a chance. The conversion scheme I currently have in mind is to simply read into PCM samples as an intermediary and write to the target format using those - perhaps it is worth leaving room to specialise conversions between certain formats that can be optimised? Performance is not a top priority in my mind at present, but I think it is worth keeping in mind to avoid boxing ourselves into an unnecessarily bottle-necked API.

ruuda commented 7 years ago

On a related note, I was thinking that it might be worth adding a fn buffered(buffer_len: usize) -> BufferedSamples<R> method to the read::Samples iterator to offer the ability to improve performance by reducing the amount of branching that occurs.

I’m not sure about this one. It could make sense, but it might be a premature optimisation. It depends on the intended use case of this crate, I guess.

Interesting, I'd be keen to hear you elaborate on this a little more if you get a chance.

For one, the easiest way to expose decoded samples to a user (in my opinion) is to provide an iterator. But as a decoder needs to occasionally do some IO to read, or decode something which may fail, this needs to be an iterator of Result<Sample>. This requires the consumer to check the result for every sample, even though when buffering is used, producing the vast majority of samples does not involve an operation that may fail.

Also when writing, having to dispatch on the sample type can incur a significant cost. For Hound I did something quite similar to what you suggest for a buffered reader, but for writing, see get_i16_writer.

Perhaps it is worth leaving room to specialise conversions between certain formats that can be optimised

My feeling is that this could complicate things a lot for very little benefit; if you want to do something like that, directly using the underlying encoders/decoders might be a better fit.

est31 commented 7 years ago

I think we need to split the Reader enum into container and payload format. E.g. ogg can encode flac, opus, and vorbis, flac can be inside its own framing, and alac can be in mp4 as well as caf containers. Same for AAC and mp1/2/3. We can't make a variant for each single combination, much rather should we split it up and allow more combinations

est31 commented 7 years ago

This split would also be needed if we want to enable container conversion in the future, e.g. extracting audio from webm into an ogg file or sth, without actual decoding of the audio itself.

mitchmindtree commented 7 years ago

Splitting up containers and payloads does sound like it could make sense.

Will this complicate the format feature gating much? e.g. would we have have the flac feature automatically enable all possible container types that it may exist in?

Will this require coordination between upstream crates? e.g. claxon decodes FLAC, but knows nothing of the ogg crate - would this affect our ability to make an Ogg container enum with a Flac variant using a claxon FLAC reader? If it did require upstream coordination, I wonder how much this would complicate things and if we'd start hitting semver hell due to diamond dependencies.

This split would also be needed if we want to enable container conversion in the future, e.g. extracting audio from webm into an ogg file or sth, without actual decoding of the audio itself.

This does sound useful, though almost sounds more like a job for an ogg crate? I could be wrong though! I imagine something like this would require some ogg-specific code - how much more work are we opting into when we start supporting container specific conversions for all the container types out there, each with their own unique upstream crate and API? Perhaps we're better off keeping things simple for now by just supporting converting via reading and writing samples until we (or some users) come across an immediate use-case, and then consider where these sorts of conversions should be supported?

est31 commented 7 years ago

Will this complicate the format feature gating much?

One possible way to approach this would be to gate for each upstream crate, both container crates and codec ones. Crates like flac can act both as codec and container crate (as flac is self-contained).

claxon decodes FLAC, but knows nothing of the ogg crate

Yes, but that should be no problem. Ideally, each codec crate supported a generic per-packet/frame API that you can use to add container support yourself. E.g. lewton's inside_ogg module uses only parts of lewton's public API (mostly the read_audio_packet and the header parsing functions). You could write this module in external crates just as well. The alac crate's API solely consists of such a per-frame functionality. So it does require upstream cooperation by them making available such a low-level API, but once its there we don't need any help for additional containers, and don't run into any problems with different container crate versions (even if we did, I think its no problem as cargo resolves this automatically by including a crate multiple times).

deckarep commented 3 years ago

Hello,

I'm currently giving Nannou a go and really liking the framework. I thought I'd give some feedback on the audrey audio library. Looking through these thoughts, I'd like to add that unwinding the api to load from a file would be really nice.

I'm currently trying to build a drum machine type system in this framework and having to load all the samples from a buffered file seems like a waste and a problem for performance since I should just load the file data once and store it in memory for the lifetime of the app.

If unwinding this is tricky, then at least providing an additional API to read from an in-memory buffer would possibly do the trick.

If there are any suggestions as a workaround in the meantime I would appreciate some direction as well. I currently don't see a straightforward way to do what I want.

Thanks in advance for the great work!