Open mitchmindtree opened 7 years ago
@mitchmindtree
some thoughts on/ideas for the API so far:
audio::read::Description
could also include info on the format used (vorbis, flac, wav)
I think instead of focusing on files, the API of the read
module should focus on implementors of the Read
trait from stdlib instead. This allows users to read from simple in memory buffers via io::Cursor
and provide their own plumbing for things like files that are in the process of being downloaded via http(s). Note, it doesn't have to take references, as AsRef<Read>
automatically impl's Read
(an error I did originally with lewton, highly complicating the API, but its now fixed). Also, you still can have a convenience function that takes a path instead of a Read
. From looking at the three dependencies, it seems possible to implement this.
Additionally it would be cool if we could expose some per-frame or per-packet decoders for each of the formats. Then people can use our crate together with custom container formats, e.g. flac inside mp4, or vorbis inside webm. Maybe this should be a responsibility of the upstream crates instead though, so dunno.
Also further ideas:
provide some zero cost abstraction over interleaved / non interleaved channel info. e.g. for 2 channel audio you can either read tuples of two or one sample for the right and one for the left channel.
magic based format recognition. atm we only recognize formats by file extension. This is not optimal. E.g. .ogg files can be both opus, and vorbis (although its used more rarely for opus).
seek support. however, i think seeking is non trivial :/. Also it requires cooperation of the upstream crates.
Thanks for including me in the discussion, I’d be happy to help. A few thoughts:
image
crate. If I understand correctly, the purpose of this crate is to provide a convenient API for the “I don’t care how you decode this file, please just give me the samples / I don’t care how you encode these samples, but please just write me a file in format X” case. Performance must come second then.u32
or f32
from the start. I have never encountered an audio file with more than 32 bits per sample in practice. (It would very likely be overkill anyway, though there might be valid reasons to have such a file.)I think instead of focusing on files, the API of the read module should focus on implementors of the
Read
trait from stdlib instead.
I agree. Also, don’t wrap it in a BufReader
internally: if the Read
happens to be buffered already, that is only doing wasteful extra copies. What I did in Hound and Claxon was to expose a new
constructor that takes a Read
, and a convenience open
constructor that takes an AsRef<Path>
. The latter also wraps the file in a BufReader
.
seek support. however, i think seeking is non trivial :/. Also it requires cooperation of the upstream crates.
I want to add seek support to Claxon and Hound. I was thinking of adding it to Hound first because for wav it is easy, and then try to experiment there what kind of API works well.
If I understand correctly, the purpose of this crate is to provide a convenient API for the “I don’t care how you decode this file, please just give me the samples / I don’t care how you encode these samples, but please just write me a file in format X” case. Performance must come second then.
@ruuda yes this is the general direction I had in mind for the Reader::samples
API. That said, the user can still access the underlying reader if they want as-good-as-possible performance for each format, e.g.
match try!(audio::Reader::new(reader)) {
audio::Reader::Wav(wav_reader) => // Efficiently decode WAV
audio::Reader::Flac(flac_reader) => // Efficiently decode FLAC
// etc
}
On a related note, I was thinking that it might be worth adding a fn buffered(buffer_len: usize) -> BufferedSamples<R>
method to the read::Samples
iterator to offer the ability to improve performance by reducing the amount of branching that occurs (e.g. only branch on the format once per buffer_len
samples rather than every sample). Constructing it might look something like this reader.samples().buffered(512)
. It would probably be worth benchmarking this first, as the branching may get optimised away in most cases anyway.
I recently optimised Claxon and the writer in Hound to make Flac to wav decoding as fast as I possible. One of the things I found out is that the API can have a huge impact on performance.
Interesting, I'd be keen to hear you elaborate on this a little more if you get a chance. The conversion scheme I currently have in mind is to simply read into PCM samples as an intermediary and write to the target format using those - perhaps it is worth leaving room to specialise conversions between certain formats that can be optimised? Performance is not a top priority in my mind at present, but I think it is worth keeping in mind to avoid boxing ourselves into an unnecessarily bottle-necked API.
On a related note, I was thinking that it might be worth adding a
fn buffered(buffer_len: usize) -> BufferedSamples<R>
method to theread::Samples
iterator to offer the ability to improve performance by reducing the amount of branching that occurs.
I’m not sure about this one. It could make sense, but it might be a premature optimisation. It depends on the intended use case of this crate, I guess.
Interesting, I'd be keen to hear you elaborate on this a little more if you get a chance.
For one, the easiest way to expose decoded samples to a user (in my opinion) is to provide an iterator. But as a decoder needs to occasionally do some IO to read, or decode something which may fail, this needs to be an iterator of Result<Sample>
. This requires the consumer to check the result for every sample, even though when buffering is used, producing the vast majority of samples does not involve an operation that may fail.
Also when writing, having to dispatch on the sample type can incur a significant cost. For Hound I did something quite similar to what you suggest for a buffered reader, but for writing, see get_i16_writer
.
Perhaps it is worth leaving room to specialise conversions between certain formats that can be optimised
My feeling is that this could complicate things a lot for very little benefit; if you want to do something like that, directly using the underlying encoders/decoders might be a better fit.
I think we need to split the Reader
enum into container and payload format. E.g. ogg can encode flac, opus, and vorbis, flac can be inside its own framing, and alac can be in mp4 as well as caf containers. Same for AAC and mp1/2/3. We can't make a variant for each single combination, much rather should we split it up and allow more combinations
This split would also be needed if we want to enable container conversion in the future, e.g. extracting audio from webm into an ogg file or sth, without actual decoding of the audio itself.
Splitting up containers and payloads does sound like it could make sense.
Will this complicate the format feature gating much? e.g. would we have have the flac
feature automatically enable all possible container types that it may exist in?
Will this require coordination between upstream crates? e.g. claxon
decodes FLAC
, but knows nothing of the ogg
crate - would this affect our ability to make an Ogg
container enum with a Flac
variant using a claxon
FLAC reader? If it did require upstream coordination, I wonder how much this would complicate things and if we'd start hitting semver hell due to diamond dependencies.
This split would also be needed if we want to enable container conversion in the future, e.g. extracting audio from webm into an ogg file or sth, without actual decoding of the audio itself.
This does sound useful, though almost sounds more like a job for an ogg
crate? I could be wrong though! I imagine something like this would require some ogg-specific code - how much more work are we opting into when we start supporting container specific conversions for all the container types out there, each with their own unique upstream crate and API? Perhaps we're better off keeping things simple for now by just supporting converting via reading and writing samples until we (or some users) come across an immediate use-case, and then consider where these sorts of conversions should be supported?
Will this complicate the format feature gating much?
One possible way to approach this would be to gate for each upstream crate, both container crates and codec ones. Crates like flac can act both as codec and container crate (as flac is self-contained).
claxon decodes FLAC, but knows nothing of the ogg crate
Yes, but that should be no problem. Ideally, each codec crate supported a generic per-packet/frame API that you can use to add container support yourself. E.g. lewton's inside_ogg
module uses only parts of lewton's public API (mostly the read_audio_packet and the header parsing functions). You could write this module in external crates just as well. The alac crate's API solely consists of such a per-frame functionality. So it does require upstream cooperation by them making available such a low-level API, but once its there we don't need any help for additional containers, and don't run into any problems with different container crate versions (even if we did, I think its no problem as cargo resolves this automatically by including a crate multiple times).
Hello,
I'm currently giving Nannou a go and really liking the framework. I thought I'd give some feedback on the audrey audio library. Looking through these thoughts, I'd like to add that unwinding the api to load from a file would be really nice.
I'm currently trying to build a drum machine type system in this framework and having to load all the samples from a buffered file seems like a waste and a problem for performance since I should just load the file data once and store it in memory for the lifetime of the app.
If unwinding this is tricky, then at least providing an additional API to read from an in-memory buffer would possibly do the trick.
If there are any suggestions as a workaround in the meantime I would appreciate some direction as well. I currently don't see a straightforward way to do what I want.
Thanks in advance for the great work!
@est31 ping! These are humble beginnings, still lots to go. I thought it could be good to upload it sooner rather than later to get your thoughts and feedback on the direction of the API etc. I'll start posting some issues for todo-items I have in mind soon.
@ruuda so far this crate is 50% simply wrapping your awesome work! I thought I'd ping you in case you'd like to follow along or in case you'd be interested in contributing at any point. The API design is largely inspired by your design in
hound
andclaxon
- it would be great to get your thoughts if you get a chance. Would be more than happy to invite you as an admin if you felt inclined!@tomaka I thought you might be interested in using this for rodio, perhaps as a replacement the decoder module? I think you are also still an admin of rustaudio, so feel free to open issues/PRs or give your thoughts on the direction of the crate if you are interested at all :+1: