AuburnSounds / audio-formats

Audio file decoding in pure D, no link dependency.

Allow partially decoding audio & seeking at a sample level #15

Open LunaTheFoxgirl opened 3 years ago

LunaTheFoxgirl commented 3 years ago

I'm looking into using this in my game engine, where audio may be minutes long and may loop as well. Looking into it, this library does not seem to support seeking and reading a specific number of samples at a time? For performance reasons my engine reads 16384 samples at a time, can flush those samples immediately, and can seek to a specific sample before continuing playback. Currently I'm doing this through libvorbisfile.

Would it be possible for this library to eventually support that?

p0nce commented 3 years ago

Indeed, #6 is not done yet. I haven't yet investigated how easy seeking would be. There is also the game-mixer library, which handles playback with music loops, threaded streaming and resampling, but it doesn't support seeking either. Eventually I'd like to do it, but I can't give you a timeline.

p0nce commented 3 years ago

Hello, implementing #6 is turning out to be harder than expected, especially for OGG, which would need another stb_vorbis.c rewrite.

I feel game-mixer might be a better library to depend on, since it handles the whole "game mixer" job: because it decodes sources into buffers, it can play them all at once, an arbitrary number of times, fade music, etc. Using audio-formats alone leaves you with the unresolved issues of resampling, file reading that cannot be done from the audio thread, chunked allocation for decoded sources, etc. game-mixer is also quite efficient and supports fade-ins/outs.

The API is here: https://github.com/AuburnSounds/game-mixer/blob/main/source/gamemixer/mixer.d#L90. What it doesn't support is platforms other than Linux and Windows, since that is a libsoundio-d limitation.

Anyway, I'd be interested in the reasons people would prefer to depend on audio-formats instead; it was really meant to be used by people making a game engine.

LunaTheFoxgirl commented 3 years ago

I already have my own mixing solution in my engine and don't intend to replace it. My mixer supports a lot of underlying game-related features: muffling audio for effects like being underwater, 3D positional audio (with optional Doppler effect), arbitrarily slowing down or speeding up audio (for things like tape-stop effects), etc.

My mixing solution is also built on OpenAL, which is more widely supported across platforms.

p0nce commented 3 years ago

For now you can seek with inputStream.seekPosition(frame);, but only for WAV, MP3, FLAC and OPUS. I'm trying to get OGG working too.

reading specific sample counts at a time?

If the stream ends, you will get fewer frames returned, and then you have to fill the rest of your fixed-size buffer with zeroes.

    /// When that returned number is less than `frames`, it means the stream is done decoding, or that there was a decoding error.
    int readSamplesFloat(float* outData, int frames) @nogc
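A minimal sketch of the zero-padding approach described above, in D. FakeStream is a hypothetical stand-in that only mimics the documented contract of readSamplesFloat (interleaved output, returns the number of frames, fewer than requested once the stream is done); readPadded is likewise a hypothetical helper, not part of audio-formats.

```d
/// Hypothetical decoder stand-in mimicking the readSamplesFloat contract.
struct FakeStream
{
    int channels;
    int framesLeft;

    int readSamplesFloat(float* outData, int frames) @nogc nothrow
    {
        int n = frames < framesLeft ? frames : framesLeft;
        foreach (i; 0 .. n * channels)
            outData[i] = 1.0f; // dummy "decoded" audio
        framesLeft -= n;
        return n;
    }
}

/// Fills `buf` (interleaved, buf.length == frames * channels) and pads the
/// tail with silence when the stream ends early. Returns frames decoded.
int readPadded(S)(ref S stream, float[] buf, int channels)
{
    assert(buf.length % channels == 0);
    int frames = cast(int)(buf.length / channels);
    int got = stream.readSamplesFloat(buf.ptr, frames);
    buf[got * channels .. $] = 0.0f; // silence after the end of the stream
    return got;
}
```

A return value of 0 on a later call simply yields a fully silent buffer, so the caller can keep feeding its fixed-size queue without special cases.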
LunaTheFoxgirl commented 3 years ago

That, or simply seek back to the beginning and read from the start again to fill the rest of the buffer.
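A sketch of that looping variant in D, assuming a stream type with readSamplesFloat(float*, int) and a seekPosition(int frame) that always succeeds. LoopStub and readLooping are hypothetical names used only for illustration; the stub writes each frame's index as its sample value so the wrap-around is easy to check.

```d
/// Hypothetical stub: a "file" of `length` frames whose samples hold their frame index.
struct LoopStub
{
    int channels;
    int length; // total frames in the "file"
    int pos;    // current read position, in frames

    int readSamplesFloat(float* outData, int frames)
    {
        int n = frames < length - pos ? frames : length - pos;
        foreach (f; 0 .. n)
            foreach (c; 0 .. channels)
                outData[f * channels + c] = pos + f;
        pos += n;
        return n;
    }

    void seekPosition(int frame) { pos = frame; }
}

/// Fills all of `buf` (interleaved), seeking back to `loopStart` each time
/// the stream runs out. Returns the number of frames written.
int readLooping(S)(ref S stream, float[] buf, int channels, int loopStart = 0)
{
    int total = cast(int)(buf.length / channels);
    int done = 0;
    bool lastWasEmpty = false;
    while (done < total)
    {
        int got = stream.readSamplesFloat(buf.ptr + done * channels, total - done);
        done += got;
        if (done < total)
        {
            // Two empty reads in a row: the loop region has no audio, stop.
            if (got == 0 && lastWasEmpty)
                break;
            lastWasEmpty = (got == 0);
            stream.seekPosition(loopStart);
        }
    }
    buf[done * channels .. $] = 0.0f; // silence only if we had to give up
    return done;
}
```

Passing a non-zero loopStart gives the common "intro, then loop" pattern for game music.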

Though, what is a frame in this context? A direct sample index, or something vaguer? Say I read 1024 frames: will I get 1024 samples (512 samples each for the left and right channels, if stereo)?

p0nce commented 3 years ago

A "frame" in the audio-formats context is one sample for each channel. For a mono sound, 1 frame is 1 sample; for a stereo sound, 1 frame is 2 samples (left and right).

frames / samplerate = time in seconds

readSamplesFloat actually writes frames * _channels samples in the nominal case, and its return value is the number of frames.
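The relationships above (samples = frames × channels, seconds = frames / sample rate) can be written as a few helpers. The function names are hypothetical; secondsToFrames rounds to the nearest frame, which is one reasonable choice when computing a seek target.

```d
/// One frame holds one sample per channel.
long samplesForFrames(long frames, int channels) { return frames * channels; }

/// frames / sample rate = time in seconds.
double framesToSeconds(long frames, double sampleRate) { return frames / sampleRate; }

/// Inverse conversion, rounded to the nearest frame (e.g. for a seek target).
long secondsToFrames(double seconds, double sampleRate)
{
    import std.math : lround;
    return lround(seconds * sampleRate);
}
```

For example, a 16384-frame chunk at 48 kHz covers roughly 0.34 s of audio, and with stereo interleaving it occupies 32768 floats.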

Now I see the documentation is a bit confusing.

Like say if I read 1024 frames, will I get 1024 samples? (512 samples for left and right channel if stereo?)

For a stereo file, say you call stream.readSamplesFloat(buf.ptr, 1024);. You will get 1024 samples for the left channel and 1024 samples for the right, interleaved (left before right, LRLRLRLRLRLR... 1024 times), so buf must hold 2048 floats. The return value in that case will be 1024.

You will always get a whole number of frames, so the number of samples is a multiple of channels() (here stream.channels() == 2). If the return value is less than the number of frames you asked for, the stream is over.
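Since the output is interleaved, an engine that processes channels separately needs to split it. A minimal D sketch; deinterleave is a hypothetical helper, and it assumes the buffer length is a whole number of frames, as described above.

```d
/// Splits an interleaved LRLR... buffer into one array per channel.
/// `interleaved.length` must be a multiple of `channels`.
float[][] deinterleave(const(float)[] interleaved, int channels)
{
    assert(channels > 0 && interleaved.length % channels == 0);
    size_t frames = interleaved.length / channels;
    auto result = new float[][](channels, frames);
    foreach (f; 0 .. frames)
        foreach (c; 0 .. channels)
            result[c][f] = interleaved[f * channels + c];
    return result;
}
```

Reading 1024 stereo frames therefore produces a 2048-float buffer that splits into two 1024-sample channels.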

LunaTheFoxgirl commented 3 years ago

I'm now also considering supporting tracker modules as BGM in my game. Would it be possible to have tracker-specific seek functions that seek to a pattern + position?

Another thing I see lacking is a way to tell where in the stream you are. I don't see any position or tell functions, which would be useful.

Finally, a canSeek function would be nice so that my audio engine can adjust looping based on whether it can seek or not in the audio file.

LunaTheFoxgirl commented 3 years ago

Update: seekPosition does not work for ogg files for some reason.

LunaTheFoxgirl commented 3 years ago

As of now, my engine has switched over to audio-formats, though I still need to backport all the FX stuff from the previous sound engine I implemented in another engine. https://github.com/KitsunebiGames/kmm-engine/commit/8df5306ca4c393c04bdd3d8a504897f92162bf69

p0nce commented 3 years ago

I'm considering also having support for tracker modules as bgm in my game now, would it be possible to have specific tracker seek functions that seek to a pattern + position?

Yes, I guess that's probably what would need to be done for tracker formats. I don't fully understand those decoders right now.

An other thing I see lacking is a way to tell where in the stream you are, I don't see any position or tell functions, which would be useful.

Indeed, now that seekPosition exists it would be convenient. => #18
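Until a tell function exists, a caller can track the position itself by counting the frames each read returns and resetting the count on seek. A sketch under those assumptions: PositionTracked and tellPosition here are hypothetical names, not audio-formats API, and the wrapper assumes every seekPosition call succeeds.

```d
/// Hypothetical wrapper tracking the current frame position of a stream
/// that exposes readSamplesFloat(float*, int) and seekPosition(int).
struct PositionTracked(S)
{
    S inner;
    long position; // current position, in frames

    int readSamplesFloat(float* outData, int frames)
    {
        int got = inner.readSamplesFloat(outData, frames);
        position += got; // only count frames actually decoded
        return got;
    }

    void seekPosition(int frame)
    {
        inner.seekPosition(frame); // assumed to always succeed
        position = frame;
    }

    long tellPosition() const { return position; }
}
```

Keeping the counter outside the decoder works for every format, at the cost of trusting that seeks land exactly on the requested frame.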

a canSeek function would be nice

Updated => #6

Update: seekPosition does not work for ogg files for some reason.

Are you using v1.3.7? I'd be interested in the .ogg file to reproduce that. I did test it once, but of course there could be a mistake, or perhaps a mono vs. stereo issue.

LunaTheFoxgirl commented 3 years ago

I managed to get OGG working. There were some problems passing data to OpenAL, since it normally expects integer samples (8- or 16-bit); I had to enable an extension to allow FLOAT32 and do some extra calculation when buffering. Doing so fixed OGG playback. Not sure why MP3 worked with that bug there, but oh well.
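The "extra calculation on buffering" usually comes down to units: OpenAL's alBufferData takes its size argument in bytes, not frames or samples. A hedged D sketch of that arithmetic, plus a fallback conversion to 16-bit PCM (which core OpenAL accepts) for when the float32 extension is unavailable. bufferBytes and floatToS16 are hypothetical helpers; the OpenAL call itself is not shown.

```d
/// Size in bytes of an interleaved float32 buffer, as alBufferData expects.
int bufferBytes(int frames, int channels)
{
    return frames * channels * cast(int) float.sizeof;
}

/// Fallback without the float32 extension: convert to 16-bit PCM.
/// Samples are clamped to [-1, 1], scaled, and truncated toward zero.
void floatToS16(const(float)[] src, short[] dst)
{
    assert(dst.length == src.length);
    foreach (i, x; src)
    {
        float c = x < -1.0f ? -1.0f : (x > 1.0f ? 1.0f : x);
        dst[i] = cast(short)(c * 32767.0f);
    }
}
```

For a 16384-frame stereo chunk this gives 131072 bytes in float32, or half that after conversion to 16-bit.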

In terms of seeking and playing tracker modules the main functions I'd need for it to be 100% usable for what I want to do would be:

    int tellModulePattern();               // Returns which pattern is currently being played
    int tellModulePosition();              // Returns the current position within the currently playing pattern
    int countModulePatterns();             // Returns the number of patterns in the module
    int lengthModulePattern(int pattern);  // How many indices a pattern has (patterns have different lengths)
    long samplesRemainingInPattern();      // How many samples remain to be decoded in the current pattern (needed for accurate seeking or looping specific sections of a module)
    void seek(int pattern, int position);  // Seek to a specific pattern + position
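One way something like samplesRemainingInPattern could be computed uses classic MOD/XM timing: a tick lasts 2.5 / BPM seconds and a row lasts `speed` ticks. A sketch under the assumption that speed and tempo stay constant for the rest of the pattern (effects such as Fxx can change them mid-pattern, which this ignores). framesPerRow and framesRemainingInPattern are hypothetical names, and the result is in frames, not interleaved samples.

```d
/// Frames per row: a tick lasts 2.5 / bpm seconds, a row lasts `speed` ticks.
double framesPerRow(double sampleRate, int speed, int bpm)
{
    return sampleRate * 2.5 * speed / bpm;
}

/// Frames left until the current pattern ends, given the current row and how
/// many frames of that row were already rendered. Assumes constant timing.
long framesRemainingInPattern(int rowsInPattern, int currentRow,
                              long framesIntoRow, double sampleRate,
                              int speed, int bpm)
{
    import std.math : lround;
    double perRow = framesPerRow(sampleRate, speed, bpm);
    long left = lround((rowsInPattern - currentRow) * perRow) - framesIntoRow;
    return left < 0 ? 0 : left;
}
```

At the common defaults (speed 6, 125 BPM) and 44.1 kHz, one row lasts 5292 frames, so a 64-row pattern spans 338688 frames.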

If I get time, I may look into adding these functions and making a PR.