eshaz / codec-parser

Browser and NodeJS library that parses audio data into frames containing frame data, header values, duration, and other information.
GNU Lesser General Public License v3.0

Add example usage with WebCodecs AudioDecoder or VideoDecoder? #22

Open JohnWeisz opened 2 years ago

JohnWeisz commented 2 years ago

Hi, thanks for your work on this great library. It works perfectly for parsing audio files.

Do you have an example of how to use it with the WebCodecs API to decode frames? My use case is decoding an audio file into raw LPCM samples. In particular, what's not clear to me:

Thanks in advance, I'll add a reply if I figure it out myself in the meantime.

JohnWeisz commented 2 years ago

I figured it out in the meantime. It's actually quite straightforward, all things considered, but there are quite a few gotchas:

In a nutshell, it's something like:

for (let frame of parser.parseAll(file)) {
    let frameBufferCopy = new ArrayBuffer(frame.data.length);
    new Uint8Array(frameBufferCopy).set(frame.data);

    let audioChunk = new EncodedAudioChunk({
        data: frameBufferCopy,
        timestamp: 0, // or frame.totalDuration * 1000 (timestamps are in microseconds, assuming totalDuration is in milliseconds)
        type: "key", // didn't figure this out, but making each chunk a keyframe seems to cause no issues
        duration: frame.duration * 1000 // milliseconds to microseconds
    });

    // queue 'audioChunk' for decoding...
}
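
For anyone landing here later, a slightly fuller sketch of the same loop, using a running timestamp instead of 0. This is only an illustration: `toChunkInits` is a hypothetical helper name (not part of codec-parser), and it assumes each frame exposes `data` as a `Uint8Array` and `duration` in milliseconds, as in the snippet above. It returns plain init objects, so the only WebCodecs-specific step left is `new EncodedAudioChunk(init)`.

```javascript
// Hypothetical helper, not part of codec-parser: turns parsed frames into
// plain init objects suitable for `new EncodedAudioChunk(init)`.
// Assumption: frame.data is a Uint8Array and frame.duration is in milliseconds.
function toChunkInits(frames) {
  const inits = [];
  let timestampUs = 0; // running presentation time, in microseconds

  for (const frame of frames) {
    const durationUs = frame.duration * 1000; // milliseconds to microseconds

    inits.push({
      type: "key", // every chunk marked as a keyframe, as in the loop above
      timestamp: Math.round(timestampUs),
      duration: Math.round(durationUs),
      // slice() copies the bytes into a fresh buffer, since frame.data
      // may share its underlying ArrayBuffer with other frames
      data: frame.data.slice(),
    });

    timestampUs += durationUs;
  }

  return inits;
}
```

In a browser with WebCodecs this could then be used roughly as `for (const init of toChunkInits(parser.parseAll(file))) decoder.decode(new EncodedAudioChunk(init));`, where `decoder` is an already configured `AudioDecoder`.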

eshaz commented 2 years ago

Thanks for posting this, it's a good example. I'll make an update to the docs explaining the shared aspect of the data property.

One note on keyframes:

None of the codecs supported in this library (aac, mpeg, opus, vorbis, and flac) use keyframes. Based on my understanding, keyframes don't apply to audio compression at all; they're a structure that's only used in video compression. It's interesting that the WebCodecs API requires this for audio data. Maybe it's used somewhere else to sync audio and video data together?

JohnWeisz commented 2 years ago

@eshaz Might be a simple API limitation in this case. I found that:

  1. marking the first EncodedAudioChunk as a "keyframe", and
  2. retrying decoding as a "keyframe" (once) in case of an error

... together get decoding to work great. Performance is pretty good as well: in my case it's currently very slightly behind AudioContext.decodeAudioData, but there is probably a lot of room for optimization to match or even surpass it.
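
Roughly, the two rules above could be sketched like this. It's a minimal illustration rather than my exact code: `decodeWithKeyRetry` is a hypothetical name, and `decodeOnce(type)` is a caller-supplied callback that is assumed to attempt the decode with the given chunk type and throw synchronously on failure.

```javascript
// Hypothetical helper: applies the two rules described above.
// Assumption: decodeOnce(type) tries to decode the current frame as a chunk
// of the given type ("key" or "delta") and throws if that attempt fails.
function decodeWithKeyRetry(frameIndex, decodeOnce) {
  // Rule 1: the first chunk is always submitted as a keyframe.
  const firstType = frameIndex === 0 ? "key" : "delta";

  try {
    return decodeOnce(firstType);
  } catch (err) {
    // Already tried as a keyframe, so there is nothing left to retry.
    if (firstType === "key") throw err;

    // Rule 2: retry exactly once, this time as a keyframe.
    return decodeOnce("key");
  }
}
```

With WebCodecs, `decodeOnce` might look like `(type) => decoder.decode(new EncodedAudioChunk({ ...init, type }))`, where `init` holds the frame's `data`, `timestamp` and `duration`. Errors reported through the decoder's `error` callback instead of a thrown exception would need separate handling that this sketch doesn't cover.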