Add example usage with WebCodecs AudioDecoder or VideoDecoder?

JohnWeisz commented 2 years ago

Hi, thanks for your work on this great library, it works perfectly for parsing audio files.

Do you have an example of how to use it with the WebCodecs API to decode frames? My use case is decoding an audio file into raw LPCM samples. In particular, here is what's not clear to me:

How do duration and totalDuration of a CodecFrame correlate to the timestamp and duration of an EncodedAudioChunk?
How to determine if a CodecFrame is a keyframe?
Is it the CodecFrame.data array that you pass as the data to EncodedAudioChunk? Do you pass it as-is or by copying only the portion the Uint8Array view references?

Thanks in advance, I'll add a reply if I figure it out myself in the meantime.

JohnWeisz commented 2 years ago

I figured it out in the meantime. It's actually fairly simple and straightforward, all things considered, but there are quite a few gotchas:

EncodedAudioChunk expects microseconds, so the durations and timestamps reported by CodecParser (in milliseconds) need to be multiplied by 1000
CodecFrame.data seems to use a "shared" ArrayBuffer across all frames/chunks, so the slice referenced by the Uint8Array must be cloned, and the clone passed to EncodedAudioChunk
At least for mpeg, timestamp can apparently be either 0 or the totalDuration reported by the CodecFrame; both seem to work

In a nutshell, it's something like:

for (let frame of parser.parseAll(file)) {
    // slice() copies only the bytes this view covers into a fresh, unshared buffer
    const frameBufferCopy = frame.data.slice();

    let audioChunk = new EncodedAudioChunk({
        data: frameBufferCopy,
        timestamp: 0, // or frame.totalDuration * 1000 (milliseconds to microseconds)
        type: "key", // didn't figure this out, but making each chunk a keyframe seems to cause no issues
        duration: frame.duration * 1000 // milliseconds to microseconds
    });

    // queue 'audioChunk' for decoding...
}
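
To make the two conversions above easier to check in isolation, here is a minimal sketch that builds only the plain init object, without touching WebCodecs itself. The helper name `toChunkInit` is hypothetical and not part of codec-parser; it assumes a frame-like object whose `duration` and `totalDuration` are in milliseconds, as described above, and it rounds the results because WebCodecs timestamps and durations are integers.

```javascript
// Hypothetical helper (not part of codec-parser or WebCodecs).
// Assumes frame.duration and frame.totalDuration are in milliseconds.
const MICROSECONDS_PER_MILLISECOND = 1000;

function toChunkInit(frame, type = "key") {
  return {
    type,
    // Uint8Array.prototype.slice copies only the bytes the view covers,
    // so the result no longer shares memory with other frames.
    data: frame.data.slice(),
    // Rounded to whole microseconds.
    timestamp: Math.round(frame.totalDuration * MICROSECONDS_PER_MILLISECOND),
    duration: Math.round(frame.duration * MICROSECONDS_PER_MILLISECOND),
  };
}
```

In a browser that supports WebCodecs, the result would presumably be passed along as `new EncodedAudioChunk(toChunkInit(frame))`.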

eshaz commented 2 years ago

Thanks for posting this, it's a good example. I'll make an update to the docs explaining the shared aspect of the data property.

One note on keyframes:

None of the codecs supported by this library (aac, mpeg, opus, vorbis, and flac) use keyframes. Based on my understanding, keyframes don't apply to audio compression at all; they are a structure that's only used in video compression. It's interesting that the WebCodecs API requires this for audio data. Maybe it's used somewhere else to sync audio and video data together?

JohnWeisz commented 2 years ago

@eshaz Might be a simple API limitation in this case. I found that:

marking the first EncodedAudioChunk as a "keyframe", and
retrying decoding as a "keyframe" (once) in case of an error

... together get decoding to work great. Performance is pretty good as well. Currently, in my case, it's very slightly behind AudioContext.decodeAudioData, but there is probably a lot of room for optimization to match or even surpass it.
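
The two rules above can be sketched as plain control flow. This is only an illustration under assumptions: `decodeWithKeyRetry` is a hypothetical name, and `decodeChunk` is an injected function assumed to throw synchronously when a chunk cannot be decoded. How errors actually surface from a real AudioDecoder (a thrown exception versus the error callback) is not covered here.

```javascript
// Hypothetical sketch of the strategy: first chunk is "key", the rest are
// "delta", and a failed "delta" chunk is retried exactly once as "key".
// `decodeChunk` is an injected stand-in for whatever queues the chunk.
function decodeWithKeyRetry(chunkInits, decodeChunk) {
  const outputs = [];
  chunkInits.forEach((init, index) => {
    const type = index === 0 ? "key" : "delta";
    try {
      outputs.push(decodeChunk({ ...init, type }));
    } catch (error) {
      // A chunk that was already marked "key" has nothing left to retry.
      if (type === "key") throw error;
      outputs.push(decodeChunk({ ...init, type: "key" }));
    }
  });
  return outputs;
}
```

Injecting `decodeChunk` keeps the retry logic separate from the decoder, so it can be exercised with a fake decoder before wiring it to the real one.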

eshaz / codec-parser

Add example usage with WebCodecs AudioDecoder or VideoDecoder? #22