anthumchris / opus-stream-decoder

Instantly decode Ogg Opus audio streams in chunks with JavaScript & WebAssembly (Wasm)

Add seeking functionality to the decoder #1

Open anthumchris opened 5 years ago

anthumchris commented 5 years ago


Investigate adding a feature like OpusStreamDecoder.seekTo(milliseconds) to start decoding at a specific time within the file. Currently, the decoder only supports decoding a file from the beginning.

Using libopusfile.op_pcm_seek() seems like a good idea. OpusStreamDecoder's underlying C implementation may need to dynamically allocate OpusChunkDecoder.buffer large enough to hold the entire file being decoded. Then, bytes would be enqueued into the buffer at the appropriate positions so that op_pcm_seek() can succeed.

Because of Opus' dynamic streaming nature, the total samples/duration of the file would either need to be known beforehand or calculated on the fly (see "how do I get duration..." and "why don't you store the duration..").

Guesswork would be required to decide which bytes are needed to start playing at a specific time. For example, take a 4 MB (4,194,304 bytes) file with a duration of 4:00 (240 seconds, or 240,000 ms). If we wanted to start playing at the 1:00 mark, which is 25% of the total duration, we could assume that the byte offset to start reading would be about 25% of the way into the file, at 1,048,576 bytes (0.25 * 4194304). We would probably need to start reading somewhat before that offset to ensure that the decoder has enough bytes to read packets for the pre-roll decoding. Assuming an Ogg Opus file's maximum page size is about 64 KB, padding by that amount could potentially be enough. These are all initial guesses and ideas.
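The proportional guess above can be written as a small helper. This is a minimal sketch, not code from the project: `estimateSeekOffset` and its parameters are hypothetical names, and it assumes a roughly constant bitrate (Opus is usually VBR, so the result is only a starting point). Instead of the rounded 64 KB it pads by 65,307 bytes, the largest possible Ogg page (27-byte header, up to 255 lacing values, up to 255 * 255 payload bytes).

```javascript
// Largest possible Ogg page: 27-byte header + 255 lacing values + 255 * 255 payload bytes
const MAX_OGG_PAGE_BYTES = 65307;

// Hypothetical helper: guess where to start reading for a seek to `targetMs`.
// Assumes bytes are spread evenly over time (roughly constant bitrate).
function estimateSeekOffset(fileSizeBytes, durationMs, targetMs, padBytes = MAX_OGG_PAGE_BYTES) {
  if (durationMs <= 0) throw new RangeError("durationMs must be positive");
  // Clamp the target into [0, duration] and take the same fraction of the file
  const fraction = Math.min(Math.max(targetMs / durationMs, 0), 1);
  const guess = Math.floor(fraction * fileSizeBytes);
  // Back off by one maximum page so a complete page (and some pre-roll) is included
  return Math.max(guess - padBytes, 0);
}

// The example from the text: 4 MB file, 4:00 duration, seek to 1:00
const offset = estimateSeekOffset(4194304, 240000, 60000);
console.log(offset); // 983269
```

The decoder would still need the header pages from the start of the file in addition to the bytes read from this offset.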

zandaqo commented 5 years ago

@AnthumChris Thanks, this is exactly what I was looking for. Under this scenario (guessing required bytes), aside from total size and duration, do we need any other data to decode/play the required chunk? For example, do we need the beginning of a file or some other metadata?

I have a use case where I want to play paragraphs/pages of audiobooks. Currently I have to create a separate file for each paragraph, but I'd rather have one file for a book and load it partially. I'm ok with padding overhead and storing extra metadata.

anthumchris commented 5 years ago

Yes, OpusStreamDecoder's underlying libopusfile library requires the beginning of the file to discover and instantiate an OggOpusFile with the op_open_* functions. Specifically, based on my tests below, the header pages and the first audio data page seem to be required. Data page sizes vary with the encoding bitrate (higher bitrates produce larger pages). Searching my test file with xxd for "OggS" page boundaries shows that a minimum of 8,918 bytes is required for that file:

```shell
# prep the test files
make test-wasm-module

# inspect Opus file and see the end of first data page at 0x22d6 (8918 bytes)
xxd -l 8950 -g 8 -c 14 tmp/decode-test-64kbps.opus | grep "OggS"

# create truncated file with 8,918 bytes and decode - succeeds
LEN=8918
dd if=tmp/decode-test-64kbps.opus of=tmp/truncated-$LEN.opus bs=1 count=$LEN
node dist/test-opus-stream-decoder.js tmp/truncated-$LEN.opus tmp

# create truncated file with 8,917 bytes and decode - fails
LEN=8917
dd if=tmp/decode-test-64kbps.opus of=tmp/truncated-$LEN.opus bs=1 count=$LEN
node dist/test-opus-stream-decoder.js tmp/truncated-$LEN.opus tmp
```
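The `xxd ... | grep "OggS"` step can also be done in JavaScript on a fetched or read buffer. This is a rough sketch with a hypothetical function name: it reports every offset where the 4-byte capture pattern "OggS" appears. Because it is a naive byte search, audio payload bytes that happen to spell "OggS" would show up as false positives.

```javascript
// ASCII bytes of the Ogg capture pattern "OggS"
const CAPTURE = [0x4f, 0x67, 0x67, 0x53];

// Hypothetical helper: return the byte offset of every "OggS" occurrence
function findOggSOffsets(bytes) {
  const offsets = [];
  for (let i = 0; i + CAPTURE.length <= bytes.length; i++) {
    if (bytes[i] === CAPTURE[0] && bytes[i + 1] === CAPTURE[1] &&
        bytes[i + 2] === CAPTURE[2] && bytes[i + 3] === CAPTURE[3]) {
      offsets.push(i);
    }
  }
  return offsets;
}

// Usage in Node with the test file from above:
// const bytes = require("fs").readFileSync("tmp/decode-test-64kbps.opus");
// console.log(findOggSOffsets(bytes.subarray(0, 8950)));
```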
anthumchris commented 5 years ago

To clarify the above, libopusfile requires the header pages and any complete audio data page. The test below skips the first audio page and decodes an Opus file containing the headers and audio page 2:

```shell
# inspect file to see page boundaries for headers and 1st, 2nd audio pages
# Header pages: 0x0-0x349 (0-841 bytes)
# Audio page 1: 0x349-0x22d6 (841-8918 bytes, 8077 total)
# Audio page 2: 0x22d6-0x4263 (8918-16995 bytes, 8077 total)
xxd -l 17010 -g 8 -c 15 tmp/decode-test-64kbps.opus | grep "OggS"

# create file with headers only
OUTFILE="tmp/truncated-headers-page2.opus"
START=0
LEN=$((841 - START))
dd if=tmp/decode-test-64kbps.opus of="$OUTFILE" bs=1 skip="$START" count="$LEN"

# append audio page 2 to that file
START=8918
LEN=$((16995 - START))
dd if=tmp/decode-test-64kbps.opus bs=1 skip="$START" count="$LEN" >> "$OUTFILE"

# decode the file with headers and page 2 (page 1 skipped)
node dist/test-opus-stream-decoder.js "$OUTFILE" tmp
```
zandaqo commented 5 years ago

@AnthumChris To sum up, for streaming we need the header pages and file's total length and duration. Can libopusfile isolate those header pages given a chunk of data? I can imagine streaming file by doing two range requests, the first one to get the header pages and the second one to get the audio pages needed. Better yet, we can store the header pages alongside other metadata beforehand if there is a sure way of getting them from a file.
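If the header pages are fetched (or stored) separately, combining them with a later chunk of audio pages is a plain byte concatenation, the in-memory equivalent of the two `dd` commands in the previous comment. A minimal sketch with hypothetical helper names, assuming both inputs start and end on page boundaries:

```javascript
// Hypothetical helper: concatenate several Uint8Arrays into one new buffer
function concatBytes(...parts) {
  const total = parts.reduce((sum, p) => sum + p.length, 0);
  const out = new Uint8Array(total);
  let pos = 0;
  for (const p of parts) {
    out.set(p, pos);
    pos += p.length;
  }
  return out;
}

// Header pages [0, headerEnd) followed by one audio page [pageStart, pageEnd)
function headersPlusPage(fileBytes, headerEnd, pageStart, pageEnd) {
  // subarray() only creates views; the copy happens inside concatBytes()
  return concatBytes(fileBytes.subarray(0, headerEnd), fileBytes.subarray(pageStart, pageEnd));
}

// Usage in Node with the offsets measured above (841 + 8,077 = 8,918 bytes):
// const file = require("fs").readFileSync("tmp/decode-test-64kbps.opus");
// const spliced = headersPlusPage(file, 841, 8918, 16995);
```

With two range requests, the same `concatBytes()` call would join the header response body with the audio response body.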

anthumchris commented 5 years ago

@zandaqo libopusfile.op_test() could validate a chunk covering bytes 0-57 as a valid Opus stream, but I didn't see any functions in libopusfile that would actually return the bytes of the parsed headers. libogg may do that, or a manual byte scan for page boundaries could suffice, presumably scanning for the first page whose granule position (bytes 7-14 of the page header) is not zero (see RFC7845 - Granule Position).
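A manual scan along those lines could walk the Ogg page headers instead of searching for "OggS" blindly. The sketch below is an assumption-laden illustration, not libogg or libopusfile code: it reads each page's segment count (byte 26) and segment table to compute the page length, reads the 64-bit little-endian granule position, and treats the first page with a granule position greater than zero as the first audio page. Using "greater than zero" rather than "not zero" is a deliberate tweak: pages on which no packet ends carry granule -1, so a comment header spanning several pages is not mistaken for audio. The CRC is not checked, and the buffer must start on a page boundary.

```javascript
// Hypothetical helper: split a buffer (starting at a page boundary) into Ogg pages
function parseOggPages(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const pages = [];
  let pos = 0;
  while (pos + 27 <= bytes.length) {
    // Capture pattern "OggS" occupies bytes 0-3 of every page header
    if (view.getUint32(pos) !== 0x4f676753) throw new Error(`no Ogg page at offset ${pos}`);
    // Granule position: bytes 6-13 counting from 0 (7-14 counting from 1), little-endian
    const granule = view.getBigInt64(pos + 6, true);
    const segments = bytes[pos + 26];
    if (pos + 27 + segments > bytes.length) break; // truncated segment table
    let bodyLength = 0;
    for (let i = 0; i < segments; i++) bodyLength += bytes[pos + 27 + i];
    const length = 27 + segments + bodyLength;
    if (pos + length > bytes.length) break; // truncated page body
    pages.push({ offset: pos, length, granule });
    pos += length;
  }
  return pages;
}

// Byte length of the header pages, or -1 if no audio page was found in the buffer
function headerPagesEnd(bytes) {
  const firstAudio = parseOggPages(bytes).find((p) => p.granule > 0n);
  return firstAudio ? firstAudio.offset : -1;
}
```

The returned length could be stored alongside other metadata so later range requests can skip re-fetching the headers.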

anthumchris commented 5 years ago

I decided against dynamically allocating a buffer for the entire Opus file size. It's wasteful and doesn't scale for large Opus files (e.g. a 3-hour SoundCloud DJ mix hundreds of megabytes in size). I originally considered that option because op_open_memory() offers an easier development solution.

My current idea for the high-level design is to instantiate a SeekableDecoder with a Request object and specify the time where decoding would begin. onDecode() would receive an additional value seekStart to distinguish decoded audio from other callback sequences:

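```javascript
const decoder = new SeekableDecoder(new Request('https://...music.opus'), { onDecode });
decoder.decodeAt(30 * 1000); // seek to 0:30 (30,000 ms) and start decoding

// seekStart identifies which decodeAt() call produced this decoded audio
function onDecode({ seekStart, ...decoded }) {}
```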

This design removes developers from dealing with the internals of HTTP range requests and byte parsing, and decodeAt() could be called multiple times arbitrarily whenever seeking is needed. Future performance enhancements could internally cache fetched bytes and optimize background fetching.

zandaqo commented 5 years ago

@AnthumChris That sounds great! This would indeed simplify my case immensely.

I assume under the hood it will have to make at least two range requests for decodeAt: one for the headers and one for the audio. In that case, I think it would be prudent to allow an extra optional argument containing headers so that one can save an extra request by supplying headers along with other metadata.

anthumchris commented 5 years ago

@zandaqo Initialization would take 2 range requests: one to fetch the header pages and one to fetch the last page (to calculate the duration). The file size would be obtained implicitly from the response headers. Your idea is a good one, and those 3 values could be provided as initialization arguments.
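The two initialization requests could look roughly like this with the standard `fetch()` API and an HTTP `Range` header. This is a sketch under stated assumptions: the helper names are hypothetical, the server must answer with `206 Partial Content` and a `Content-Range` header (its total is where the file size comes from), and the first 65,307 bytes are assumed to contain all header pages, which may not hold for files with large comment headers such as embedded cover art.

```javascript
// Parse e.g. "bytes 0-65306/4194304" -> { start: 0, end: 65306, total: 4194304 }
function parseContentRange(value) {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value ?? "");
  if (!m) return null;
  return { start: Number(m[1]), end: Number(m[2]), total: m[3] === "*" ? null : Number(m[3]) };
}

// Hypothetical helper: fetch an inclusive byte range [start, end]
async function fetchRange(url, start, end) {
  const res = await fetch(url, { headers: { Range: `bytes=${start}-${end}` } });
  if (res.status !== 206) throw new Error(`expected 206 Partial Content, got ${res.status}`);
  const range = parseContentRange(res.headers.get("Content-Range"));
  return { bytes: new Uint8Array(await res.arrayBuffer()), fileSize: range ? range.total : null };
}

// Hypothetical init: start of file (headers) plus end of file (last page)
async function initRanges(url, probeBytes = 65307) {
  const head = await fetchRange(url, 0, probeBytes - 1);
  if (head.fileSize === null) throw new Error("server did not report the total size");
  // 65,307 bytes is the largest possible Ogg page, so the tail contains the last page's start
  const tailStart = Math.max(head.fileSize - probeBytes, 0);
  const tail = await fetchRange(url, tailStart, head.fileSize - 1);
  return { fileSize: head.fileSize, head: head.bytes, tail: tail.bytes };
}
```

If the file size, header bytes, and duration were supplied at initialization, as suggested above, `initRanges()` could be skipped entirely.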

anthumchris commented 5 years ago

This feature is currently being developed in branch http-opus-seek and test-seekable-opus-stream-decoder.html is a successful test file.

The duration seeking (finding the last OggS page byte sequence from the end of the file) is the most inefficient operation, especially for larger files (I tested with an unrealistic 512 kbit/s file whose last page was 47,896 bytes). It seems best to provide the duration value during initialization to avoid duration seeking. A server-side process that reads the file backwards and sends the value in a header could be cool. Even cooler would be an Nginx module that calculates these values on the fly and returns them in HTTP response headers.
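Once the tail bytes are available, the duration itself is simple arithmetic: Opus granule positions always count 48 kHz samples, and the playable length is the last page's granule position minus the pre-skip stored in the OpusHead packet. The sketch below makes several assumptions that are labeled in the comments: the helper names are hypothetical, `head` must begin at the first page of the file, `tail` must contain the start of the last page, and the stream is assumed to begin at granule 0. The backward scan does not verify the page CRC, so a stray "OggS" inside the last page's payload could mislead it.

```javascript
// Scan backwards for "OggS" followed by stream structure version 0 (no CRC check)
function lastPageGranule(tail) {
  const view = new DataView(tail.buffer, tail.byteOffset, tail.byteLength);
  for (let i = tail.length - 27; i >= 0; i--) {
    if (view.getUint32(i) === 0x4f676753 && tail[i + 4] === 0) {
      return view.getBigInt64(i + 6, true); // granule position, little-endian
    }
  }
  return null;
}

// Read pre-skip from the OpusHead packet on the first page
function opusPreSkip(head) {
  const view = new DataView(head.buffer, head.byteOffset, head.byteLength);
  const bodyStart = 27 + head[26]; // skip page header and segment table
  // OpusHead layout: "OpusHead" (8 bytes), version, channel count, pre-skip (uint16 LE)
  return view.getUint16(bodyStart + 10, true);
}

// Hypothetical helper: duration in milliseconds, assuming the stream starts at granule 0
function durationMs(head, tail) {
  const granule = lastPageGranule(tail);
  if (granule === null) return null;
  return (Number(granule) - opusPreSkip(head)) / 48; // 48 samples per millisecond
}
```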

Aug 2020 Update: This is no longer being developed by me. I do not currently have the time to devote to it.

bvibber commented 5 years ago

Check out the older oggz-tools package; specifically oggz-chop and its friends which should include support for producing X-Content-Duration headers and doing server-side seek on static files when given a time offset.

Probably not modern, so you may wish to replace them with something that fits into nginx more nicely, but should give you a head start!

bvibber commented 5 years ago

(I'm also not 100% sure if those tools support Opus, but adapting should not be hard if not.)

turbo commented 3 years ago

@AnthumChris I want to implement this on a site that streams opus files. All the files have the same sample rate and frame size, and for each file, I have the duration info in the form of time (4 mins 46.82 secs) and samples (13767583).

Would that simplify the expensive duration seeking you described? If so, how?
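One way to read the numbers above: 13,767,583 / 48,000 = 286.82 seconds, which matches the quoted 4 min 46.82 s. So the stored sample counts appear to be in 48 kHz units, the same unit Opus granule positions use. If so, the duration is already known and could be supplied at initialization instead of fetching and scanning the last page. A small sketch with hypothetical helper names, under that 48 kHz assumption:

```javascript
// Assumption: stored sample counts are 48 kHz samples, like Opus granule positions
const OPUS_GRANULE_RATE = 48000;

function samplesToMs(samples) {
  return (samples / OPUS_GRANULE_RATE) * 1000;
}

// Format milliseconds as m:ss.cc
function formatDuration(ms) {
  const totalSeconds = ms / 1000;
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = (totalSeconds - minutes * 60).toFixed(2).padStart(5, "0");
  return `${minutes}:${seconds}`;
}

console.log(formatDuration(samplesToMs(13767583))); // "4:46.82"
```

The resulting millisecond value is what a proportional byte-offset estimate (target time divided by total duration) would need.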