JuliaIO / CodecZstd.jl

A zstd codec for TranscodingStreams.jl.

random access/seek support #18

Open jrevels opened 4 years ago

jrevels commented 4 years ago

ref https://github.com/facebook/zstd/issues/395#issuecomment-492741194
ref https://discourse.julialang.org/t/ann-onda-jl-a-format-for-multi-sensor-multi-channel-lpcm-encodable-recordings/32650/3

I'm not sure if the upstream zstd's seek support is considered stable enough to support here, or what the most desirable interface would be from a TranscodingStreams perspective, but it would definitely be useful if this feature were usable with CodecZstd.

Even if the upstream seekable format is still considered experimental, it may still be worthwhile to support here; the referenced thread makes it seem like more downstream usage of the experimental API would motivate further development/codification of the seekable format.

mkitti commented 5 months ago

My current thinking here is to implement seekability at the level of frames rather than blocks.

Through a combination of ZSTD_findDecompressedSize and ZSTD_getFrameContentSize, we can get both the total decompressed size and the decompressed size of individual frames, provided those sizes were saved in the frame headers, for example via our ZstdFrameCompressor.
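To make concrete where the per-frame size lives, here is a minimal pure-Python sketch that reads the Frame_Content_Size field from a frame header, following my reading of the Zstandard frame format (RFC 8878). It is not CodecZstd or libzstd code. The function name `frame_content_size` is hypothetical, and `None` stands in for the "unknown" result that ZSTD_getFrameContentSize reports when the size was not saved.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # magic number of a regular zstd frame (little-endian on disk)

def frame_content_size(buf, offset=0):
    """Return the Frame_Content_Size stored in the header at `offset`, or None if absent.

    Hypothetical helper for illustration; real code would call ZSTD_getFrameContentSize.
    """
    (magic,) = struct.unpack_from("<I", buf, offset)
    if magic != ZSTD_MAGIC:
        raise ValueError("not a regular zstd frame")
    descriptor = buf[offset + 4]            # Frame_Header_Descriptor byte
    fcs_flag = descriptor >> 6              # bits 7-6: Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1  # bit 5: Single_Segment_flag
    dict_id_flag = descriptor & 3           # bits 1-0: Dictionary_ID_flag
    pos = offset + 5
    if not single_segment:
        pos += 1                            # skip the Window_Descriptor byte
    pos += (0, 1, 2, 4)[dict_id_flag]       # skip the Dictionary_ID field
    if fcs_flag == 0:
        # With flag 0 the size is stored (in one byte) only for single-segment frames.
        return buf[pos] if single_segment else None
    if fcs_flag == 1:
        return struct.unpack_from("<H", buf, pos)[0] + 256  # 2-byte field is offset by 256
    if fcs_flag == 2:
        return struct.unpack_from("<I", buf, pos)[0]
    return struct.unpack_from("<Q", buf, pos)[0]
```

Summing this value over all non-skippable frames gives the total decompressed size, which only works if every frame actually stores its size.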

To seek, we iterate through the frames. For each frame:

  1. Check if the frame is skippable or not via ZSTD_isSkippableFrame
  2. Get the decompressed size via ZSTD_getFrameContentSize
  3. Get the compressed size of the frame via ZSTD_findFrameCompressedSize
  4. If the data we seek is not within this frame's decompressed range, continue to the next frame by advancing by the frame's compressed size
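The steps above can be sketched roughly as follows. This is an illustration in Python rather than the eventual Julia implementation. The `FrameInfo` record and the `locate` function are hypothetical names, and the three fields simply stand in for what ZSTD_isSkippableFrame, ZSTD_findFrameCompressedSize, and ZSTD_getFrameContentSize would report for each frame.

```python
from typing import Iterable, NamedTuple, Optional, Tuple

class FrameInfo(NamedTuple):
    """Hypothetical per-frame record; each field mirrors one libzstd query."""
    skippable: bool       # result of ZSTD_isSkippableFrame
    compressed_size: int  # result of ZSTD_findFrameCompressedSize
    content_size: int     # result of ZSTD_getFrameContentSize (unused for skippable frames)

def locate(frames: Iterable[FrameInfo], target: int) -> Optional[Tuple[int, int]]:
    """Find the frame holding decompressed byte `target`.

    Returns (compressed offset of that frame, offset of `target` inside the
    decompressed frame), or None if `target` is past the end of the stream.
    """
    compressed_pos = 0    # where the current frame starts in the compressed stream
    decompressed_pos = 0  # decompressed bytes produced by all earlier frames
    for frame in frames:
        if not frame.skippable:
            if target < decompressed_pos + frame.content_size:
                return compressed_pos, target - decompressed_pos
            decompressed_pos += frame.content_size
        # Skippable frames contribute no decompressed data but still occupy space.
        compressed_pos += frame.compressed_size
    return None
```

Once the frame is located, only that frame needs to be decompressed, and the first `offset` bytes of its output are discarded. The cost of the search itself is linear in the number of frames, since each frame header has to be visited.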

To better support random access, especially remotely, we could encode a directory into a skippable frame at the beginning or end of the stream.
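As a rough sketch of what such a directory could look like: a skippable frame is a magic number in the range 0x184D2A50 to 0x184D2A5F, a 4-byte little-endian payload size, and then arbitrary user data. The payload layout below (pairs of 64-bit compressed and decompressed sizes) is purely an assumption for illustration. The upstream seekable format defines its own table layout, which this does not reproduce, and `encode_directory` / `decode_directory` are hypothetical names.

```python
import struct

SKIPPABLE_MAGIC = 0x184D2A50  # zstd reserves 0x184D2A50 through 0x184D2A5F

def encode_directory(entries):
    """Pack (compressed_size, decompressed_size) pairs into one skippable frame.

    The payload layout is hypothetical: two little-endian uint64 values per frame.
    """
    payload = b"".join(struct.pack("<QQ", c, d) for c, d in entries)
    # Skippable frame header: 4-byte magic, then 4-byte payload size.
    return struct.pack("<II", SKIPPABLE_MAGIC, len(payload)) + payload

def decode_directory(frame):
    """Inverse of encode_directory; returns a list of (compressed, decompressed) pairs."""
    magic, size = struct.unpack_from("<II", frame, 0)
    if (magic & 0xFFFFFFF0) != SKIPPABLE_MAGIC:
        raise ValueError("not a skippable frame")
    payload = frame[8:8 + size]
    return [struct.unpack_from("<QQ", payload, i) for i in range(0, size, 16)]
```

A decoder that does not know about the directory just skips the frame, so the stream stays readable by stock zstd. Placing the frame at the end lets the writer stream the data frames first; placing it at the beginning lets a remote reader fetch the directory with a single request from a known offset.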