KillingSpark / zstd-rs

zstd-decoder in pure rust
MIT License

Support for skippable frames #33

Closed: KillingSpark closed this issue 1 year ago

KillingSpark commented 1 year ago

This implementation does not currently support skippable frames, which a spec-compliant decoder is required to handle.

https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#skippable-frames

This also relates to the seekable format, which stores its seek table in a skippable frame.
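
For reference, a minimal sketch of what the linked spec section describes: a skippable frame starts with a 4-byte little-endian magic number in the range 0x184D2A50 to 0x184D2A5F, followed by a 4-byte little-endian size and then that many bytes of user data. The helper names `read_u32_le` and `split_skippable_frame` are hypothetical and are not this crate's API.

```rust
/// Skippable frames may use any of these 16 magic numbers.
const SKIPPABLE_MAGIC_MIN: u32 = 0x184D_2A50;
const SKIPPABLE_MAGIC_MAX: u32 = 0x184D_2A5F;

/// Hypothetical helper: reads a little-endian u32 at `at`, if enough bytes exist.
fn read_u32_le(data: &[u8], at: usize) -> Option<u32> {
    let b = data.get(at..at + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper (not part of this crate's API).
/// If `data` starts with a complete skippable frame, returns
/// `(user_data, rest_of_input)`; otherwise returns `None`.
pub fn split_skippable_frame(data: &[u8]) -> Option<(&[u8], &[u8])> {
    let magic = read_u32_le(data, 0)?;
    if !(SKIPPABLE_MAGIC_MIN..=SKIPPABLE_MAGIC_MAX).contains(&magic) {
        return None; // a regular zstd frame or unrelated data
    }
    // The size field counts only the user data, not the 8 header bytes.
    let size = read_u32_le(data, 4)? as usize;
    let end = 8usize.checked_add(size)?;
    let user_data = data.get(8..end)?;
    Some((user_data, &data[end..]))
}
```

The decimal value 407710288 that shows up in the logs later in this thread is 0x184D2A50, the first magic number in that range.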

Thanks to @phord for bringing this to my attention in #31.

KillingSpark commented 1 year ago

@phord Do you by chance have any "real" files with skippable frames? I tried implementing this in #37.

It should work, but I don't see how I can convince zstd to generate a file with a skippable frame to test it. But tbh it isn't that complicated, so I am reasonably sure I got it right...
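
One way to get a test input without relying on a tool is to write the frame by hand, since the layout is just magic number, size, and payload. A rough sketch, assuming a hypothetical helper `make_skippable_frame` (not something zstd or this crate provides):

```rust
/// Hypothetical test helper (not part of zstd or this crate).
/// Builds one skippable frame: 4-byte LE magic, 4-byte LE size, user data.
/// `variant` selects one of the 16 magic numbers 0x184D2A50..=0x184D2A5F.
pub fn make_skippable_frame(variant: u8, user_data: &[u8]) -> Vec<u8> {
    assert!(variant <= 0x0F, "only 16 skippable magic numbers exist");
    assert!(
        user_data.len() <= u32::MAX as usize,
        "user data must fit in the 32-bit size field"
    );
    let magic = 0x184D_2A50u32 + u32::from(variant);
    let size = user_data.len() as u32;

    let mut frame = Vec::with_capacity(8 + user_data.len());
    frame.extend_from_slice(&magic.to_le_bytes());
    frame.extend_from_slice(&size.to_le_bytes());
    frame.extend_from_slice(user_data);
    frame
}
```

Prepending or appending the returned bytes to the contents of an ordinary .zst file should give a stream that a spec-compliant decoder still decodes to the original data, which makes it usable as a test fixture.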

phord commented 1 year ago

I found out that pzstd (no longer being developed) produces skippable frames.

git/opc/zstd-rs(skipable_frames)» pzstd LICENSE 
LICENSE              : 64.74%   (  1075 =>    696 bytes, LICENSE.zst)          

git/opc/zstd-rs(skipable_frames)» cmp LICENSE <(zstd -dc LICENSE.zst)

git/opc/zstd-rs(skipable_frames)» md5sum LICENSE
13df02d6656dafde4ac24f5a0b25f86d  LICENSE

git/opc/zstd-rs(skipable_frames)» zstdcat LICENSE.zst | md5sum
13df02d6656dafde4ac24f5a0b25f86d  -

git/opc/zstd-rs(skipable_frames)» target/release/zstd -d -c LICENSE.zst | md5sum 
File: LICENSE.zst
Found a skippable frame with magic number: 407710288 and size: 4
100 % done
Decoded frames: 1  bytes: 1075
1 of 1 checksums are ok!
13df02d6656dafde4ac24f5a0b25f86d  -

TEST PASSED :+1:

phord commented 1 year ago

Here's the error the original code produces when skippable frames are encountered:

git/opc/zstd-rs(master)» pzstd foo    
foo                  : 10.26%   (2043388670 => 209694227 bytes, foo.zst)       

git/opc/zstd-rs(master)» ls -lah foo*
-rw-rw-r--  1 phord phord 2.0G Mar  7 12:51 foo
-rw-rw-r--  1 phord phord 200M Mar 15 11:03 foo.zst

git/opc/zstd-rs(master)» target/release/zstd -d -c foo.zst |wc -l           
File: foo.zst
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: FrameCheckError(WrongMagicNum { got: 407710288 })', src/bin/zstd.rs:60:37
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
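
The panic above is what happens when any magic number other than the regular zstd one is treated as fatal. A sketch of the alternative on a streaming reader, assuming a hypothetical function `next_non_skippable_magic` (this is not the code from #37 and not this crate's API): read the magic number, and if it is in the skippable range, read the size and discard that many bytes before looking at the next frame.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not this crate's API).
/// Consumes skippable frames from `src` and returns the magic number of the
/// first frame that is not skippable, or `None` if the input ends first.
/// Simplification: input that ends inside a magic number is treated as the end.
pub fn next_non_skippable_magic<R: Read>(src: &mut R) -> io::Result<Option<u32>> {
    loop {
        let mut word = [0u8; 4];
        if let Err(e) = src.read_exact(&mut word) {
            return if e.kind() == io::ErrorKind::UnexpectedEof {
                Ok(None)
            } else {
                Err(e)
            };
        }
        let magic = u32::from_le_bytes(word);
        if !(0x184D_2A50..=0x184D_2A5F).contains(&magic) {
            return Ok(Some(magic)); // caller decides whether this is a valid frame
        }

        // Skippable frame: a 4-byte LE size follows, then `size` bytes to discard.
        src.read_exact(&mut word)?;
        let size = u64::from(u32::from_le_bytes(word));
        let skipped = io::copy(&mut src.by_ref().take(size), &mut io::sink())?;
        if skipped != size {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                "truncated skippable frame",
            ));
        }
    }
}
```

For a regular zstd frame the returned value would be 0xFD2FB528; the four magic bytes have already been consumed at that point, so a real decoder would need to account for that when it starts parsing the frame header.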

Also, it turns out that pzstd actually inserts lots of skippable frames on large files.

git/opc/zstd-rs(skipable_frames)» target/release/zstd -d -c foo.zst > /dev/null
File: foo.zst
Found a skippable frame with magic number: 407710288 and size: 4
0 % doneFound a skippable frame with magic number: 407710288 and size: 4
Found a skippable frame with magic number: 407710288 and size: 4
1 % doneFound a skippable frame with magic number: 407710288 and size: 4
Found a skippable frame with magic number: 407710288 and size: 4
2 % doneFound a skippable frame with magic number: 407710288 and size: 4
Found a skippable frame with magic number: 407710288 and size: 4
Found a skippable frame with magic number: 407710288 and size: 4
3 % doneFound a skippable frame with magic number: 407710288 and size: 4
  :
98 % doneFound a skippable frame with magic number: 407710288 and size: 4
Found a skippable frame with magic number: 407710288 and size: 4
Found a skippable frame with magic number: 407710288 and size: 4
99 % doneFound a skippable frame with magic number: 407710288 and size: 4
Found a skippable frame with magic number: 407710288 and size: 4
100 % done
Decoded frames: 244  bytes: 2043388670
244 of 244 checksums are ok!
13239317

git/opc/zstd-rs(skipable_frames)» target/release/zstd -d -c foo.zst 2>&1 | grep -c "skippable frame with magic number"               
244

KillingSpark commented 1 year ago

Nice! Thanks a lot for double checking, much appreciated.