ethereum / py-snappy

A pure python implementation of the Snappy compression algorithm.
MIT License
16 stars 9 forks source link

incorrect comparison #18

Open robey opened 4 years ago

robey commented 4 years ago

https://github.com/ethereum/py-snappy/blob/f91082fa1498ed919ef26bd853a4552675c9665a/py_snappy/main.py#L235

your use of cute "C244" constants tripped you up! the comparisons should be to 1 << 8, 1 << 16, and 1 << 24

pipermerriam commented 4 years ago

Was this just from you looking at the code or did you encounter this via an actual library error or something that got encoded/decoded incorrectly. If it's the later I'd love to get a test vector that would expose this issue if you've got something available.

robey commented 4 years ago

just from inspection. i was using your code & the spec to do some testing on whether snappy would work on a side project if i used very small lookback sizes.

i posted my code here: https://gist.github.com/robey/0b748439a5fa76a9c763bd9f04b15c35

google's snappy lib can decode anything i throw at it, so i think it works. feel free to copy anything from that gist that you want -- i based it heavily on your code. :)

pipermerriam commented 4 years ago

I've been trying to hunt for an example that demonstrates this. Currently I'm doing it blindly by pulling random slices of the official test vectors.

See: https://github.com/ethereum/py-snappy/pull/19

So far I've not found a failing example...

I also did a quick sanity to be sure the code path is being hit. check inserting an assert False before this line:

https://github.com/ethereum/py-snappy/blob/f91082fa1498ed919ef26bd853a4552675c9665a/py_snappy/main.py#L236

It does indeed get executed, but so far everything I compress with this library is decompressable with libsnappy, as well as the inverse.

robey commented 4 years ago

It should decompress correctly.

What you'll see is that any literal with a 3-byte length (248, L, M, H) has an H byte of 0, meaning it should have been encoded with a 2-byte length (244, L, H).

Stepping back a bit: Literals of length >= 60 are encoded with a special marker byte, followed by 1 to 4 bytes encoding the length. If the "length of the length" is N, then the marker byte is ((59 + N) << 2). So the comparisons should be 256, 64KB, and 24MB.