DoumanAsh / xxhash-rust

Rust implementation of xxhash
Boost Software License 1.0
205 stars 20 forks source link

`Xxh3` won't pass test with input size >=1025 #5

Closed Lompik closed 3 years ago

Lompik commented 3 years ago

Setting num >= 1025 in assert_const_xxh3 assert_xxh3 fails.

The same test fails if num=512 but that can be fixed by changing (not sure if it's the right thing to do) < to <= at https://github.com/DoumanAsh/xxhash-rust/blob/07f619b49ed6dfa50c518db2c628c7b4a3dd26d7/src/xxh3.rs#L583

DoumanAsh commented 3 years ago

Your suggestion is indeed correct because we have to consume input by chunks with size of INTERNAL_BUFFER_SIZE except for last block of INTERNAL_BUFFER_SIZE I made mistake unintentionally because I made some adjustments to Rust code comparing to reference impl.

I'll have to take a look at 1025 case

DoumanAsh commented 3 years ago

Btw I didn't see a problem with assert_const_xxh3 The problem with 1025 exists only for stateful interface

Lompik commented 3 years ago

You're right it's in assert_xxh3, not in the const variant. In my tests, only with the stateful interface Xxh3 I get wrong hash. Whereas direct xxh3_64 gets me the same result as C's xxhash.

DoumanAsh commented 3 years ago

Yep, I'll have to figure it out. This is quite weird one because 1025 generally is the same as 257 for example, hence it is quite mysterious why it fails

Lompik commented 3 years ago

I forgot to mention that the current Xxh3 seems to be doing less computation as it is much faster than xxh3_64 & C's xxhash: On a 3GB file I expected 4.5s but Xxh3 turned up in 1.7s (update function ate all the 3GB).

DoumanAsh commented 3 years ago

Hmmm, I think last time I was doing benchmarks it was generally slower. But then again I was usually using data below 1kb but with a lot of iterations

DoumanAsh commented 3 years ago

In any even at first glance algorithm is identical to C implementation with some tweaks on Rust part. I'll need more time to find out what could go wrong. The only thing that can matter is number of stripes that gets accumulated over time as you feed output, but anything related to it seems to be correct so I'm not sure right now what else could be wrong.

For now tests are failing and I'll continue to review algorithm of stateful impl, but if you have knowledge of xxh3 yourself I would appreciate if you'd be able to take a look too

DoumanAsh commented 3 years ago

Thanks @Lompik I'll publish 0.8.2 and yank 0.8.1