Closed firasuke closed 2 years ago
@firasuke If the file is small, you can use the one-shot function (alternatively, you could memory-map the file to load it?). There is no streaming hasher for the 128bit variant.
The 64bit variant is a bit faster and provides only slightly weaker guarantees on collisions than the 128bit variant.
And Rust's Hasher trait is not suitable for providing a 128bit hash either, so I didn't really bother with a streaming 128bit hasher.
I'm not sure what sort of example you need though. Hashing functions are pretty straightforward to use.
@DoumanAsh thanks for the quick reply!
There is no streaming hasher for 128bit variant.
I see.
64bit variant is a bit faster and provides just a bit of less guarantees on collisions than 128bit variant.
Interesting.
And Rust's Hasher trait is not suitable to provide 128bit hash too, so I didn't really bother with streaming 128bit hasher.
So is the Rust implementation of XXH3 not as performant as its C variant?
Hashing functions are pretty straight-forward to use
Please bear with me here, as I'm still a beginner. I figured out how to use the Xxh3 struct, using new(), then having it update() and return the digest(), but I can't figure out how to use the "one-shot" versions.
I also don't know how big of a buffer I should provide, and I tried using std::io::copy, but the Write trait it requires isn't implemented for the Xxh3 hasher.
So is the Rust implementation of XXH3 not as performant as its C variant?
XXH3 is the 64bit streaming variant. It is slower than the one-shot function, but it is close to the actual C implementation of the streaming algorithm.
I also don't know how big of a buffer I should provide, and I tried using std::io::copy but the copy trait isn't implemented for the Xxh3 hasher.
For the one-shot function, the usage is basically the following:
let digest_128bit = xxhash_rust::xxh3::xxh3_128(data_buffer);
Simply put, you have to provide all data as input.
Please bear with me here, as I'm still a beginner. I figured out how to use the Xxh3 struct, using new(), then having it update() and return the digest(), but I can't figure out how to use the "one-shot" versions.
You can call update() as many times as you want until you call digest().
After digest() has computed the hash, you need to call reset to start anew.
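As a rough illustration of the update/digest/reset lifecycle described above, here is a minimal sketch. `ToyHasher` and `toy_oneshot` are hypothetical names, and the hash inside is 64-bit FNV-1a used purely as a stand-in so the snippet runs with the standard library alone; it is not XXH3 and does not reproduce the crate's internals.

```rust
/// Hypothetical stand-in hasher (64-bit FNV-1a), NOT XXH3.
/// It only mimics the new/update/digest/reset shape discussed above.
struct ToyHasher {
    state: u64,
}

impl ToyHasher {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;

    fn new() -> Self {
        Self { state: Self::OFFSET_BASIS }
    }

    /// Feed another chunk of input; may be called any number of times.
    fn update(&mut self, input: &[u8]) {
        for &byte in input {
            self.state ^= u64::from(byte);
            self.state = self.state.wrapping_mul(Self::PRIME);
        }
    }

    /// Return the hash of everything fed so far (the state is left untouched).
    fn digest(&self) -> u64 {
        self.state
    }

    /// Forget all previous input so the hasher can be reused.
    fn reset(&mut self) {
        self.state = Self::OFFSET_BASIS;
    }
}

/// One-shot helper: hash a complete buffer in a single call.
fn toy_oneshot(input: &[u8]) -> u64 {
    let mut hasher = ToyHasher::new();
    hasher.update(input);
    hasher.digest()
}

fn main() {
    let mut hasher = ToyHasher::new();
    hasher.update(b"hello ");
    hasher.update(b"world");
    // Chunked updates give the same result as hashing the whole input at once.
    assert_eq!(hasher.digest(), toy_oneshot(b"hello world"));

    // After reset, the hasher behaves like a fresh one.
    hasher.reset();
    hasher.update(b"other");
    assert_eq!(hasher.digest(), toy_oneshot(b"other"));
    println!("streaming and one-shot results agree");
}
```

The point of the sketch is only the call order: any number of update() calls, then digest(), then reset before hashing unrelated data.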
P.S. Just to clarify, in case you're new to programming in general: the 64bit variant is faster than the 128bit variant purely because 64bit computations are cheaper than 128bit ones on current hardware.
Thanks for the much appreciated reply!
Closing as you've answered my question.
Re-opened because I'm not getting identical hashes compared to xxh128sum.
let mut file = std::fs::File::open("zstd-1.5.1.tar.zst")?;
let mut reader = std::io::BufReader::new(file);
let mut buffer = [0; 1024];
loop {
    let count = reader.read(&mut buffer)?;
    if count == 0 {
        break;
    }
}
let digest_128bit = xxhash_rust::xxh3::xxh3_128(&buffer);
println!("{}", digest_128bit);
I'm getting 9424528267328982158832983987192822081, when xxh128sum is giving me ffb2910f0fef0b9989b4d9f993acb3cf.
What could I be doing wrong?
@firasuke Are you sure 1024 bytes is enough for the whole file? I have tests that verify correctness against the C implementation up to 2048 bytes, so it is rather strange. There was recently an xxhash C 0.8.1 release, but I don't think the algorithm itself changed.
Try to use https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end with a Vec to read the whole file at once and pass it to the function.
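To make the difference concrete, here is a small sketch comparing the loop from the question with read_to_end. It reads from an in-memory byte slice rather than a real file so that it runs anywhere; `read_all` and `read_like_the_question` are hypothetical helper names. In the loop, every read overwrites the same 1024-byte array and nothing is accumulated, so whatever is hashed afterwards is not the file's contents.

```rust
use std::io::Read;

/// Collect everything from `reader` into a Vec, as read_to_end does.
fn read_all<R: Read>(mut reader: R) -> std::io::Result<Vec<u8>> {
    let mut data = Vec::new();
    reader.read_to_end(&mut data)?;
    Ok(data)
}

/// Mirror of the loop in the question: each read overwrites the same
/// fixed buffer and the bytes read are never kept or hashed.
fn read_like_the_question<R: Read>(mut reader: R) -> std::io::Result<[u8; 1024]> {
    let mut buffer = [0u8; 1024];
    loop {
        let count = reader.read(&mut buffer)?;
        if count == 0 {
            break;
        }
    }
    Ok(buffer)
}

fn main() -> std::io::Result<()> {
    // 3000 bytes of sample data standing in for a file's contents.
    let data: Vec<u8> = (0..3000u32).map(|i| (i % 251) as u8).collect();

    let whole = read_all(&data[..])?;
    let leftover = read_like_the_question(&data[..])?;

    // read_to_end yields the complete input.
    assert_eq!(whole, data);
    // The fixed buffer is always 1024 bytes, whatever the input size.
    assert_ne!(&leftover[..], &data[..]);
    println!("read_to_end: {} bytes, fixed buffer: {} bytes", whole.len(), leftover.len());
    Ok(())
}
```

If the file is too large to hold in memory, the alternative is to feed each chunk (only the first `count` bytes of the buffer) to a streaming hasher inside the loop.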
~Also please note that xxhsum returns 64bit hash, even if you set 128bit algorithm~ (I'm not sure what xxh128sum is though, I thought there was a single xxhsum)
Also please be sure to use the hex print format: {:x} - you cannot compare decimal output with the hex output of xxh128sum.
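A short sketch of the formatting point, using the 128bit digest quoted in this thread. `to_xxhsum_style` is a hypothetical helper name. One extra detail worth hedging on: as far as I know xxhsum always prints all 32 hex digits, while plain {:x} drops leading zeros, so {:032x} is the safer format when comparing strings.

```rust
/// Format a 128-bit digest as 32 zero-padded lowercase hex digits.
fn to_xxhsum_style(digest: u128) -> String {
    format!("{:032x}", digest)
}

fn main() {
    let digest: u128 = 0xffb2910f0fef0b9989b4d9f993acb3cf;

    // Decimal output: not comparable with a hex checksum.
    println!("{}", digest);
    // Hex output: same digits a hex checksum tool would show.
    println!("{:x}", digest);
    println!("{}", to_xxhsum_style(digest));

    // A digest with leading zero bits shows why padding matters.
    let small: u128 = 255;
    assert_eq!(format!("{:x}", small), "ff");
    assert_eq!(to_xxhsum_style(small).len(), 32);
}
```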
From what I gather, to use the 128bit variant you just have to run xxhsum -H128 ...
Alright, here are the hashes from xxhsum (apparently xxh32sum is equivalent to xxhsum -H0, xxh64sum is equivalent to xxhsum -H1, xxh128sum is equivalent to xxhsum -H2, and to use XXH3 it's xxhsum -H3 according to this):
xxh64sum (xxhsum -H1): ae6aaa491779e499
xxh128sum (xxhsum -H2): ffb2910f0fef0b9989b4d9f993acb3cf
xxhsum -H3: 89b4d9f993acb3cf
Here's what I'm using:
Cargo.toml
xxhash-rust = { version = "0.8.2", features = [ "xxh3" ] }
main.rs
let mut file = std::fs::File::open("zstd-1.5.1.tar.zst")?;
let mut buffer = Vec::new();
file.read_to_end(&mut buffer)?;
let digest_128bit = xxhash_rust::xxh3::xxh3_128(&buffer);
println!("{:x}", digest_128bit);
I'm getting 99aa06d3014798d86001c324468d497f, which appears to be way off from what xxhsum is showing. Any ideas?
Ok, so I made a sample program myself and ran it side by side with xxhsum:
22:19 $ .\target\debug\xxhash-rust.exe
digest b1ebd323dc1ee5eda4dff98a5944ffc4: C:\Users\Douman\Downloads\CV.pdf
22:19 $ C:\Users\Douman\Downloads\xxhsum.exe -H128 C:\Users\Douman\Downloads\CV.pdf
b1ebd323dc1ee5eda4dff98a5944ffc4 C:\Users\Douman\Downloads\CV.pdf
My sample program:
fn main() {
    const FILE: &str = "C:\\Users\\Douman\\Downloads\\CV.pdf";
    use std::io::Read;
    let mut file = std::fs::File::open(FILE).expect("open file");
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer).expect("read file");
    let digest_128bit = xxhash_rust::xxh3::xxh3_128(&buffer);
    println!("digest {:x}: {FILE}", digest_128bit);
}
@firasuke H3 is not the 128bit variant I believe, not enough bits in the hash.
So I think the correct option is -H2 or -H128.
I'm honestly not sure what is what, the CLI's help is not helpful.
https://github.com/Cyan4973/xxHash/blob/dev/cli/xxhsum.c#L1054-L1055
Should be 2 or 128
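A quick check on the digests quoted earlier in the thread supports the "not enough bits" reading: the -H3 output has 16 hex digits (64 bits), and for this particular file it equals the low 64 bits of the -H2 output. The sketch below only does that arithmetic on the two quoted strings; `low_64_bits` is a hypothetical helper, and nothing here claims the relationship holds for every input.

```rust
/// Keep only the low 64 bits of a 128-bit value (`as` truncates).
fn low_64_bits(value: u128) -> u64 {
    value as u64
}

fn main() {
    // Digests copied from the xxhsum output quoted in this thread.
    let h128 = u128::from_str_radix("ffb2910f0fef0b9989b4d9f993acb3cf", 16).expect("valid hex");
    let h3 = u64::from_str_radix("89b4d9f993acb3cf", 16).expect("valid hex");

    // For this file, the -H3 digest matches the low half of the -H2 digest.
    assert_eq!(low_64_bits(h128), h3);
    println!("{:016x}", low_64_bits(h128));
}
```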
But it is strange that your file outputs something different compared to xxhsum. I guess if you can share the file with me, I'll check it out?
I don't know what is wrong in your invocation of xxhsum, and I'm also not sure how -H works exactly.
Can you check xxhsum --version?
It should be 0.8.0 or 0.8.1, maybe your distro has old shit?
Here's the file.
It should be 0.8.0 or 0.8.1, maybe your distro has old shit?
Apparently not:
xxhsum 0.8.1 by Yann Collet
compiled as 64-bit x86_64 + SSE2 little endian with GCC 11.1.0
Checked file:
22:25 $ C:\Users\Douman\Downloads\xxhsum.exe -H2 C:\Users\Douman\Downloads\zstd-1.5.1.tar.zst
ffb2910f0fef0b9989b4d9f993acb3cf C:\Users\Douman\Downloads\zstd-1.5.1.tar.zst
22:26 $ .\target\debug\xxhash-rust.exe
digest ffb2910f0fef0b9989b4d9f993acb3cf: C:\Users\Douman\Downloads\zstd-1.5.1.tar.zst
Is it correct?
Yup, it is correct, and your code above worked fine. I wonder why my implementation was wrong?
I'm honestly not sure. The code is the same, so it seems mysterious to me; I'm not sure where the mistake is.
Closing, thanks for the help and sorry for the inconvenience caused.
Can you provide a working example on how to produce the xxh128sum of a file using xxhash-rust?