Cyan4973 / xxHash

Extremely fast non-cryptographic hash algorithm
http://www.xxhash.com/

Could you make a tiny version of the hash function? #956

Open mofosyne opened 2 months ago

mofosyne commented 2 months ago

According to the author of SMHasher, https://github.com/tidwall/th64 is similar to hashes like yours, just compressed into a few lines (source: https://news.ycombinator.com/item?id=40877460):

> I see nothing wrong with it if it passes SMHasher3, but it's also not radically different than other hashes - it's just smooshed down into four lines.

That got me thinking: how compact could you make your hash function? Maybe you could put a compact version in your readme for those who just want to copy and paste the hash function with minimal dependencies and don't care about speed.

And if so, how would it compare against th64?

Cyan4973 commented 2 months ago

Shorter versions of xxh32 and xxh64 exist here: https://create.stephan-brumme.com/xxhash/

xxh3 is larger, partly because it has multiple vector implementations.
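
To give a feel for the size involved, a compact one-shot XXH32 might look roughly like the sketch below. It is not taken from the linked page or from xxhash.h: the names `tiny_xxh32`, `read32le` and `rotl32` are hypothetical, it assumes a C99 compiler with a 32-bit `unsigned int`, and it trades speed for brevity (byte-wise reads, no streaming API).

```c
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes as a little-endian word, independent of host endianness. */
static uint32_t read32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Rotate left; only called with 0 < r < 32. */
static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* One-shot XXH32. tiny_xxh32 is a hypothetical name, not part of xxhash.h. */
uint32_t tiny_xxh32(const void *data, size_t len, uint32_t seed) {
    const uint32_t P1 = 0x9E3779B1u, P2 = 0x85EBCA77u, P3 = 0xC2B2AE3Du,
                   P4 = 0x27D4EB2Fu, P5 = 0x165667B1u;
    const uint8_t *p = (const uint8_t *)data;
    const uint8_t *end = p + len;
    uint32_t h;

    if (len >= 16) {
        /* Four accumulators, each consuming one 4-byte lane per 16-byte stripe. */
        uint32_t v[4] = { seed + P1 + P2, seed + P2, seed, seed - P1 };
        do {
            for (int i = 0; i < 4; i++, p += 4)
                v[i] = rotl32(v[i] + read32le(p) * P2, 13) * P1;
        } while (end - p >= 16);
        h = rotl32(v[0], 1) + rotl32(v[1], 7) + rotl32(v[2], 12) + rotl32(v[3], 18);
    } else {
        h = seed + P5; /* short inputs skip the accumulators */
    }

    h += (uint32_t)len;

    /* Remaining 4-byte words, then remaining single bytes. */
    for (; end - p >= 4; p += 4) h = rotl32(h + read32le(p) * P3, 17) * P4;
    for (; p < end; p++)         h = rotl32(h + *p * P5, 11) * P1;

    /* Final avalanche. */
    h ^= h >> 15; h *= P2;
    h ^= h >> 13; h *= P3;
    h ^= h >> 16;
    return h;
}
```

A call such as `tiny_xxh32(buf, buf_len, 0)` hashes a whole buffer in one go; anything needing incremental updates or top speed is better served by the full library.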

It's a long-term objective to separate xxhash.h into multiple smaller sub-modules, in order to allow implementers to pick and choose just the parts they want. It will require time, which is in short supply.

I hadn't heard of th64 before. It looks similar to murmurhash3, mildly obfuscated by a very compact coding convention.

mofosyne commented 1 month ago

Does it have to be a fast implementation? I think the idea of small copy/paste functions is that they suit people who simply want to get something going with minimal dependencies, rather than something that is fast. That could cover anything from small utility functions that run occasionally to embedded systems.

That being said, you could just ask the author of https://create.stephan-brumme.com/xxhash/ if he would be willing to have his implementation placed in your readme?

I think for now just xxh32 and xxh64 would be ideal as a minimum, as those seem to be what's most used out there. If you are worried about how much space it would take in your readme, you could always wrap it in a <details><summary></summary></details> tag, which GitHub supports. (But you may want to expose at least xxh64 to the public as a nice 'see how simple it is' example, much like the th64 author did for his.)
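
As a rough illustration of what such a 'see how simple it is' snippet could look like, here is a minimal one-shot XXH64 sketch. It is an assumption of what a readme-sized version might be, not the maintainer's code: `tiny_xxh64`, `read_le`, `rotl64`, `round64` and the `P64_*` constant names are hypothetical, and it favors brevity over speed.

```c
#include <stddef.h>
#include <stdint.h>

static const uint64_t P64_1 = 0x9E3779B185EBCA87ULL, P64_2 = 0xC2B2AE3D27D4EB4FULL,
                      P64_3 = 0x165667B19E3779F9ULL, P64_4 = 0x85EBCA77C2B2AE63ULL,
                      P64_5 = 0x27D4EB2F165667C5ULL;

/* Read n bytes (n <= 8) as a little-endian integer, independent of host endianness. */
static uint64_t read_le(const uint8_t *p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Rotate left; only called with 0 < r < 64. */
static uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

/* Mix one 8-byte lane into an accumulator. */
static uint64_t round64(uint64_t acc, uint64_t in) {
    return rotl64(acc + in * P64_2, 31) * P64_1;
}

/* One-shot XXH64. tiny_xxh64 is a hypothetical name, not part of xxhash.h. */
uint64_t tiny_xxh64(const void *data, size_t len, uint64_t seed) {
    const uint8_t *p = (const uint8_t *)data;
    const uint8_t *end = p + len;
    uint64_t h;

    if (len >= 32) {
        /* Four accumulators, each consuming one 8-byte lane per 32-byte stripe. */
        uint64_t v[4] = { seed + P64_1 + P64_2, seed + P64_2, seed, seed - P64_1 };
        do {
            for (int i = 0; i < 4; i++, p += 8) v[i] = round64(v[i], read_le(p, 8));
        } while (end - p >= 32);
        h = rotl64(v[0], 1) + rotl64(v[1], 7) + rotl64(v[2], 12) + rotl64(v[3], 18);
        /* Merge each accumulator back into the hash. */
        for (int i = 0; i < 4; i++) h = (h ^ round64(0, v[i])) * P64_1 + P64_4;
    } else {
        h = seed + P64_5; /* short inputs skip the accumulators */
    }

    h += (uint64_t)len;

    /* Remaining 8-byte words, then at most one 4-byte word, then single bytes. */
    for (; end - p >= 8; p += 8)
        h = rotl64(h ^ round64(0, read_le(p, 8)), 27) * P64_1 + P64_4;
    if (end - p >= 4) {
        h = rotl64(h ^ (read_le(p, 4) * P64_1), 23) * P64_2 + P64_3;
        p += 4;
    }
    for (; p < end; p++) h = rotl64(h ^ (*p * P64_5), 11) * P64_1;

    /* Final avalanche. */
    h ^= h >> 33; h *= P64_2;
    h ^= h >> 29; h *= P64_3;
    h ^= h >> 32;
    return h;
}
```

At around fifty lines with no dependencies beyond `<stdint.h>` and `<stddef.h>`, something of this shape would fit comfortably inside a collapsed `<details>` block.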