funny-falcon / funny_hash

Simple multiplicative hash like Murmur3 but a bit safer
21 stars 1 forks source link

smhasher test results for funny hash show failures #1

Open demerphq opened 6 years ago

demerphq commented 6 years ago

https://github.com/demerphq/smhasher/blob/master/doc/FunnyHash64-2.128.128.64.txt

FunnyHash64-2 128 128 64 FAILED 17 of 187 tests. Avalanche: 17-24, OverAll: 187, Sanity: 4-5, TwoBytes: 59, 61, 63, Zeroes: 166-167, 170

Just a heads up.

funny-falcon commented 6 years ago

Thank you for test and reminding.

I've looked inattentive before, and i've seen only avalanche one short keys. But seems like there are more. I'll investigate.

BTW, is non-zero seed used for failed tests? FunnyHash is known to be bad with all-zero seed.

demerphq commented 6 years ago

On 11 January 2018 at 20:02, Sokolov Yura notifications@github.com wrote:

Thank you for test and reminding.

I've looked inattentive before, and i've seen only avalanche one short keys. But seems like there are more. I'll investigate.

BTW, is non-zero seed used for failed tests? FunnyHash is known to be bad with all-zero seed.

Yes, it includes some zero-seed tests. But you are failing other tests too.

From a testing standpoint the hash is expected to deal with zero seeds itself.

You could for instance do something like:

seed ^= len; seed += !seed;

to ensure it is nonzero at the start of your hash.

Yves

funny-falcon commented 6 years ago

FunnyHash is intended for usage with hidden random-like seed. Zero seed is certainly neither hidden nor random-like.

May I send pull request with change of seeding for funny_hash? Like:

*(uint64_t *) out = fh64_string_hash2(key, len, s64[0] ^ 0x12345678, s64[1] ^ 0xabcdef12 );

I'll investigate issues that wouldn't be fixed with this.

demerphq commented 6 years ago

On 11 January 2018 at 21:14, demerphq demerphq@gmail.com wrote:

On 11 January 2018 at 20:02, Sokolov Yura notifications@github.com wrote:

Thank you for test and reminding.

I've looked inattentive before, and i've seen only avalanche one short keys. But seems like there are more. I'll investigate.

BTW, is non-zero seed used for failed tests? FunnyHash is known to be bad with all-zero seed.

Yes, it includes some zero-seed tests. But you are failing other tests too.

From a testing standpoint the hash is expected to deal with zero seeds itself.

You could for instance do something like:

seed ^= len; seed += !seed;

to ensure it is nonzero at the start of your hash.

Looking at the result more closely, there are three main sets of results:

Yves

demerphq commented 6 years ago

On 11 January 2018 at 21:38, Sokolov Yura notifications@github.com wrote:

FunnyHash is intended for usage with hidden random-like seed. Zero seed is certainly neither hidden nor random-like.

No, that is wrong. when choosing a number from 0 to 2**64-1, 0 is no less random than 0x0123456789ABCDEF.

Put another way, pretty much any proper RNG produces 0 exactly as often as it produces every other integer.

May I send pull request with change of seeding for funny_hash? Like:

(uint64_t ) out = fh64_string_hash2(key, len, s64[0] ^ 0x12345678, s64[1] ^ 0xabcdef12 );

How would that help? That changes your magic "bad seed" from {0,0} to {0x12345678, 0xabcdef12}.

You might pass some tests, but it would leave the core weakness in the hash.

Anyway, you can send me whatever pull requests you like, I don't mind really, its your hash. :-)

I think there is an alternative. Your hash is used similarly to Perl's, and my fork of smhasher was designed to test perls hash functions, which means it supports a distinct interface for initializing the state before doing any hashing. The function takes the seed and then returns a frozen "state" which is passed through to the hashing functiion. You can define a state initialization function and then do whatever seed guards you need there. In fact, i think right now there is already one of these functions, and you just need to educate it as you wish. I would say you library should define such state initialization code itself, so users do not have to be aware of any issues like this. This would also give you an opportunity to pre-mix your seeds, possible together, so the state is better mixed when you start hashing.

I'll investigate issues that wouldn't be fixed with this.

Sure. Really its your call. I have a personal interest in this subject, and I consider all of these tests sane minimum requirements for any hash. If you don't agree that is fine with me. :-)

FWIW, I think your hash is kinda interesting. It is really two hashes run in parallel, each of which has an easy multicollision vulnerability which is ameliorated by the fact that the input vector that breaks one does not break the other, and vice versa. It is a bit scary actually. I kinda wonder what a cryptographer would make of it.

cheers, Yves

funny-falcon commented 6 years ago

I don't care for zero seed at all. I'll put pull request for seed xoring as in comment above.

No, that is wrong. when choosing a number from 0 to 2**64-1, 0 is no less random than 0x0123456789ABCDEF.

Yes. But seed is 128bit. And all-zero seed has probability is 2^-128 . It is same probability there is another UUIDv4 matching my UUIDv4. So, I don't care.

I don't care much for avalanche for short keys. I know how to fix it: just add another one multiplication round. It will slow function for short keys. (That is why it 8 byte keys are good: there are one more multplication round compared to 7 byte). Especially I don't care much for this avalanche dependency on seed value. Even your test results shows there is always at least 70 bits of seed value that produces good avalanche, and that is more than enough for FunnyHash purposes (ie hash for internal hash table without exposed seed or hashsum value).

But I care for twobytes. I'll certainly will try to understand whats happening.

btw, can you help me to compile benchmark? I do:

$ git clone https://github.com/demerphq/smhasher.git
$ cd smhasher
$ mkdir build
$ cd build
$ cp ../../funny_hash/funny_hash.h .
$ cmake ..
$ # VERSION.h not found. What should I install?
$ make
....
/home/yura/Project/smhasher-demerphq/TAP.cpp:5:21: fatal error: VERSION.h: No such file or directory
 #include "VERSION.h"
                     ^
compilation terminated.
funny-falcon commented 6 years ago

btw: did you try https://github.com/funny-falcon/fanom_hash ? I'll check it as well and will send PR for it.

demerphq commented 6 years ago

On 11 January 2018 at 22:11, Sokolov Yura notifications@github.com wrote:

I don't care for zero seed at all. I'll put pull request for seed xoring as in comment above.

No, that is wrong. when choosing a number from 0 to 2**64-1, 0 is no less random than 0x0123456789ABCDEF.

Yes. But seed is 128bit. And all-zero seed has probability is 2^-128 . It is same probability there is another UUIDv4 matching my UUIDv4. So, I don't care.

Shrug. That is your call.

I don't care much for avalanche for short keys. I know how to fix it: just add another one multiplication round. It will slow function for short keys. (That is why it 8 byte keys are good: there are one more multplication round compared to 7 byte). Especially I don't care much for this avalanche dependency on seed value. Even your test results shows there is always at least 70 bits of seed value that produces good avalanche, and that is more than enough for FunnyHash purposes (ie hash for internal hash table without exposed seed or hashsum value).

That is your call again. I suspect that it means you leak seed state, so attacking the hash is a lot easier than you think, but I cant prove it right now.

Also, dont you have a 64 bit version?

But I care for twobytes. I'll certainly will try to understand whats

happening.

As you wish.

btw, can you help me to compile benchmark? I do:

$ git clone https://github.com/demerphq/smhasher.git $ cd smhasher $ mkdir build $ cd build $ cp ../../funny_hash/funny_hash.h . $ cmake .. $ # VERSION.h not found. What should I install? $ make .... /home/yura/Project/smhasher-demerphq/TAP.cpp:5:21: fatal error: VERSION.h: No such file or directory

include "VERSION.h"

                 ^

compilation terminated.

Run ./make_smhasher, the file will be created - sorry about that, I will add that to the docs.

cheers, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq commented 6 years ago

On 11 January 2018 at 22:14, Sokolov Yura notifications@github.com wrote:

btw: did you try https://github.com/funny-falcon/fanom_hash ? I'll check it as well and will send PR for it.

Sure. Sounds good.

cheers, Yves

demerphq commented 6 years ago

On 11 January 2018 at 22:18, demerphq demerphq@gmail.com wrote:

On 11 January 2018 at 22:11, Sokolov Yura notifications@github.com wrote:

I don't care for zero seed at all. I'll put pull request for seed xoring as in comment above.

No, that is wrong. when choosing a number from 0 to 2**64-1, 0 is no less random than 0x0123456789ABCDEF.

Yes. But seed is 128bit. And all-zero seed has probability is 2^-128 . It is same probability there is another UUIDv4 matching my UUIDv4. So, I don't care.

Shrug. That is your call.

Having said that, I don't get it. How hard is it to define a function that ensures your state has reasonable mixed version of its input state? I mean if it gets executed exactly once per seed, and you normally use the same seed per process why would you not? It just seems irresponsible, not matter how unlikely it is.

Maybe I should give some context on my view of this. Perl used to use a hash function with the same problem, and with some builds it would end up seeded as zero. Thus there was a security issue and etc. Even if it is unlikely under totally correct usage, it is still a design flaw.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

funny-falcon commented 6 years ago

That is your call again. I suspect that it means you leak seed state, so attacking the hash is a lot easier than you think, but I cant prove it right now.

FunnyHash is only for use-cases where neither seed, nor result hash-sum is exposed (in any way, except timing for insertion/lookup into hash-table) (and seed is strongly random). For any other use case there is no any promises. 70 bits of seed is more than enough to protect against timing investigations. If it were avalanche fail on keybits (not on seed), or if there were only 30bits of seed value that produces good avalanche, I'd bother more.

funny-falcon commented 6 years ago

Maybe I should give some context on my view of this. Perl used to use a hash function with the same problem, and with some builds it would end up seeded as zero. Thus there was a security issue and etc. Even if it is unlikely under totally correct usage, it is still a design flaw.

Any function with known seed is vulnurable to bruteforce searching of collisioned keys. It doesn't matter, have you SHA-1, SHA-2, SHA-3 or BLAKE-2 or SIPHASH. If you know seed, you easily generate collisioned keys.

That is why I don't bother for known seed. And zero-seed is known seed.

funny-falcon commented 6 years ago

If you know seed, you easily generate collisioned keys.

I mean, collisioned into same bucket of hash table.

funny-falcon commented 6 years ago

Look at Microsoft Marvin32 hash function. It is awesome. (and patented). But it is also dumb with zero-seed.

demerphq commented 6 years ago

On 11 January 2018 at 22:37, Sokolov Yura notifications@github.com wrote:

Look at Microsoft Marvin32 hash function. It is awesome. (and patented). But it is also dumb with zero-seed.

We have radically different definitions of awesome, Marvin32 doesn't do well in the smhasher tests either.

Anyway, good luck. Ill look out for your PR's.

cheers, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

funny-falcon commented 6 years ago

Marvin32 doesn't do well in the smhasher tests either.

Just try it with pre-xor-ed seed (so it is non-zero).

demerphq commented 6 years ago

On 11 January 2018 at 22:33, Sokolov Yura notifications@github.com wrote:

Maybe I should give some context on my view of this. Perl used to use a hash function with the same problem, and with some builds it would end up seeded as zero. Thus there was a security issue and etc. Even if it is unlikely under totally correct usage, it is still a design flaw.

Any function with known seed is vulnurable to bruteforce searching of collisioned keys.

Which for any decent hash is itself a difficult task.

It doesn't matter, have you SHA-1, SHA-2, SHA-3 or BLAKE-2 or SIPHASH.

If you know seed, you easily generate collisioned keys.

Nope. Sorry. It took Google level resources and a major research to find any collisions at all in SHA1, which by the way has a fixed seed.

SHA-2 and SHA-3 have fixed seeds. I dont know much about BLAKE-2, but i think its a SHA-3 candidate, so it would have a fixed seed as well.

Siphash is seeded, and has no zero vulnerabilities, as it uses a 128 seed to initialize a 256 bit state in way that zeros are impossible. (It is quite elegant actually.)

Anyway, listen, like I said, I don't care what you do with this hash really, it wont affect me at all. I am in this to learn, and to share what I have learned, hence the reason I have pursued this at all.

cheers, Yves

demerphq commented 6 years ago

On 11 January 2018 at 23:00, Sokolov Yura notifications@github.com wrote:

Marvin32 doesn't do well in the smhasher tests either.

Just try it with pre-xor-ed seed (so it is non-zero).

It doesn't mix its seed properly, and fails avalanche tests.

Yves

funny-falcon commented 6 years ago

SHA-2 and SHA-3 have fixed seeds. I dont know much about BLAKE-2, but i think its a SHA-3 candidate, so it would have a fixed seed as well.

Both BLAKE-2 and SHA3 are keyed hash functions, ie they have arbitrary seed.

Nope. Sorry. It took Google level resources and a major research to find any collisions at all in SHA1, which by the way has a fixed seed.

I didn't mean full hashsum collision. Only collision to same bucket of hash-table. If hash-table has 100_000 buckets, it is enough to examine 10_000*100_000 = 1_000_000_000 keys to find collision chain longer than 50_000, if seed is known. That is just a day of GPU even with newest SHA-3.

That is why it is important to generate secure random seed and keep it secret.

That is why language/db runtime designer have to be sure seed may be all-zero only in strict random case, so attacker cannot rely on it.

funny-falcon commented 6 years ago

I've tested funny_hash with xored seed (as I suggest above). It passes all tests except avalanche for short keys. I will send PR for this.

funny-falcon commented 6 years ago

Thank you for this issue.

I found way to feed 64bit seed into 128bit state so even no avalanche errors happens (at a cost of halving seed). And I accept your idea of "preparing state" function. I will add this "prepare" to funny_hash.h and will make PR to your tests for "funny_hash with 64bit seed and 128bit state".

Also I will PR with fanom hash. It is quite fast, and looks to pass tests (after little tweak I will publish soon). Though I believe its complexity makes it less attractive than funny_hash for language implementations.

funny_hash has good property of easy chaining. It has same 128 intermediate state equal to (random/expanded) seed. fanom_hash has 256bit intermediate state with 128bit initial seed, and it is less convenient to use in language implementations, where state should be passed while deep structure hashed.

funny-falcon commented 6 years ago

Damn, fanom_hash didn't pass differential "up-to-3-bit differentials in 256-bit keys -> 64 bit hashes" ... pitty...

funny-falcon commented 6 years ago

No, there was just small error. FanomHash passes tests.