demerphq / smhasher

75 stars 12 forks source link

Wrong number of iterations for Avalanche test #7

Closed erthink closed 6 years ago

erthink commented 6 years ago
 const int reps = 32000000 / info->hashbits;

This bug is here, added by https://github.com/demerphq/smhasher/commit/4a8840886cb839567819a5695537c50705d9555a#diff-118fcbaaba162ba17933c7893247df3aR561.

Let we have a super-wide hash that produces 2^20 bits, how many repeats we need to test it?

So, it is wrong for wide hashes, and the reason for md5 test failure. In the other words: we need enough repetitions for avoid statistical skew.

erthink commented 6 years ago

@rurban, FYI

demerphq commented 6 years ago

On 16 January 2018 at 15:50, Leonid Yuriev notifications@github.com wrote:

const int reps = 32000000 / info->hashbits;

This bug is here https://github.com/demerphq/smhasher/blob/d104d96f83fce55931ee710979c5892e9484103f/RunTests.cpp#L151, added by 4a88408#diff-118fcbaaba162ba17933c7893247df3aR561 https://github.com/demerphq/smhasher/commit/4a8840886cb839567819a5695537c50705d9555a#diff-118fcbaaba162ba17933c7893247df3aR561 .

Let we have a super-wide hash that produces 2^20 bits, how many repeats we need to test it?

I think you missed part of the patch where I added highwayhash:

https://github.com/demerphq/smhasher/commit/079a2adba28ee532bc6b5f9a2e73d3631d9ba001#diff-86fda68adead2e3d54aa4fb1a3768df4

Note that this sets the minimum number of tests to 200000. You can also see this in the output:

Avalanche Tests ### - seed-bits: 256 hash-bits: 256

Samples 200000, expected error 0.00128000, confidence level 99.99994267%

So, it is wrong for wide hashes, and the reason for md5 test failure.

In the other words: we need enough repetitions for avoid statistical skew.

No, I don't think this is correct because of the above.

The part of the test you are failing is the test to see if you are producing too many bits that are too far away from the expected.

The probability that after 200,000 coint flips you end up with a count of heads that is more than 2000 away from the expected 100,000 is extremely small.

The probability that you have MANY such bits is extremely small.

The correct test for a situation like this is to use a KS test of uniformity of the p-values of the individual bit tests, and given the data I see right now, I am extremely doubtful that you will pass.

IMO based on the data I see right now, highwayhash256 produces too many extreme counts. I will follow up with more detailed data and numbers and results from applying the KS test.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

erthink commented 6 years ago
ok 1 - Found Hash # sha1_32a
###################################################################
### Testing sha1_32a - no seed_state
### - SHA1, 32 bit seed, returning first 32 bits -
### seedbits: 32 statebits: 32 hashbits: 32
###################################################################
ok 2 - Verification code # sha1_32a
# sha1_32a             - Verification value 0xED4CCEA0 : Passed.
### Avalanche Tests ### - seed-bits: 32 hash-bits: 32
# Samples 200000, expected error 0.00128000, confidence level 99.99994267%
# Testing   0-bit keys.......... ok.     # worst-bit:   0.992% error-ratio: 1.075701e+00
ok 3 - Strict Avalanche Criteria - 0 bit/0 byte keys # sha1_32a
# Testing   8-bit keys.......... not ok! # worst-bit:   1.033% error-ratio: 1.024398e+00
#              +---------------------------------------------------+
#              |012345678901234567890123456789012345678901234567890|
#              +---------------------------------------------------+
# Scale:       |.1234567890abcdefghijklmnopqrstuvwxyzãäåêëîïðñôõöûü|
#              |üÿABCDEFGHIJKLMNOPQRSTUVWXYZÂÃÄÅÊËÑÔÕÖÛÜÝø¤*©®¶&%@#|
#              +---------------------------------------------------+
#              |pct diff from 50%: abs((0.5-(changed/reps))*2) *100|
#              +--------------------------------+
#              |0         1         2         3 |
#              |01234567890123456789012345678901|
#              +--------------------------------+
# seed     0.0 |................................|
# seed     1.0 |.......1........................|
# seed     2.0 |................................|
# seed     3.0 |.........................1......|
# seed     4.0 |................................|
# seed     5.0 |..................1.............|
# seed     6.0 |......................1.........|
# seed     7.0 |................................|
# seed     8.0 |................1.........1.....|
# seed     9.0 |..............................1.|
# seed    10.0 |............................1...|
# seed    11.0 |................................|
# seed    12.0 |............1...................|
# seed    13.0 |................................|
# seed    14.0 |..........................1....1|
# seed    15.0 |................................|
# seed    16.0 |........................1.......|
# seed    17.0 |............................1...|
# seed    18.0 |................................|
# seed    19.0 |.....1.........1................|
# seed    20.0 |................................|
# seed    21.0 |....................1...........| x 2
# seed    23.0 |........1.......................|
# seed    24.0 |........................11......|
# seed    25.0 |................................|
# seed    26.0 |................11..............|
# seed    27.0 |.....................1..........|
# seed    28.0 |.............................1..|
# seed    29.0 |................................|
# seed    30.0 |..1........1....................|
# seed    31.0 |................................|
# key      0.0 |...............................1|
# key      1.0 |................................|
# key      2.0 |............................1...|
# key      3.0 |................................| x 3
# key      6.0 |.......1........................|
# key      7.0 |................................|
#              +--------------------------------+
#              +---------------------------------------------------+
#              |012345678901234567890123456789012345678901234567890|
#              +---------------------------------------------------+
# Scale:       |.1234567890abcdefghijklmnopqrstuvwxyzãäåêëîïðñôõöûü|
#              |üÿABCDEFGHIJKLMNOPQRSTUVWXYZÂÃÄÅÊËÑÔÕÖÛÜÝø¤*©®¶&%@#|
#              +---------------------------------------------------+
#              |scaled p-value above confidence level (zero is ok) |
#              +--------------------------------+
#              |0         1         2         3 |
#              |01234567890123456789012345678901|
#              +--------------------------------+
# seed     0.0 |................................| x 32
# key      0.0 |................................| x 8
#              +--------------------------------+
# 0 of 1280 bits failed (0.00%) failed at 99.999943 confidence
#     g-test: 0.000000%
#     sum-error-square: 0.00131123
not ok 4 - Strict Avalanche Criteria - 8 bit/1 byte keys # sha1_32a

200000, are you sure ?

demerphq commented 6 years ago

On 17 January 2018 at 19:54, Leonid Yuriev notifications@github.com wrote:

ok 1 - Found Hash # sha1_32a ###################################################################

Testing sha1_32a - no seed_state

- SHA1, 32 bit seed, returning first 32 bits -

seedbits: 32 statebits: 32 hashbits: 32

################################################################### ok 2 - Verification code # sha1_32a

sha1_32a - Verification value 0xED4CCEA0 : Passed.

Avalanche Tests ### - seed-bits: 32 hash-bits: 32

Samples 200000, expected error 0.00128000, confidence level 99.99994267%

Testing 0-bit keys.......... ok. # worst-bit: 0.992% error-ratio: 1.075701e+00

ok 3 - Strict Avalanche Criteria - 0 bit/0 byte keys # sha1_32a

Testing 8-bit keys.......... not ok! # worst-bit: 1.033% error-ratio: 1.024398e+00

+---------------------------------------------------+

|012345678901234567890123456789012345678901234567890|

+---------------------------------------------------+

Scale: |.1234567890abcdefghijklmnopqrstuvwxyzãäåêëîïðñôõöûü|

|üÿABCDEFGHIJKLMNOPQRSTUVWXYZÂÃÄÅÊËÑÔÕÖÛÜÝø¤*©®¶&%@#|

+---------------------------------------------------+

|pct diff from 50%: abs((0.5-(changed/reps))2) 100|

+--------------------------------+

|0 1 2 3 |

|01234567890123456789012345678901|

+--------------------------------+

seed 0.0 |................................|

seed 1.0 |.......1........................|

seed 2.0 |................................|

seed 3.0 |.........................1......|

seed 4.0 |................................|

seed 5.0 |..................1.............|

seed 6.0 |......................1.........|

seed 7.0 |................................|

seed 8.0 |................1.........1.....|

seed 9.0 |..............................1.|

seed 10.0 |............................1...|

seed 11.0 |................................|

seed 12.0 |............1...................|

seed 13.0 |................................|

seed 14.0 |..........................1....1|

seed 15.0 |................................|

seed 16.0 |........................1.......|

seed 17.0 |............................1...|

seed 18.0 |................................|

seed 19.0 |.....1.........1................|

seed 20.0 |................................|

seed 21.0 |....................1...........| x 2

seed 23.0 |........1.......................|

seed 24.0 |........................11......|

seed 25.0 |................................|

seed 26.0 |................11..............|

seed 27.0 |.....................1..........|

seed 28.0 |.............................1..|

seed 29.0 |................................|

seed 30.0 |..1........1....................|

seed 31.0 |................................|

key 0.0 |...............................1|

key 1.0 |................................|

key 2.0 |............................1...|

key 3.0 |................................| x 3

key 6.0 |.......1........................|

key 7.0 |................................|

+--------------------------------+

+---------------------------------------------------+

|012345678901234567890123456789012345678901234567890|

+---------------------------------------------------+

Scale: |.1234567890abcdefghijklmnopqrstuvwxyzãäåêëîïðñôõöûü|

|üÿABCDEFGHIJKLMNOPQRSTUVWXYZÂÃÄÅÊËÑÔÕÖÛÜÝø¤*©®¶&%@#|

+---------------------------------------------------+

|scaled p-value above confidence level (zero is ok) |

+--------------------------------+

|0 1 2 3 |

|01234567890123456789012345678901|

+--------------------------------+

seed 0.0 |................................| x 32

key 0.0 |................................| x 8

+--------------------------------+

0 of 1280 bits failed (0.00%) failed at 99.999943 confidence

g-test: 0.000000%

sum-error-square: 0.00131123

not ok 4 - Strict Avalanche Criteria - 8 bit/1 byte keys # sha1_32a

200000, are you sure ?

The output says it ran 200k tests, so I assume it is correct.

But I think there might be a misunderstanding here again, I imagine you are looking at the sha1 results or the md5 results as a way to validate the tests, which would be a reasonable thing to do if it weren't for the fact that these aren't actually results from the actual sha1 or md5. If you look closely there is some kind of dodgy state initialization where the seed is mixed into the state via addition prior to round 0, which is not how either MD5 or SHA1 are supposed to be used.. FWIW This is something I inherited from an earlier version of smhasher. So when I look at those results I see something that fails in a way that does not surprise me.

I will crank up the number of tests, and we can see how it plays out.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq commented 6 years ago

On 17 January 2018 at 18:00, demerphq demerphq@gmail.com wrote:

On 16 January 2018 at 15:50, Leonid Yuriev notifications@github.com wrote:

const int reps = 32000000 / info->hashbits;

This bug is here, added by 4a88408#diff-118fcbaaba162ba17933c7893247df3aR561.

Let we have a super-wide hash that produces 2^20 bits, how many repeats we need to test it?

I think you missed part of the patch where I added highwayhash:

https://github.com/demerphq/smhasher/commit/079a2adba28ee532bc6b5f9a2e73d3631d9ba001#diff-86fda68adead2e3d54aa4fb1a3768df4

Note that this sets the minimum number of tests to 200000. You can also see this in the output:

Avalanche Tests ### - seed-bits: 256 hash-bits: 256

Samples 200000, expected error 0.00128000, confidence level 99.99994267%

So, it is wrong for wide hashes, and the reason for md5 test failure. In the other words: we need enough repetitions for avoid statistical skew.

No, I don't think this is correct because of the above.

The part of the test you are failing is the test to see if you are producing too many bits that are too far away from the expected.

The probability that after 200,000 coint flips you end up with a count of heads that is more than 2000 away from the expected 100,000 is extremely small.

I should have said "more 1000 away"

Just so we can be on the same page about the math here, I generated the following table:

H: 100000 g-test p-value: 1.000000000000000 binomial Pr(x >= H): 0.499107939163151 Expected From 69632 trials: 34753.884019808501762 H: 100100 g-test p-value: 0.654720819077016 binomial Pr(x >= H): 0.326553709800845 Expected From 69632 trials: 22738.587920852409297 H: 100200 g-test p-value: 0.371093210057942 binomial Pr(x >= H): 0.184949335233069 Expected From 69632 trials: 12878.392110949047492 H: 100300 g-test p-value: 0.179712168455979 binomial Pr(x >= H): 0.089494032877585 Expected From 69632 trials: 6231.648497331970248 H: 100400 g-test p-value: 0.073637885895580 binomial Pr(x >= H): 0.036639258528141 Expected From 69632 trials: 2551.264849831519314 H: 100500 g-test p-value: 0.025347013571872 binomial Pr(x >= H): 0.012600495386279 Expected From 69632 trials: 877.397694737390793 H: 100600 g-test p-value: 0.007290182594993 binomial Pr(x >= H): 0.003620802270602 Expected From 69632 trials: 252.123703706588998 H: 100700 g-test p-value: 0.001745042750164 binomial Pr(x >= H): 0.000865905796708 Expected From 69632 trials: 60.294752436399790 H: 100800 g-test p-value: 0.000346594054992 binomial Pr(x >= H): 0.000171821787692 Expected From 69632 trials: 11.964294720539757 H: 100900 g-test p-value: 0.000056987536550 binomial Pr(x >= H): 0.000028224433002 Expected From 69632 trials: 1.965323718819898 H: 101000 g-test p-value: 0.000007742866510 binomial Pr(x >= H): 0.000003831174698 Expected From 69632 trials: 0.266772356573586 H: 101100 g-test p-value: 0.000000868102792 binomial Pr(x >= H): 0.000000429124699 Expected From 69632 trials: 0.029880811069873

This shows the g-test p-value and the binomial cdf for getting each number of heads, and along with it, if we did 69632 trials, how many times we would expect to see a value this number or higher. (The number of trials corresponds to the number of cells from the SAC table for a 16 bit key, a 256 bit seed and a 256 bit hash.)

The worst bit in the 16 bit key failing avalanche table is 1.1%, which means it is something like 101100 (or 98900). From the table we can see that p-value is 0.000000868102792, meaning this result is extremely unlikely be random, and that we should see a value this extreme something like 1 in 50 times we run the test. Yet, when I look at the table I see a lot of '1's, which means that there are quite a few of these extreme values, disproportionally more than the table above suggests would be expected randomly. (If I understand things right we could validate this by checking if the p-values from the different cells in the SAC table are uniformly distributed, but right now smhasher does not perform this test.)

If you think my math is wrong on this I welcome enlightenment and explanation. :-)

cheers, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq commented 6 years ago

On 17 January 2018 at 22:11, demerphq demerphq@gmail.com wrote:

On 17 January 2018 at 18:00, demerphq demerphq@gmail.com wrote:

On 16 January 2018 at 15:50, Leonid Yuriev notifications@github.com wrote:

const int reps = 32000000 / info->hashbits;

This bug is here, added by 4a88408#diff-118fcbaaba162ba17933c7893247df3aR561.

Let we have a super-wide hash that produces 2^20 bits, how many repeats we need to test it?

I think you missed part of the patch where I added highwayhash:

https://github.com/demerphq/smhasher/commit/079a2adba28ee532bc6b5f9a2e73d3631d9ba001#diff-86fda68adead2e3d54aa4fb1a3768df4

Note that this sets the minimum number of tests to 200000. You can also see this in the output:

Avalanche Tests ### - seed-bits: 256 hash-bits: 256

Samples 200000, expected error 0.00128000, confidence level 99.99994267%

So, it is wrong for wide hashes, and the reason for md5 test failure. In the other words: we need enough repetitions for avoid statistical skew.

No, I don't think this is correct because of the above.

The part of the test you are failing is the test to see if you are producing too many bits that are too far away from the expected.

The probability that after 200,000 coint flips you end up with a count of heads that is more than 2000 away from the expected 100,000 is extremely small.

I should have said "more 1000 away"

Just so we can be on the same page about the math here, I generated the following table:

H: 100000 g-test p-value: 1.000000000000000 binomial Pr(x >= H):

That should be > H, not >= H sorry.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq commented 6 years ago

On 17 January 2018 at 22:51, demerphq demerphq@gmail.com wrote:

On 17 January 2018 at 22:11, demerphq demerphq@gmail.com wrote:

On 17 January 2018 at 18:00, demerphq demerphq@gmail.com wrote:

On 16 January 2018 at 15:50, Leonid Yuriev notifications@github.com wrote:

const int reps = 32000000 / info->hashbits;

This bug is here, added by 4a88408#diff-118fcbaaba162ba17933c7893247df3aR561.

Let we have a super-wide hash that produces 2^20 bits, how many repeats we need to test it?

I think you missed part of the patch where I added highwayhash:

https://github.com/demerphq/smhasher/commit/079a2adba28ee532bc6b5f9a2e73d3631d9ba001#diff-86fda68adead2e3d54aa4fb1a3768df4

Note that this sets the minimum number of tests to 200000. You can also see this in the output:

Avalanche Tests ### - seed-bits: 256 hash-bits: 256

Samples 200000, expected error 0.00128000, confidence level 99.99994267%

So, it is wrong for wide hashes, and the reason for md5 test failure. In the other words: we need enough repetitions for avoid statistical skew.

No, I don't think this is correct because of the above.

The part of the test you are failing is the test to see if you are producing too many bits that are too far away from the expected.

The probability that after 200,000 coint flips you end up with a count of heads that is more than 2000 away from the expected 100,000 is extremely small.

I should have said "more 1000 away"

Just so we can be on the same page about the math here, I generated the following table:

H: 100000 g-test p-value: 1.000000000000000 binomial Pr(x >= H):

That should be > H, not >= H sorry.

And having said all that, the 400k tests have now been running for long enough that I have some preliminary results, and indeed, as you expected the percentage dropped to about 0.6%, which mean the tests pass.

Once I have the KS stuff working I will let you know what I find out.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

erthink commented 6 years ago

Seem the math explanation is correct, but implementation IMHO needs clarification or/and verification (IMHO).

funny-falcon commented 6 years ago

I don't understand something:

Avalanche for each result bit is independent of other result bit. And so there are should be same number of repetitions regardless of result width. And I believe approximate for expected error gives very wrong result with low number of repetitions. All approximations are for large number of repetitions.