erthink / t1ha

One of the fastest hash functions
https://www.ptsecurity.com

Comparison with Google's HighwayHash #5

Closed erthink closed 6 years ago

erthink commented 7 years ago

Comparison with https://github.com/google/highwayhash

@jan-wassenberg, this may be interesting for you.

Bulat-Ziganshin commented 7 years ago

The HWH authors believe that their hash is unlike the others ))) Just read their paper, especially the Analysis part.

erthink commented 7 years ago

@Bulat-Ziganshin, I'll answer in Russian. With my English-language dyslexia, that is easier for me.

I have finished reading their paper. I still need to think a few things through, but let's put it this way: the "Security Analysis" section does not look flawless.

Nevertheless, the work has been done and a point of view has been laid out with some arguments. So there is something to build on.

jan-wassenberg commented 7 years ago

@Bulat-Ziganshin, we've updated the paper (v3) to include your suggestion of using median, and added RDRAND measurements for comparison. Does that address your concerns?

@leo-yuriev - looks very positive :), good mixing from multiplication and very fast even if SIMD is not available. Personally I am interested in increasing security because hashes are already quite fast, and it's probably more expensive than the CPU time saved if they encounter a problem (bias, collisions etc). Would be interesting to see an attempt to show that differential attacks (e.g. http://crypto.stackexchange.com/questions/6408/from-hash-to-cryptographic-hash) are not possible.

Bulat-Ziganshin commented 7 years ago

It's an empty shell, not an analysis. Plug in any other hash, even yours, and the results will be exactly the same. And the fact that they took the weakest of the SMHasher tests and decided that running it 900 million times on Google's clusters would turn it into a super-test says a lot about their mathematical level. Sorry, but these are just two ignoramuses hiding their ignorance behind Google's name. That said, I have no complaints about the programming side: the construction is interesting and the implementation is excellent. In other words, it is a worthy competitor to the other non-secure hashes, but not a penny more.

erthink commented 7 years ago

@Bulat-Ziganshin, well, you certainly don't mince words :) But that is partly why I want to see everything in a single SMHasher, and then "poke it with a stick".

Bulat-Ziganshin commented 7 years ago

@jan-wassenberg,

  1. I haven't seen v3 of your paper yet, but I'm sure that, whatever the changes, you can't find anything new using this test: a lot of hashes, from SHA to Spooky, have ideal properties in ALL SMHasher tests, and this can't change by using 900M iterations instead of 300K. The only really new thing you can do is to make a real mathematical proof of that fact (I reached this conclusion only informally). So, if you are interested in doing some mathematical analysis of the SMHasher stats of SHA/HWH/Spooky, I can describe the overall idea. Using mathematical statistics (not a simple average!) over multiple runs is indeed the first step in this plan.

  2. The rest of your "security analysis" is, again, applicable to any wide hash, Spooky for example. This is a weak part of the SipHash paper, so I propose inverting your original conclusion: instead of claiming that HWH is a secure hash, check several hashes and conclude that SipHash doesn't have an edge here, and that Bernstein just cheated, using the results of a few really weak hashes as evidence that all fast hashes are weak.
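
To illustrate point 1, here is a minimal Python sketch of summarizing several runs of one test with robust statistics instead of a plain average. The bias values are made up for illustration, and `summarize_runs` is a hypothetical helper, not part of SMHasher or of the paper under discussion.

```python
import statistics

def summarize_runs(values):
    """Summarize repeated measurements of one statistic (e.g. worst-case bias)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles; needs Python 3.8+
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,  # interquartile range, a robust measure of spread
    }

# Hypothetical worst-case bias (in percent) from five runs; the last run is an outlier.
runs = [0.50, 0.51, 0.49, 0.50, 0.90]
summary = summarize_runs(runs)
print(summary)
```

A single outlier run pulls the mean well above the typical value, while the median stays at the typical value, which is why a per-hash comparison based on averages alone can be misleading.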

Bulat-Ziganshin commented 7 years ago

But that is partly why I want to see everything in a single SMHasher, and then "poke it with a stick".

I have test results for spooky32, my own zzh32, and so on. Poke at them, and if you see any difference at all from sha1-32, let me know. I am 100% sure that HWH will achieve the same results. And if you don't see what there is to study, then again, why do you need HWH? Still, it's not good that they haven't published an SMHasher integration. On the other hand, that is about an hour of work.

Bulat-Ziganshin commented 7 years ago

Personally I am interested in increasing security

Can you please show the security problems of Spooky hash? If you don't know ANY, then what exactly are you planning to increase? :D

Bulat-Ziganshin commented 7 years ago

Would be interesting to see an attempt to show that differential attacks (e.g. http://crypto.stackexchange.com/questions/6408/from-hash-to-cryptographic-hash) are not possible.

And yeah, the third paper improvement I can propose is to describe this attack and analyze the hash properties that can make it possible, i.e. reversible computations.

jan-wassenberg commented 7 years ago

@Bulat-Ziganshin: https://arxiv.org/pdf/1612.06257v3.pdf We agree this revised test cannot detect differences between SipHash and HighwayHash nor even RDRAND. I'm very interested in creating new tests. Unfortunately you're right that my math/crypto background is limited and the paper actually mentions this twice. However, cryptographers do not seem very interested in 'reasonably secure' > 2byte/cycle hashes, so we are attempting to fill this gap.

A comment by the author of JH (SHA3 finalist): "How to design a hash function that is extremely efficient in software, and easy to analyze - none of the 64 submissions solves this problem".

check several hashes and conclude that SipHash doesn't have an edge here

We haven't gotten to this because it's not terribly meaningful - hashes that add/mul by constant (reversible) are definitely vulnerable no matter what test they pass. Still, this would be interesting for showing that a test is useful and it's on my todo list (which only grows longer).

Can you please show the security problems of Spooky hash?

We are limiting our efforts to analyzing HighwayHash rather than finding flaws in other hashes.

the third paper improvement I can propose is to describe this attack and analyze the hash properties that can make it possible, i.e. reversible computations

Thank you for this suggestion! I'd like to add this to the readme, similar to the discussion on hash flooding. It seems there aren't many accessible/self-contained discussions of this topic, either (if there are, I'd very much appreciate a pointer).

Bulat-Ziganshin commented 7 years ago

Personally I am interested in increasing security

We are limiting our efforts to analyzing HighwayHash rather than finding flaws in other hashes.

Sorry, I can't agree with your attitude; in fact, you readily mention other hashes in the paper, as long as they are slower or less secure than your own hash. I believe that HWH is no more secure than Spooky and a lot of other hashes that are faster than HWH, so I can't help you write a paper that tries to prove the opposite by simply ignoring the existence of these hashes. Indeed, in your world, where only broken hashes, SipHash and HWH exist, HWH is the best choice :)

We haven't gotten to this because it's not terribly meaningful - hashes that add/mul by constant (reversible) are definitely vulnerable no matter what test they pass. Still, this would be interesting for showing that a test is useful and it's on my todo list (which only grows longer).

Note that HWH/Blake2 also perform only reversible operations (and probably SHA*/SipHash too). XXH/MurMur3A are broken not because they use such operations, but because they mix all data into a single 32-bit word, so you can generate a pair of successive words that cancel out each other's effect. This is impossible for hashes that mix data across multiple words, which is what all the other hashes do.
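
The "pair of successive words" idea can be sketched with a toy hash. The Python code below is an illustration under loud assumptions: `toy_hash` is a made-up single-word construction, not XXH32 or MurmurHash3, and its state update is deliberately XOR-linear so that the cancellation works for every seed. The published attacks on the real functions are more involved.

```python
MASK = 0xFFFFFFFF
C1, C2 = 0xCC9E2D51, 0x1B873593  # odd constants, so multiplication mod 2**32 is invertible

def rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK

def rotr(x, r):
    return ((x >> r) | (x << (32 - r))) & MASK

def scramble(w):
    """Reversible per-word mixing: multiply, rotate, multiply."""
    return (rotl((w * C1) & MASK, 15) * C2) & MASK

def unscramble(k):
    """Exact inverse of scramble(), using modular inverses (Python 3.8+)."""
    k = (k * pow(C2, -1, 1 << 32)) & MASK
    k = rotr(k, 15)
    return (k * pow(C1, -1, 1 << 32)) & MASK

def toy_hash(words, seed):
    """Toy hash: every input word is folded into ONE 32-bit state word."""
    h = seed & MASK
    for w in words:
        h = rotl(h ^ scramble(w), 13)
    return h

def colliding_pair(w1, w2, w1_alt):
    """Pick w2_alt so that the difference introduced by w1_alt is cancelled."""
    delta = rotl(scramble(w1) ^ scramble(w1_alt), 13)  # state difference after word 1
    w2_alt = unscramble(scramble(w2) ^ delta)          # second word that removes it
    return (w1, w2), (w1_alt, w2_alt)

a, b = colliding_pair(1, 2, 3)
for seed in (0, 1, 0xDEADBEEF):
    print(hex(toy_hash(a, seed)), hex(toy_hash(b, seed)))
```

Each printed pair of values is equal, because the second word of `b` is chosen to undo exactly the state difference that its first word introduced. With several independent state words, a single input word can no longer address the whole state this way, which is the distinction drawn above.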

We agree this revised test cannot detect differences between SipHash and HighwayHash nor even RDRAND.

The only thing that really changed is your own analysis method :) You can take the data from the v2 paper, plot the same graph and see the same results: for different input sizes, different hashes are better. Moreover, you can use data produced with the original SMHasher settings (300K iterations and so on) and see the same graph again. As I already said, avalanche is a pretty weak test: it can't even distinguish XXH32/MurMur3A from good hashes, although lots of other SMHasher tests scream about their weaknesses :)
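
For readers unfamiliar with the test, here is a minimal Python sketch of what an avalanche-style measurement does: flip each input bit and check how often each output bit changes, ideally 50% of the time. It is not SMHasher's code; `worst_avalanche_bias` and `byte_sum` are hypothetical names, and the key size and trial count are arbitrary small values chosen so that it runs quickly.

```python
import hashlib
import random

def sha1_32(data: bytes) -> int:
    """SHA-1 truncated to 32 bits, used here as a reference 'good' hash."""
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "little")

def byte_sum(data: bytes) -> int:
    """Deliberately weak 32-bit hash: the sum of the input bytes."""
    return sum(data) & 0xFFFFFFFF

def worst_avalanche_bias(hash32, key_bytes=4, trials=500, seed=1):
    """Largest deviation from a 50% flip rate over all (input bit, output bit) pairs."""
    rng = random.Random(seed)
    in_bits = key_bytes * 8
    flips = [[0] * 32 for _ in range(in_bits)]
    for _ in range(trials):
        key = bytearray(rng.getrandbits(8) for _ in range(key_bytes))
        base = hash32(bytes(key))
        for i in range(in_bits):
            key[i // 8] ^= 1 << (i % 8)      # flip input bit i
            diff = base ^ hash32(bytes(key))
            key[i // 8] ^= 1 << (i % 8)      # restore the key
            for o in range(32):
                flips[i][o] += (diff >> o) & 1
    return max(abs(c / trials - 0.5) for row in flips for c in row)

print("sha1-32 :", worst_avalanche_bias(sha1_32))
print("byte sum:", worst_avalanche_bias(byte_sum))
```

The byte-sum hash fails outright because its high output bits never flip. The sketch only shows what the test measures; it says nothing about hashes that pass avalanche yet fail other SMHasher tests, which is the point being made above.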

I'm very interested in creating new tests. Unfortunately you're right that my math/crypto background is limited and the paper actually mentions this twice. However, cryptographers do not seem very interested in 'reasonably secure' > 2byte/cycle hashes, so we are attempting to fill this gap.

All the tests I know of show that there are no differences between SipHash and Spooky/MurMur3F/HWH. This includes the full SMHasher suite with my own analysis method, as well as the cryptographic attacks described in your paper. If you want to prove that HWH is a "secure" hash in this way, good luck! :)

Overall, you did good work designing one more insecure hash or, more precisely, PRF. It has an original construction and is pretty fast. Now you are doing absolutely meaningless work, trying to prove that it has better statistical/security characteristics than other (sometimes faster) hashes. It's especially strange considering that you don't have the appropriate math/crypto knowledge, so your deductions sometimes become laughable. So, as long as you don't have any evidence that HWH is better than, for example, Spooky, I don't want to spend my (and your!) precious time on this effort (i.e. a paper "proving" that HWH is better than other fast hashes by mentioning only hashes that turned out to be worse than HWH). I just don't see any "Research" in such an approach.

jan-wassenberg commented 7 years ago

I believe that HWH is no more secure than Spooky and a lot of other hashes that are faster than HWH

Spooky is a "noncryptographic hash" (http://burtleburtle.net/bob/hash/spooky.html). It allows the attacker to modify the entire state; see http://ehash.iaik.tugraz.at/uploads/5/52/Maraca.pdf for an attack using this. By contrast, HH has a much lower rate of injecting data into the state (1:4 vs. 1:1).

Indeed, in your world, where only broken hashes, SipHash and HWH exist, HWH is the best choice :)

We focus on hashes with security claims and <= 0.5 cpb cost.

I appreciate your opinion here, which tells us that more tests and analysis are required; we will work on this in the next few months.

Bulat-Ziganshin commented 7 years ago

HWH is also a non-crypto hash; it's just a PRF. Spooky has nothing in common with Maraca. Spooky, like other good fast hashes, injects input data into the state only once, and then mixes the updated state word into the other state words. Have you ever looked at the code of murmur3f and spooky?

We focus on hashes with security claims and <= 0.5 cpb cost.

Wrong. You mention several hashes without any security claims, but only broken ones. You are just ignoring the existence of hashes that are faster than HWH, have perfect SMHasher stats and can't be broken by the methods you applied to HWH.

jan-wassenberg commented 7 years ago

Have you ever looked at the code of murmur3f and spooky?

Yes, assuming Murmur3F = MurmurHash3_x64_128.

Spooky has nothing in common with Maraca.

They have the same issue: injecting n input bits into their n-bit state between compressing/mixing. This is provably insecure in the ideal-permutation model (http://web.cecs.pdx.edu/~teshrim/ohash.pdf). HighwayHash avoids this problem because it is sponge-like with large state.

HWH is also non-crypto hash, it's just a PRF.

PRF is already a very useful property which many hashes do not satisfy. But you're right, we will need to explicitly list the properties.

Bulat-Ziganshin commented 7 years ago

They have the same issue: injecting n input bits into their n-bit state between compressing/mixing.

That is the weakness of xxhash and mur3a. spooky/mur3c/mur3f don't have this problem. I suggest you really look into their code and use the same analysis methods you tried with HWH. You will see that there is no difference.

Spooky is a "noncryptographic hash" (http://burtleburtle.net/bob/hash/spooky.html).

HWH is also non-crypto hash, it's just a PRF.

PRF is already a very useful property which many hashes do not satisfy.

There is no difference between spooky/mur3f and HWH. They are all just PRFs.

jan-wassenberg commented 7 years ago

I agree that the current statistical analysis is weak. We used the term PRF as defined in section 3.4 of http://cseweb.ucsd.edu/~mihir/cse207/w-prf.pdf Under this definition, Murmur2/3 are not PRF (due to differential attack: https://131002.net/siphash/siphashdos_appsec12_slides.pdf). PRF is actually a strong property and difficult to prove.

Bulat-Ziganshin commented 7 years ago

murmur3 is a family of 3 hashes; the attack works only against the first one.

When you say that HWH is a PRF, you mean that there is no proof of the opposite. Can you apply the same logic to spooky/mur3c/mur3f too?

erthink commented 6 years ago

gcc version 5.5.0 20171010 (Ubuntu 5.5.0-12ubuntu1~16.04) Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

Preparing to benchmarking...
 - running on CPU#1
 - use RDPMC_40000001 as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 53 cycle, 0.0188679 iteration/cycle

Bench for tiny keys (7 bytes):
t1ha2_atonce            :     17.301 cycle/hash,  2.472 cycle/byte,  0.405 byte/cycle,  1.214 Gb/s @3GHz 
t1ha2_atonce128*        :     30.641 cycle/hash,  4.377 cycle/byte,  0.228 byte/cycle,  0.685 Gb/s @3GHz 
t1ha2_stream*           :     83.775 cycle/hash, 11.968 cycle/byte,  0.084 byte/cycle,  0.251 Gb/s @3GHz 
t1ha2_stream128*        :    101.750 cycle/hash, 14.536 cycle/byte,  0.069 byte/cycle,  0.206 Gb/s @3GHz 
t1ha1_64le              :     16.281 cycle/hash,  2.326 cycle/byte,  0.430 byte/cycle,  1.290 Gb/s @3GHz 
t1ha0                   :     15.109 cycle/hash,  2.158 cycle/byte,  0.463 byte/cycle,  1.390 Gb/s @3GHz 
xxhash32                :     17.109 cycle/hash,  2.444 cycle/byte,  0.409 byte/cycle,  1.227 Gb/s @3GHz 
xxhash64                :     22.172 cycle/hash,  3.167 cycle/byte,  0.316 byte/cycle,  0.947 Gb/s @3GHz 
HighwayHash64_pure_c    :    560.000 cycle/hash, 80.000 cycle/byte,  0.013 byte/cycle,  0.037 Gb/s @3GHz 
HighwayHash64_portable  :    535.500 cycle/hash, 76.500 cycle/byte,  0.013 byte/cycle,  0.039 Gb/s @3GHz 
HighwayHash64_sse41     :    199.625 cycle/hash, 28.518 cycle/byte,  0.035 byte/cycle,  0.105 Gb/s @3GHz 
HighwayHash64_avx2      :     58.188 cycle/hash,  8.312 cycle/byte,  0.120 byte/cycle,  0.361 Gb/s @3GHz 

Bench for large keys (16384 bytes):
t1ha2_atonce            :   3387.000 cycle/hash,  0.207 cycle/byte,  4.837 byte/cycle, 14.512 Gb/s @3GHz 
t1ha2_atonce128*        :   3390.000 cycle/hash,  0.207 cycle/byte,  4.833 byte/cycle, 14.499 Gb/s @3GHz 
t1ha2_stream*           :   3524.000 cycle/hash,  0.215 cycle/byte,  4.649 byte/cycle, 13.948 Gb/s @3GHz 
t1ha2_stream128*        :   3646.000 cycle/hash,  0.223 cycle/byte,  4.494 byte/cycle, 13.481 Gb/s @3GHz 
t1ha1_64le              :   3566.000 cycle/hash,  0.218 cycle/byte,  4.595 byte/cycle, 13.784 Gb/s @3GHz 
t1ha0                   :   1191.000 cycle/hash,  0.073 cycle/byte, 13.757 byte/cycle, 41.270 Gb/s @3GHz 
xxhash32                :   8198.000 cycle/hash,  0.500 cycle/byte,  1.999 byte/cycle,  5.996 Gb/s @3GHz 
xxhash64                :   4122.000 cycle/hash,  0.252 cycle/byte,  3.975 byte/cycle, 11.924 Gb/s @3GHz 
HighwayHash64_pure_c    :  44810.678 cycle/hash,  2.735 cycle/byte,  0.366 byte/cycle,  1.097 Gb/s @3GHz 
HighwayHash64_portable  :  41622.000 cycle/hash,  2.540 cycle/byte,  0.394 byte/cycle,  1.181 Gb/s @3GHz 
HighwayHash64_sse41     :   6030.000 cycle/hash,  0.368 cycle/byte,  2.717 byte/cycle,  8.151 Gb/s @3GHz 
HighwayHash64_avx2      :   4185.000 cycle/hash,  0.255 cycle/byte,  3.915 byte/cycle, 11.745 Gb/s @3GHz 
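
As a reading aid for the columns in these tables, the short Python sketch below recomputes the derived values from the measured cycle count and the key size. `derive_columns` is a hypothetical helper, not part of the t1ha benchmark. The numbers are consistent with the last column being bytes per cycle multiplied by an assumed 3 GHz clock, i.e. gigabytes (10^9 bytes) per second, even though it is labelled "Gb/s".

```python
def derive_columns(cycles_per_hash, key_bytes, clock_ghz=3.0):
    """Recompute the derived benchmark columns from cycles/hash and key size."""
    cycles_per_byte = cycles_per_hash / key_bytes
    bytes_per_cycle = key_bytes / cycles_per_hash
    gbytes_per_sec = bytes_per_cycle * clock_ghz  # 1e9 bytes/s at the assumed clock
    return cycles_per_byte, bytes_per_cycle, gbytes_per_sec

# t1ha2_atonce on 16384-byte keys, from the gcc 5.5.0 run above: 3387 cycles/hash.
cpb, bpc, gbs = derive_columns(3387.0, 16384)
print(f"{cpb:.3f} cycle/byte, {bpc:.3f} byte/cycle, {gbs:.3f} Gb/s @3GHz")
# prints: 0.207 cycle/byte, 4.837 byte/cycle, 14.512 Gb/s @3GHz
```

Because the last column is normalized to 3 GHz rather than to each CPU's real clock, it compares cycle counts across machines and should not be read as measured wall-clock throughput.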

clang version 5.0.0-3~16.04.1 (tags/RELEASE_500/final) Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

Preparing to benchmarking...
 - running on CPU#0
 - use RDPMC_40000001 as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 53 cycle, 0.0188679 iteration/cycle

Bench for tiny keys (7 bytes):
t1ha2_atonce            :     12.156 cycle/hash,  1.737 cycle/byte,  0.576 byte/cycle,  1.727 Gb/s @3GHz 
t1ha2_atonce128*        :     29.984 cycle/hash,  4.283 cycle/byte,  0.233 byte/cycle,  0.700 Gb/s @3GHz 
t1ha2_stream*           :     79.125 cycle/hash, 11.304 cycle/byte,  0.088 byte/cycle,  0.265 Gb/s @3GHz 
t1ha2_stream128*        :     99.438 cycle/hash, 14.205 cycle/byte,  0.070 byte/cycle,  0.211 Gb/s @3GHz 
t1ha1_64le              :     12.195 cycle/hash,  1.742 cycle/byte,  0.574 byte/cycle,  1.722 Gb/s @3GHz 
t1ha0                   :     13.125 cycle/hash,  1.875 cycle/byte,  0.533 byte/cycle,  1.600 Gb/s @3GHz 
xxhash32                :     16.297 cycle/hash,  2.328 cycle/byte,  0.430 byte/cycle,  1.289 Gb/s @3GHz 
xxhash64                :     21.219 cycle/hash,  3.031 cycle/byte,  0.330 byte/cycle,  0.990 Gb/s @3GHz 
HighwayHash64_pure_c    :    623.000 cycle/hash, 89.000 cycle/byte,  0.011 byte/cycle,  0.034 Gb/s @3GHz 
HighwayHash64_portable  :    563.000 cycle/hash, 80.429 cycle/byte,  0.012 byte/cycle,  0.037 Gb/s @3GHz 
HighwayHash64_sse41     :    104.438 cycle/hash, 14.920 cycle/byte,  0.067 byte/cycle,  0.201 Gb/s @3GHz 
HighwayHash64_avx2      :     68.312 cycle/hash,  9.759 cycle/byte,  0.102 byte/cycle,  0.307 Gb/s @3GHz 

Bench for large keys (16384 bytes):
t1ha2_atonce            :   3746.000 cycle/hash,  0.229 cycle/byte,  4.374 byte/cycle, 13.121 Gb/s @3GHz 
t1ha2_atonce128*        :   3755.000 cycle/hash,  0.229 cycle/byte,  4.363 byte/cycle, 13.090 Gb/s @3GHz 
t1ha2_stream*           :   3682.000 cycle/hash,  0.225 cycle/byte,  4.450 byte/cycle, 13.349 Gb/s @3GHz 
t1ha2_stream128*        :   3703.000 cycle/hash,  0.226 cycle/byte,  4.425 byte/cycle, 13.274 Gb/s @3GHz 
t1ha1_64le              :   3625.000 cycle/hash,  0.221 cycle/byte,  4.520 byte/cycle, 13.559 Gb/s @3GHz 
t1ha0                   :   1187.000 cycle/hash,  0.072 cycle/byte, 13.803 byte/cycle, 41.409 Gb/s @3GHz 
xxhash32                :   8202.000 cycle/hash,  0.501 cycle/byte,  1.998 byte/cycle,  5.993 Gb/s @3GHz 
xxhash64                :   4126.000 cycle/hash,  0.252 cycle/byte,  3.971 byte/cycle, 11.913 Gb/s @3GHz 
HighwayHash64_pure_c    :  72100.330 cycle/hash,  4.401 cycle/byte,  0.227 byte/cycle,  0.682 Gb/s @3GHz 
HighwayHash64_portable  :  46112.000 cycle/hash,  2.814 cycle/byte,  0.355 byte/cycle,  1.066 Gb/s @3GHz 
HighwayHash64_sse41     :   6244.000 cycle/hash,  0.381 cycle/byte,  2.624 byte/cycle,  7.872 Gb/s @3GHz 
HighwayHash64_avx2      :   4459.000 cycle/hash,  0.272 cycle/byte,  3.674 byte/cycle, 11.023 Gb/s @3GHz 

lcc:1.21.24:Dec--7-2017:e2k-v3-linux (it seems the scheduler bug is still not fixed) Elbrus EL2S4

Preparing to benchmarking...
 - running on CPU#13
 - use Elbrus_TSCP as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 39 cycle, 0.025641 iteration/cycle

Bench for tiny keys (7 bytes):
t1ha2_atonce            :     62.031 cycle/hash,  8.862 cycle/byte,  0.113 byte/cycle,  0.339 Gb/s @3GHz 
t1ha2_atonce128*        :    106.062 cycle/hash, 15.152 cycle/byte,  0.066 byte/cycle,  0.198 Gb/s @3GHz 
t1ha2_stream*           :    207.125 cycle/hash, 29.589 cycle/byte,  0.034 byte/cycle,  0.101 Gb/s @3GHz 
t1ha2_stream128*        :    226.125 cycle/hash, 32.304 cycle/byte,  0.031 byte/cycle,  0.093 Gb/s @3GHz 
t1ha1_64le              :     62.031 cycle/hash,  8.862 cycle/byte,  0.113 byte/cycle,  0.339 Gb/s @3GHz 
t1ha0                   :     77.062 cycle/hash, 11.009 cycle/byte,  0.091 byte/cycle,  0.273 Gb/s @3GHz 
xxhash32                :    142.125 cycle/hash, 20.304 cycle/byte,  0.049 byte/cycle,  0.148 Gb/s @3GHz 
xxhash64                :    142.125 cycle/hash, 20.304 cycle/byte,  0.049 byte/cycle,  0.148 Gb/s @3GHz 
HighwayHash64_pure_c    :    498.250 cycle/hash, 71.179 cycle/byte,  0.014 byte/cycle,  0.042 Gb/s @3GHz 
HighwayHash64_portable  :    598.500 cycle/hash, 85.500 cycle/byte,  0.012 byte/cycle,  0.035 Gb/s @3GHz 

Bench for large keys (16384 bytes):
t1ha2_atonce            :   4685.000 cycle/hash,  0.286 cycle/byte,  3.497 byte/cycle, 10.491 Gb/s @3GHz 
t1ha2_atonce128*        :   4716.000 cycle/hash,  0.288 cycle/byte,  3.474 byte/cycle, 10.422 Gb/s @3GHz 
t1ha2_stream*           :   6896.000 cycle/hash,  0.421 cycle/byte,  2.376 byte/cycle,  7.128 Gb/s @3GHz 
t1ha2_stream128*        :   6923.000 cycle/hash,  0.423 cycle/byte,  2.367 byte/cycle,  7.100 Gb/s @3GHz 
t1ha1_64le              :   4697.000 cycle/hash,  0.287 cycle/byte,  3.488 byte/cycle, 10.465 Gb/s @3GHz 
t1ha0                   :   4712.000 cycle/hash,  0.288 cycle/byte,  3.477 byte/cycle, 10.431 Gb/s @3GHz 
xxhash32                :   8274.000 cycle/hash,  0.505 cycle/byte,  1.980 byte/cycle,  5.941 Gb/s @3GHz 
xxhash64                :   4220.000 cycle/hash,  0.258 cycle/byte,  3.882 byte/cycle, 11.647 Gb/s @3GHz 
HighwayHash64_pure_c    :  52047.000 cycle/hash,  3.177 cycle/byte,  0.315 byte/cycle,  0.944 Gb/s @3GHz 
HighwayHash64_portable  :  54241.000 cycle/hash,  3.311 cycle/byte,  0.302 byte/cycle,  0.906 Gb/s @3GHz 

PHDays-2018

data-man commented 6 years ago

Why such old compilers?

erthink commented 6 years ago

@data-man, because the GCC 5.5 and Clang 5.0 results are "the same" as those from the newest compiler versions.

Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz gcc version 8.1.1 20180502 (Red Hat 8.1.1-1) (GCC)

Preparing to benchmarking...
 - running on CPU#5
 - use RDPMC_40000001 as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 53 cycle, 0.0188679 iteration/cycle

Bench for tiny keys (7 bytes):
t1ha2_atonce            :     15.156 cycle/hash,  2.165 cycle/byte,  0.462 byte/cycle,  1.386 Gb/s @3GHz 
t1ha2_atonce128*        :     29.737 cycle/hash,  4.248 cycle/byte,  0.235 byte/cycle,  0.706 Gb/s @3GHz 
t1ha2_stream*           :     80.625 cycle/hash, 11.518 cycle/byte,  0.087 byte/cycle,  0.260 Gb/s @3GHz 
t1ha2_stream128*        :     98.438 cycle/hash, 14.063 cycle/byte,  0.071 byte/cycle,  0.213 Gb/s @3GHz 
t1ha1_64le              :     15.125 cycle/hash,  2.161 cycle/byte,  0.463 byte/cycle,  1.388 Gb/s @3GHz 
t1ha0                   :     14.062 cycle/hash,  2.009 cycle/byte,  0.498 byte/cycle,  1.493 Gb/s @3GHz 
xxhash32                :     16.922 cycle/hash,  2.417 cycle/byte,  0.414 byte/cycle,  1.241 Gb/s @3GHz 
xxhash64                :     19.359 cycle/hash,  2.766 cycle/byte,  0.362 byte/cycle,  1.085 Gb/s @3GHz 
HighwayHash64_pure_c    :    641.000 cycle/hash, 91.571 cycle/byte,  0.011 byte/cycle,  0.033 Gb/s @3GHz 
HighwayHash64_portable  :    481.750 cycle/hash, 68.821 cycle/byte,  0.015 byte/cycle,  0.044 Gb/s @3GHz 
HighwayHash64_sse41     :     85.500 cycle/hash, 12.214 cycle/byte,  0.082 byte/cycle,  0.246 Gb/s @3GHz 
HighwayHash64_avx2      :     82.938 cycle/hash, 11.848 cycle/byte,  0.084 byte/cycle,  0.253 Gb/s @3GHz 

Bench for large keys (16384 bytes):
t1ha2_atonce            :   3433.000 cycle/hash,  0.210 cycle/byte,  4.773 byte/cycle, 14.318 Gb/s @3GHz 
t1ha2_atonce128*        :   3435.000 cycle/hash,  0.210 cycle/byte,  4.770 byte/cycle, 14.309 Gb/s @3GHz 
t1ha2_stream*           :   3462.000 cycle/hash,  0.211 cycle/byte,  4.733 byte/cycle, 14.198 Gb/s @3GHz 
t1ha2_stream128*        :   3475.000 cycle/hash,  0.212 cycle/byte,  4.715 byte/cycle, 14.144 Gb/s @3GHz 
t1ha1_64le              :   3403.000 cycle/hash,  0.208 cycle/byte,  4.815 byte/cycle, 14.444 Gb/s @3GHz 
t1ha0                   :   1184.000 cycle/hash,  0.072 cycle/byte, 13.838 byte/cycle, 41.514 Gb/s @3GHz 
xxhash32                :   8198.000 cycle/hash,  0.500 cycle/byte,  1.999 byte/cycle,  5.996 Gb/s @3GHz 
xxhash64                :   4121.000 cycle/hash,  0.252 cycle/byte,  3.976 byte/cycle, 11.927 Gb/s @3GHz 
HighwayHash64_pure_c    :  52994.000 cycle/hash,  3.234 cycle/byte,  0.309 byte/cycle,  0.928 Gb/s @3GHz 
HighwayHash64_portable  :  42019.064 cycle/hash,  2.565 cycle/byte,  0.390 byte/cycle,  1.170 Gb/s @3GHz 
HighwayHash64_sse41     :   6225.000 cycle/hash,  0.380 cycle/byte,  2.632 byte/cycle,  7.896 Gb/s @3GHz 
HighwayHash64_avx2      :   4609.000 cycle/hash,  0.281 cycle/byte,  3.555 byte/cycle, 10.664 Gb/s @3GHz 

Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz clang version 6.0.0 (tags/RELEASE_600/final)

Preparing to benchmarking...
 - running on CPU#2
 - use RDPMC_40000001 as clock source for benchmarking
 - assume it cheap and stable
 - measure granularity and overhead: 53 cycle, 0.0188679 iteration/cycle

Bench for tiny keys (7 bytes):
t1ha2_atonce            :     12.156 cycle/hash,  1.737 cycle/byte,  0.576 byte/cycle,  1.728 Gb/s @3GHz 
t1ha2_atonce128*        :     29.703 cycle/hash,  4.243 cycle/byte,  0.236 byte/cycle,  0.707 Gb/s @3GHz 
t1ha2_stream*           :     77.625 cycle/hash, 11.089 cycle/byte,  0.090 byte/cycle,  0.271 Gb/s @3GHz 
t1ha2_stream128*        :     97.438 cycle/hash, 13.920 cycle/byte,  0.072 byte/cycle,  0.216 Gb/s @3GHz 
t1ha1_64le              :     12.172 cycle/hash,  1.739 cycle/byte,  0.575 byte/cycle,  1.725 Gb/s @3GHz 
t1ha0                   :     14.070 cycle/hash,  2.010 cycle/byte,  0.498 byte/cycle,  1.493 Gb/s @3GHz 
xxhash32                :     16.438 cycle/hash,  2.348 cycle/byte,  0.426 byte/cycle,  1.278 Gb/s @3GHz 
xxhash64                :     22.188 cycle/hash,  3.170 cycle/byte,  0.315 byte/cycle,  0.946 Gb/s @3GHz 
HighwayHash64_pure_c    :    542.000 cycle/hash, 77.429 cycle/byte,  0.013 byte/cycle,  0.039 Gb/s @3GHz 
HighwayHash64_portable  :    508.250 cycle/hash, 72.607 cycle/byte,  0.014 byte/cycle,  0.041 Gb/s @3GHz 
HighwayHash64_sse41     :     81.750 cycle/hash, 11.679 cycle/byte,  0.086 byte/cycle,  0.257 Gb/s @3GHz 
HighwayHash64_avx2      :     51.938 cycle/hash,  7.420 cycle/byte,  0.135 byte/cycle,  0.404 Gb/s @3GHz 

Bench for large keys (16384 bytes):
t1ha2_atonce            :   3752.000 cycle/hash,  0.229 cycle/byte,  4.367 byte/cycle, 13.100 Gb/s @3GHz 
t1ha2_atonce128*        :   3757.000 cycle/hash,  0.229 cycle/byte,  4.361 byte/cycle, 13.083 Gb/s @3GHz 
t1ha2_stream*           :   3740.000 cycle/hash,  0.228 cycle/byte,  4.381 byte/cycle, 13.142 Gb/s @3GHz 
t1ha2_stream128*        :   3755.000 cycle/hash,  0.229 cycle/byte,  4.363 byte/cycle, 13.090 Gb/s @3GHz 
t1ha1_64le              :   3626.000 cycle/hash,  0.221 cycle/byte,  4.518 byte/cycle, 13.555 Gb/s @3GHz 
t1ha0                   :   1187.000 cycle/hash,  0.072 cycle/byte, 13.803 byte/cycle, 41.409 Gb/s @3GHz 
xxhash32                :   8207.000 cycle/hash,  0.501 cycle/byte,  1.996 byte/cycle,  5.989 Gb/s @3GHz 
xxhash64                :   4127.000 cycle/hash,  0.252 cycle/byte,  3.970 byte/cycle, 11.910 Gb/s @3GHz 
HighwayHash64_pure_c    :  48584.421 cycle/hash,  2.965 cycle/byte,  0.337 byte/cycle,  1.012 Gb/s @3GHz 
HighwayHash64_portable  :  46494.000 cycle/hash,  2.838 cycle/byte,  0.352 byte/cycle,  1.057 Gb/s @3GHz 
HighwayHash64_sse41     :   6224.000 cycle/hash,  0.380 cycle/byte,  2.632 byte/cycle,  7.897 Gb/s @3GHz 
HighwayHash64_avx2      :   4512.000 cycle/hash,  0.275 cycle/byte,  3.631 byte/cycle, 10.894 Gb/s @3GHz