BLAKE3-team / BLAKE3

the official Rust and C implementations of the BLAKE3 cryptographic hash function
Apache License 2.0

How to confirm which optimised assembly is used (C)? #134

Open fadedbee opened 3 years ago

fadedbee commented 3 years ago

I've embedded the various BLAKE3 C source files in an existing autotools project. (I had to add AM_PROG_AS to configure.ac.)

How can I confirm that blake3_dispatch is working correctly? Ideally I'd like to set -DDEBUG and see which implementation it chooses. Or perhaps blake3_dispatch could gain a new function that reports which specialised code was used?
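A minimal sketch of what such a reporting hook might look like, assuming it lives in blake3_dispatch.c. The names blake3_note_impl and blake3_debug_last_impl are hypothetical and do not exist in BLAKE3; the idea is that the dispatcher calls blake3_note_impl in each branch, and -DDEBUG additionally logs to stderr. The plain static variable is not thread-safe, so this is only a single-threaded debugging aid.

```c
#include <stdio.h>

/* Hypothetical debugging hook; nothing here is part of the BLAKE3 API.
 * Records the name of the most recently dispatched implementation. */
static const char *g_blake3_last_impl = "none";

/* Would be called from each branch of the dispatcher, e.g. right before
 * the SSE4.1 hash_many call. Logs to stderr only when built with -DDEBUG. */
static void blake3_note_impl(const char *name) {
  g_blake3_last_impl = name;
#ifdef DEBUG
  fprintf(stderr, "blake3: dispatched to %s\n", name);
#endif
}

/* Hypothetical public getter: returns "none" until something is hashed. */
const char *blake3_debug_last_impl(void) {
  return g_blake3_last_impl;
}
```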

Assuming that debug of this sort doesn't currently exist, would this be a welcome addition upstream?

oconnor663 commented 3 years ago

The implementation chosen is the most recent instruction set supported on the current machine. Currently that's mainly Intel instruction sets, so you get:

You can see what your CPU supports with e.g. cat /proc/cpuinfo on Linux. The only other thing that can affect dispatching is if you disabled some instruction sets at build time, like with -DBLAKE3_NO_AVX512.
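To make the selection rule concrete, here is a simplified model of the x86 branch of blake3_hash_many, written as a pure function so it can be tested without real CPU detection. The FEAT_* bits and IMPL_* values are hypothetical stand-ins: the real blake3_dispatch.c has its own feature enum and also checks OS support for the wider registers. The disabled mask plays the role of build flags such as -DBLAKE3_NO_AVX512, and ARM/NEON is left out.

```c
/* Hypothetical feature bits (the real code has its own enum). */
enum {
  FEAT_SSE2     = 1 << 0,
  FEAT_SSE41    = 1 << 1,
  FEAT_AVX2     = 1 << 2,
  FEAT_AVX512F  = 1 << 3,
  FEAT_AVX512VL = 1 << 4
};

enum impl { IMPL_PORTABLE, IMPL_SSE2, IMPL_SSE41, IMPL_AVX2, IMPL_AVX512 };

/* cpu:      features detected at runtime.
 * disabled: features compiled out (a model of -DBLAKE3_NO_AVX512 etc.).
 * Returns the newest usable backend, falling back to portable C. */
static enum impl pick_hash_many_impl(unsigned cpu, unsigned disabled) {
  const unsigned avx512 = FEAT_AVX512F | FEAT_AVX512VL;
  unsigned usable = cpu & ~disabled;

  if ((usable & avx512) == avx512) return IMPL_AVX512; /* needs F and VL */
  if (usable & FEAT_AVX2)  return IMPL_AVX2;
  if (usable & FEAT_SSE41) return IMPL_SSE41;
  if (usable & FEAT_SSE2)  return IMPL_SSE2;
  return IMPL_PORTABLE;
}
```

For example, a CPU reporting SSE2 and SSE4.1 but not AVX2 maps to IMPL_SSE41, and passing FEAT_AVX512F in disabled drops a full AVX-512 machine to IMPL_AVX2.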

If you want to see this in action, I'd suggest adding some print statements to blake3_hash_many and then hashing a 100 KB input. (You could also log from blake3_compress_in_place or blake3_compress_xof, but AVX2 doesn't show up in those functions.)
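With a 100 KB input, blake3_hash_many may be entered several times, so individual print statements can be hard to read. One option is to count entries per backend and print a summary once hashing is done. This is a hypothetical helper, not BLAKE3 code: note_backend would be called next to each blake3_hash_many_* call you instrument, and print_backend_tally once at the end. The counters are plain globals, so this assumes single-threaded use.

```c
#include <stdio.h>

/* Hypothetical debugging aid, not part of BLAKE3: count how often each
 * backend is entered, so "mostly X, occasionally Y" becomes a number. */
enum backend { BK_PORTABLE, BK_SSE2, BK_SSE41, BK_AVX2, BK_AVX512, BK_COUNT };

static const char *const backend_names[BK_COUNT] = {
  "portable", "sse2", "sse41", "avx2", "avx512"
};

static unsigned long backend_calls[BK_COUNT];

/* Call in each instrumented branch, e.g. beside the SSE4.1 call. */
static void note_backend(enum backend b) {
  if (b < BK_COUNT) backend_calls[b]++;
}

/* Print one line per backend that was actually used. */
static void print_backend_tally(FILE *out) {
  for (int i = 0; i < BK_COUNT; i++)
    if (backend_calls[i] != 0)
      fprintf(out, "%-8s %lu\n", backend_names[i], backend_calls[i]);
}
```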

As for a public, stable debugging API, what would the expected use case be?

fadedbee commented 3 years ago

Thanks for your response.

Can you leave this issue open? I'm getting some very odd results: mostly the SSE41 code runs, but occasionally the SSE2 code does. My CPU reports "avx":

$ cat /proc/cpuinfo | grep sse | head -1
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
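When reading a flags line like this one, a plain substring search for "avx" would also match "avx2" or "avx512f". The line above lists avx, sse2 and sse4_1 but not avx2. A small sketch of a whole-token check follows; the name has_cpu_flag is just for illustration.

```c
#include <string.h>

/* Return 1 if `name` appears as a whole space-separated token in a
 * /proc/cpuinfo "flags" line, else 0. strstr alone is not enough,
 * because "avx" is also a prefix of "avx2" and "avx512f". */
static int has_cpu_flag(const char *flags, const char *name) {
  size_t n = strlen(name);
  const char *p = flags;

  if (n == 0) return 0; /* an empty name never counts as a flag */
  while ((p = strstr(p, name)) != NULL) {
    int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
    char after = p[n];
    int ends = after == '\0' || after == ' ' || after == '\t' || after == '\n';
    if (starts && ends) return 1;
    p += 1; /* keep searching past this partial match */
  }
  return 0;
}
```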

More debug to follow...

oconnor663 commented 3 years ago

When you say "Mostly the SSE41 code is run", could you clarify what that means? Where are you putting your print statements, and what commands are you running to test the code?