floodyberry / blake2b-opt

Optimized, portable implementations of BLAKE2b

AVX2-64 code seems to be broken #1

Open vstakhov opened 9 years ago

vstakhov commented 9 years ago

I tried to compile the code on OS X, but compilation failed with the following error:

clang -cc1as: fatal error: error in backend: 32-bit absolute addressing is not supported in 64-bit mode

I tried to fix it by switching to %rip-relative addressing with a patch like this one: https://gist.github.com/vstakhov/37442eaf04ebfdd315e0. The code now compiles, but the resulting binary crashes:

Process 90423 stopped
* thread #1: tid = 0x2147d4b, 0x0000000100007200 blake2b-util`.Lblake2b_blocks_avx2_11, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x18100800)
    frame #0: 0x0000000100007200 blake2b-util`.Lblake2b_blocks_avx2_11
blake2b-util`.Lblake2b_blocks_avx2_11:
->  0x100007200 <+0>:  movzbl (%rax), %ebx
    0x100007203 <+3>:  addq   $0x10, %rax
    0x100007207 <+7>:  movzbl -0xc(%rax), %r13d
    0x10000720c <+12>: movzbl -0xe(%rax), %r11d

Register contents:

General Purpose Registers:
       rax = 0x0000000018100800
       rbx = 0x0000000000000000
       rcx = 0x0000000000000000
       rdx = 0x0000000000000000
       rdi = 0x00007fff5fbff8e0
       rsi = 0x00007fff5fbff4c0
       rbp = 0x00007fff5fbff590
       rsp = 0x00007fff5fbff4f8
        r8 = 0xffffffffffffffff
        r9 = 0xffffffffffffffff
       r10 = 0x0000000000000000
       r11 = 0x0000000000000000
       r12 = 0x0000000100006900  blake2b-util`blake2b_constants
       r13 = 0x0000000100006a00  blake2b-util`blake2b_constants_ssse3
       r14 = 0x00007fff5fbff930
       r15 = 0x0000000000000000
       rip = 0x0000000100007200  blake2b-util`.Lblake2b_blocks_avx2_11
    rflags = 0x0000000000010286
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

The other extensions work fine under fuzz testing.
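A quick sanity check on the dump, as a minimal sketch: the faulting instruction is `movzbl (%rax), %ebx`, so the bad address should simply be whatever is in rax. The values below are copied from the lldb output. The reading that rax holds table data rather than a pointer is an assumption, not something the thread confirms.

```python
# Values copied from the lldb output above.
rax = 0x0000000018100800
fault_address = 0x18100800
r12_constants = 0x0000000100006900  # blake2b_constants, a symbol address lldb resolved

# `movzbl (%rax), %ebx` dereferences rax, so the fault address is rax itself.
rax_is_fault_address = rax == fault_address

# Resolved symbols in this image sit above 4 GiB, while rax fits in 32 bits,
# so rax cannot be the address of a table inside the image.
rax_fits_32_bits = rax < 2**32
image_above_4gib = r12_constants >= 2**32

# Assumption: rax may contain the first bytes of a table instead of its address.
# Viewed as four little-endian bytes, the value is a regular sequence.
rax_bytes = list(rax.to_bytes(4, "little"))

print(rax_is_fault_address, rax_fits_32_bits, image_above_4gib, rax_bytes)
```

If that reading is right, the patched instruction loads the contents at the symbol (a memory operand) where the original code loaded the symbol's address (an immediate), and the %rip-relative equivalent of the latter would be a `lea`.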

bit4 commented 8 years ago

Can you try the attached patch?

blake2b-opt_fix-avx2-address-load.patch.txt

vstakhov commented 8 years ago

That helped, thank you.

bin/blake2b-util bench
time granularity: 24 cycles, 2195297384 cycles/second

1 byte(s):
          avx2,   396.00 cycles per call, 396.0000 cycles/byte
           avx,   333.00 cycles per call, 333.0000 cycles/byte
           x86,   356.00 cycles per call, 356.0000 cycles/byte
    generic/64,   586.00 cycles per call, 586.0000 cycles/byte
128 byte(s):
          avx2,   389.00 cycles per call,   3.0391 cycles/byte
           avx,   316.00 cycles per call,   2.4688 cycles/byte
           x86,   353.00 cycles per call,   2.7578 cycles/byte
    generic/64,   581.00 cycles per call,   4.5391 cycles/byte
576 byte(s):
          avx2,  1416.00 cycles per call,   2.4583 cycles/byte
           avx,  1474.00 cycles per call,   2.5590 cycles/byte
           x86,  1648.00 cycles per call,   2.8611 cycles/byte
    generic/64,  2450.00 cycles per call,   4.2535 cycles/byte
8192 byte(s):
          avx2, 16426.00 cycles per call,   2.0051 cycles/byte
           avx, 18352.00 cycles per call,   2.2402 cycles/byte
           x86, 20888.00 cycles per call,   2.5498 cycles/byte
    generic/64, 28548.00 cycles per call,   3.4849 cycles/byte
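For reading the numbers above, a small sketch of how the cycles/byte column relates to cycles per call (cycles per call divided by message size), plus the avx2 speedup over generic/64 at 8192 bytes. The figures are copied from the output; the names `cycles_per_call` and `cycles_per_byte` are only illustrative.

```python
# (message size in bytes, implementation) -> cycles per call, from the output above.
cycles_per_call = {
    (1, "avx2"): 396.0, (1, "avx"): 333.0, (1, "x86"): 356.0, (1, "generic/64"): 586.0,
    (128, "avx2"): 389.0, (128, "avx"): 316.0, (128, "x86"): 353.0, (128, "generic/64"): 581.0,
    (576, "avx2"): 1416.0, (576, "avx"): 1474.0, (576, "x86"): 1648.0, (576, "generic/64"): 2450.0,
    (8192, "avx2"): 16426.0, (8192, "avx"): 18352.0, (8192, "x86"): 20888.0, (8192, "generic/64"): 28548.0,
}


def cycles_per_byte(size, impl):
    """Cycles per call divided by the number of bytes hashed."""
    return cycles_per_call[(size, impl)] / size


# Speedup of avx2 over the generic 64-bit code on the largest message.
speedup_8192 = cycles_per_call[(8192, "generic/64")] / cycles_per_call[(8192, "avx2")]

for (size, impl), _ in sorted(cycles_per_call.items()):
    print(f"{size:5d} {impl:>10s} {cycles_per_byte(size, impl):9.4f} cycles/byte")
print(f"avx2 vs generic/64 at 8192 bytes: {speedup_8192:.2f}x")
```

On this machine avx2 only pulls ahead of avx from 576 bytes upward; for 1 and 128 bytes avx is the fastest of the four.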

jrmithdobbs commented 8 years ago

Looks like #3 corrects this issue.