cloudflare / sslconfig

Cloudflare's Internet facing SSL configuration

Illegal instruction in ChaCha20 code on old AMD processors #51

Closed rraptorr closed 7 years ago

rraptorr commented 7 years ago

Probably nothing can be done about this, but I'll report it anyway. I recently upgraded all of my machines to the latest OpenSSL 1.0.2j patch, which includes the new ChaCha20 code. While I mostly use various Xeons, there are a couple of old AMD machines as well. It turns out that the ChaCha20 code crashes with illegal instruction errors, in both client and server mode, on those AMD boxes.

What I was able to capture from gdb:


````
Starting program: /usr/bin/openssl s_client -connect cloudflare.com:443 -cipher ECDHE-RSA-CHACHA20-POLY1305
CONNECTED(00000003)
depth=3 C = SE, O = AddTrust AB, OU = AddTrust External TTP Network, CN = AddTrust External CA Root
verify return:1
depth=2 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Certification Authority
verify return:1
depth=1 C = GB, ST = Greater Manchester, L = Salford, O = COMODO CA Limited, CN = COMODO RSA Extended Validation Secure Server CA
verify return:1
depth=0 serialNumber = 4710875, jurisdictionC = US, jurisdictionST = Delaware, businessCategory = Private Organization, C = US, postalCode = 94107, ST = California, L = San Francisco, street = "655 Third Street, Suite 200", O = "CloudFlare, Inc.", OU = COMODO EV Multi-Domain SSL
verify return:1

Program received signal SIGILL, Illegal instruction.
seal_sse_128 () at chacha20_poly1305_x86_64.s:3675
3675    chacha20_poly1305_x86_64.s: No such file or directory.
(gdb) bt
#0  seal_sse_128 () at chacha20_poly1305_x86_64.s:3675
#1  0x00007ffff7feed60 in ?? ()
#2  0x00007fffffffd6a0 in ?? ()
#3  0x00007ffff7ff7828 in ?? ()
#4  0x00007fffffffd6c8 in ?? ()
#5  0x00007ffff7ff74d0 in ?? ()
#6  0x0000000000000001 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info frame
Stack level 0, frame at 0x7fffffffd550:
 rip = 0x7ffff78a8c00 in seal_sse_128 (chacha20_poly1305_x86_64.s:3675); saved rip = 0x7ffff7feed60
 called by frame at 0x7fffffffd558
 source language asm.
 Arglist at 0x7fffffffd540, args: 
 Locals at 0x7fffffffd540, Previous frame's sp is 0x7fffffffd550
 Saved registers:
  rip at 0x7fffffffd548
(gdb) disassemble 
Dump of assembler code for function seal_sse_128:
   0x00007ffff78a8b93 <+0>:     movdqu -0x429b(%rip),%xmm0        # 0x7ffff78a4900 <.chacha20_consts>
   0x00007ffff78a8b9b <+8>:     movdqa %xmm0,%xmm1
   0x00007ffff78a8b9f <+12>:    movdqa %xmm0,%xmm2
   0x00007ffff78a8ba3 <+16>:    movdqu (%r9),%xmm4
   0x00007ffff78a8ba8 <+21>:    movdqa %xmm4,%xmm5
   0x00007ffff78a8bac <+25>:    movdqa %xmm4,%xmm6
   0x00007ffff78a8bb0 <+29>:    movdqu 0x10(%r9),%xmm8
   0x00007ffff78a8bb6 <+35>:    movdqa %xmm8,%xmm9
   0x00007ffff78a8bbb <+40>:    movdqa %xmm8,%xmm10
   0x00007ffff78a8bc0 <+45>:    movdqu 0x20(%r9),%xmm14
   0x00007ffff78a8bc6 <+51>:    movdqa %xmm14,%xmm12
   0x00007ffff78a8bcb <+56>:    paddd  -0x4264(%rip),%xmm12        # 0x7ffff78a4970 <.sse_inc>
   0x00007ffff78a8bd4 <+65>:    movdqa %xmm12,%xmm13
   0x00007ffff78a8bd9 <+70>:    paddd  -0x4272(%rip),%xmm13        # 0x7ffff78a4970 <.sse_inc>
   0x00007ffff78a8be2 <+79>:    movdqa %xmm4,%xmm7
   0x00007ffff78a8be6 <+83>:    movdqa %xmm8,%xmm11
   0x00007ffff78a8beb <+88>:    movdqa %xmm12,%xmm15
   0x00007ffff78a8bf0 <+93>:    mov    $0xa,%r10
   0x00007ffff78a8bf7 <+100>:   paddd  %xmm4,%xmm0
   0x00007ffff78a8bfb <+104>:   pxor   %xmm0,%xmm12
=> 0x00007ffff78a8c00 <+109>:   pshufb -0x42ca(%rip),%xmm12        # 0x7ffff78a4940 <.rol16>
   0x00007ffff78a8c0a <+119>:   paddd  %xmm12,%xmm8
   0x00007ffff78a8c0f <+124>:   pxor   %xmm8,%xmm4
   0x00007ffff78a8c14 <+129>:   movdqa %xmm4,%xmm3
   0x00007ffff78a8c18 <+133>:   pslld  $0xc,%xmm3
   0x00007ffff78a8c1d <+138>:   psrld  $0x14,%xmm4
   0x00007ffff78a8c22 <+143>:   pxor   %xmm3,%xmm4
   0x00007ffff78a8c26 <+147>:   paddd  %xmm4,%xmm0
   0x00007ffff78a8c2a <+151>:   pxor   %xmm0,%xmm12
   0x00007ffff78a8c2f <+156>:   pshufb -0x4319(%rip),%xmm12        # 0x7ffff78a4920 <.rol8>
   0x00007ffff78a8c39 <+166>:   paddd  %xmm12,%xmm8
   0x00007ffff78a8c3e <+171>:   pxor   %xmm8,%xmm4
   0x00007ffff78a8c43 <+176>:   movdqa %xmm4,%xmm3
   0x00007ffff78a8c47 <+180>:   pslld  $0x7,%xmm3
   0x00007ffff78a8c4c <+185>:   psrld  $0x19,%xmm4
   0x00007ffff78a8c51 <+190>:   pxor   %xmm3,%xmm4
   0x00007ffff78a8c55 <+194>:   palignr $0x4,%xmm4,%xmm4
   0x00007ffff78a8c5b <+200>:   palignr $0x8,%xmm8,%xmm8
   0x00007ffff78a8c62 <+207>:   palignr $0xc,%xmm12,%xmm12
   0x00007ffff78a8c69 <+214>:   paddd  %xmm5,%xmm1
   0x00007ffff78a8c6d <+218>:   pxor   %xmm1,%xmm13
   0x00007ffff78a8c72 <+223>:   pshufb -0x433c(%rip),%xmm13        # 0x7ffff78a4940 <.rol16>
   0x00007ffff78a8c7c <+233>:   paddd  %xmm13,%xmm9
   0x00007ffff78a8c81 <+238>:   pxor   %xmm9,%xmm5
   0x00007ffff78a8c86 <+243>:   movdqa %xmm5,%xmm3
   0x00007ffff78a8c8a <+247>:   pslld  $0xc,%xmm3
   0x00007ffff78a8c8f <+252>:   psrld  $0x14,%xmm5
   0x00007ffff78a8c94 <+257>:   pxor   %xmm3,%xmm5
   0x00007ffff78a8c98 <+261>:   paddd  %xmm5,%xmm1
   0x00007ffff78a8c9c <+265>:   pxor   %xmm1,%xmm13
````

Processor info:

````
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 67
model name      : Dual-Core AMD Opteron(tm) Processor 1218 HE
stepping        : 3
cpu MHz         : 1800.000
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good nopl extd_apicid eagerfpu pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch vmmcall
bugs            : apic_c1e fxsave_leak sysret_ss_attrs null_seg swapgs_fence
bogomips        : 3618.52
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
````

The previous ChaCha20 patch worked on those machines without problems.

marcin-gryszkalis commented 7 years ago

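pshufb (packed shuffle bytes) is not supported on AMD K8/K10; on AMD it first appeared with the Bobcat and FX (Bulldozer) generations. It is part of the SSSE3 (Supplemental SSE3) instruction set, and the flags line in the report lists pni (SSE3) but not ssse3.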
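
For anyone who wants to check a machine before upgrading, here is a minimal Python sketch that looks for the `ssse3` flag in `/proc/cpuinfo`-style text. The helper names (`cpu_flags`, `has_ssse3`) and the abbreviated sample line are made up for illustration; only the Linux flag names `ssse3` and `pni` are taken as given.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_ssse3(cpuinfo_text):
    """True if the CPU advertises SSSE3, which pshufb and palignr require."""
    return "ssse3" in cpu_flags(cpuinfo_text)


# Abbreviated flags line from the Opteron 1218 HE report (illustrative subset).
OPTERON_SAMPLE = "flags\t\t: fpu sse sse2 ht syscall nx lm pni cx16 lahf_lm svm"

if __name__ == "__main__":
    print("Opteron sample has SSSE3:", has_ssse3(OPTERON_SAMPLE))
    # Only meaningful on Linux, where /proc/cpuinfo exists.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("This machine has SSSE3:", has_ssse3(f.read()))
```

On a shell, `grep -c ssse3 /proc/cpuinfo` gives roughly the same answer; the function form is just easier to reuse in a deployment check.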

vkrasnov commented 7 years ago

@marcin-gryszkalis is correct. The previous patch worked because it only had AVX and AVX2 code paths; on anything older it fell back to the generic C code. Now there is a fast SSE implementation, but it requires SSSE3 or later and assumes that every relevant 64-bit CPU supports it.

I didn't really see a reason to support hardware more than 10 years old.
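
To make the difference between the two patches concrete, here is a rough Python model of the selection logic as described in this comment. It is not the actual patch code: the function names and return labels are hypothetical, the AVX2-before-AVX ordering is an assumption, and `guarded_choice` is an illustrative variant that does not exist in the patch.

```python
def previous_patch_choice(flags):
    """Older patch, as described: AVX2 or AVX paths, otherwise generic C."""
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "generic_c"


def new_patch_choice(flags):
    """Newer patch, as described: the SSE path is taken without checking SSSE3."""
    if "avx2" in flags:
        return "avx2"
    if "avx" in flags:
        return "avx"
    return "sse"  # executes pshufb, so it raises SIGILL on CPUs without SSSE3


def guarded_choice(flags):
    """Hypothetical variant that checks for SSSE3 before taking the SSE path."""
    choice = new_patch_choice(flags)
    if choice == "sse" and "ssse3" not in flags:
        return "generic_c"
    return choice


# Subset of the Opteron's flags: SSE3 (pni) is present, SSSE3 is not.
opteron = {"sse", "sse2", "pni"}
print(previous_patch_choice(opteron), new_patch_choice(opteron), guarded_choice(opteron))
# prints: generic_c sse generic_c
```

The middle result is the case from the report: the SSE path is selected on a CPU that cannot execute `pshufb`.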

rraptorr commented 7 years ago

Thanks for the explanation, that's basically what I expected ;)