Closed lamont-granquist closed 3 years ago
You can try setting X25519.provider to X25519::Provider::Ref10 explicitly and see if that solves your problem.
This gem uses runtime detection (i.e. via CPUID) in order to detect the presence of the requisite instructions:
https://github.com/RubyCrypto/x25519/blob/master/lib/x25519.rb#L33-L39
Here's the relevant code:
https://github.com/RubyCrypto/x25519/blob/master/ext/x25519_precomputed/cputest.c
Illegal instruction would indicate one (or more) of the following:
- The compiled code uses instructions beyond those cputest.c tests for, which would indicate something like march being incorrectly configured in your build environment
- There is a bug in cputest.c, and it is giving a false positive for the requisite instructions. Note that cputest.c is provided by Intel
A possible thought is that the gem could have been installed+compiled on a later processor architecture randomly and is getting deployed via the cache onto an older one?
This should only matter if something like -march=native were used as part of the build, as otherwise the CPU features explicitly required by this gem are detected at runtime. In such a case the compiler is free to emit instructions which are native to the build host and may not be available once the binary is relocated. Note that something like that would be outside the scope of this particular gem and would impact every native extension.
Some helpful debugging info you could try collecting:
X25519::Provider::Precomputed.available?
X25519.provider
OK, @tarcieri, I'm available to investigate this issue.
I see that the whole community just deleted the gem. I think that's bad. :(
I have this issue when compiling on a new Intel(R) Xeon(R) Gold 5218 and then running the application on an Intel(R) Core(TM) i7-10700 CPU. I think both CPUs are recent enough to support the required instructions.
In irb I get the following on both computers:
docker run -it ruby:3.0 bash
root@75b135d223c1:/# gem install x25519
Fetching x25519-1.0.8.gem
Building native extensions. This could take a while...
Successfully installed x25519-1.0.8
1 gem installed
root@75b135d223c1:/# irb
irb(main):001:0> require 'x25519'
=> true
irb(main):002:0> X25519::Provider::Precomputed.available?
=> true
irb(main):003:0> X25519.provider
=> X25519::Provider::Precomputed
I see you're running it from Docker. Are you on a platform where Docker would be using a VM by any chance? (e.g. macOS using HyperKit)
A couple things that would be helpful...
Try explicitly selecting the Ref10 backend and see if that resolves the issue:
X25519.provider = X25519::Provider::Ref10
If this works, it might make sense to just disable the X25519::Provider::Precomputed backend as it seems to be causing problems. But it'd also be great to know what the problem actually is.
To debug the SIGILL, it'd be very helpful to get register output at the time of the crash in order to see which instruction is causing the problem specifically.
You can potentially use a tool like gdb/lldb to do this. If you manage to capture the SIGILL in either, try the first command below in gdb or the second in lldb:
info all-registers
register read --all
Are you on a platform where Docker would be using a VM by any chance?
I use an Ubuntu VM on top of the latest ESXi version for the Intel Xeon setup, and the native Docker daemon on a physical Ubuntu desktop for the Intel i7 setup.
I will try forking the gem to test the command:
X25519.provider = X25519::Provider::Ref10
because the error occurs directly on:
require 'x25519'
in the self_test call here: https://github.com/RubyCrypto/x25519/blob/master/lib/x25519.rb#L95
It's possible the VM lacks support for the instructions.
I can look into trying to improve the test to use something like CPUID which should reflect the instructions supported by the VM.
OK, so Ref10 works.
I removed the self_test call at https://github.com/RubyCrypto/x25519/blob/master/lib/x25519.rb#L95 and got:
irb(main):002:0> require 'x25519'
=> true
irb(main):003:0> X25519.provider = X25519::Provider::Ref10
=> X25519::Provider::Ref10
irb(main):004:0> X25519.provider
=> X25519::Provider::Ref10
irb(main):005:0> X25519.self_test
=> true
and with the default provider:
irb(main):001:0> require 'x25519'
=> true
irb(main):002:0> X25519.provider
=> X25519::Provider::Precomputed
irb(main):003:0> X25519.self_test
/usr/local/bundle/gems/x25519-1.0.8/lib/x25519.rb:81: [BUG] Illegal instruction at 0x00007ff8fb954371
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux]
Do you know which instructions are required for the compiled code to run correctly?
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
CPU family: 6
Model: 165
Model name: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp pku ospke md_clear flush_l1d arch_capabilities
Maybe there is interference from AVX-512?
The X25519::Provider::Precomputed backend needs AVX2, though it would be good to get the register dump to know which instruction it's crashing on specifically.
It would be interesting to know what this gem reports under CPUID.features as well:
https://www.rubydoc.info/gems/cpuid/0.4.0/CPUID
If the AVX2 flag is present in CPUID and it's crashing with SIGILL on an AVX2 instruction, it would seem to be an issue with the VM itself IMO.
Hmm, tried to do it myself and got this error:
cpuid-0.4.0/lib/cpuid/cpuid.rb:60: [BUG] vm_call_cfunc: cfp consistency error (0x0000000063a1a9a8, 0x00007fe363a1a9a8)
I see that gem hasn't been touched since 2009 either.
Here's a post on how to get CPUID via FFI:
https://www.cstrahan.com/posts/2013-07-15-pure-ruby-cpuid-via-ffi.html
Otherwise I can use a very small C/ASM shim to just check for the presence of AVX2, but it would be good to know whether or not it's showing up inside your VM first.
As it were, that's what the existing Intel code is doing, but perhaps in a more complicated manner than it needs to.
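Since the cpuid gem crashes on current Rubies, one rough way to see which features the VM exposes is to read the flags line from /proc/cpuinfo. The following is only a minimal sketch, assuming a Linux guest; the helper names cpu_flags and flag_report are hypothetical, and the default flag list (avx2, bmi2, avx512f) is chosen for illustration from the flags discussed in this thread, not taken from the gem's own detection code.

```ruby
require 'set'

# Extract the CPU feature flags from /proc/cpuinfo-style text.
# Returns an empty Set when no "flags" line is present.
def cpu_flags(cpuinfo_text)
  line = cpuinfo_text.each_line.find { |l| l.start_with?('flags') }
  return Set.new unless line

  Set.new(line.split(':', 2).last.split)
end

# Report, for each wanted flag, whether the kernel lists it.
# The default list is an illustrative assumption.
def flag_report(cpuinfo_text, wanted = %w[avx2 bmi2 avx512f])
  flags = cpu_flags(cpuinfo_text)
  wanted.to_h { |flag| [flag, flags.include?(flag)] }
end

# On Linux, print the report for the machine (or VM) this runs on.
if __FILE__ == $PROGRAM_NAME && File.readable?('/proc/cpuinfo')
  flag_report(File.read('/proc/cpuinfo')).each do |flag, present|
    puts format('%-8s %s', flag, present ? 'present' : 'missing')
  end
end
```

Running this on both the build host and the deployment host and comparing the output would show whether a flag such as avx512f is visible on one and not the other.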
Yes, I get something like this on every setup:
HOME/.rvm/gems/ruby-3.0.1/gems/cpuid-0.4.0/lib/cpuid/cpuid.rb:56: [BUG] Segmentation fault at 0x00000000a51f78f0
ruby 3.0.1p64 (2021-04-05 revision 0fb782ee38) [x86_64-linux]
@tarcieri I tried to use gdb to capture the SIGILL, but I'm not familiar with it.
For gdb, you can try:
$ gdb ruby
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-15.el8
[...]
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ruby...done.
Then:
(gdb) run -e "require 'x25519'"
Starting program: /usr/bin/ruby -e "require 'x25519'"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff7fee700 (LWP 1380181)]
[Thread 0x7ffff7fee700 (LWP 1380181) exited]
[Inferior 1 (process 1380177) exited normally]
(gdb)
It should (hopefully) catch the SIGILL, at which point you can run:
(gdb) info all-registers
Thanks for the procedure.
I have more information:
(gdb) run -e "require 'x25519'"
Starting program: /usr/local/bin/ruby -e "require 'x25519'"
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGILL, Illegal instruction.
mX25519_Provider_Precomputed_scalarmult (self=<optimized out>, scalar=<optimized out>, montgomery_u=<optimized out>)
at x25519_precomputed.c:56
56 memcpy(raw_montgomery_u, RSTRING_PTR(montgomery_u), X25519_KEYSIZE_BYTES);
(gdb) info all-registers
rax 0x5596b0332678 94105689597560
rbx 0x5596affed720 94105686169376
rcx 0x20 32
rdx 0x5596b036a610 94105689826832
rsi 0x5596b0332768 94105689597800
rdi 0x7ffc9e607a90 140722965609104
rbp 0x7ffc9e607b20 0x7ffc9e607b20
rsp 0x7ffc9e607a80 0x7ffc9e607a80
r8 0x5596b0332650 94105689597520
r9 0x10 16
r10 0xffffffffffffffcf -49
r11 0x7fe91b2c74d0 140639160005840
r12 0xcdf100100005 226434971860997
r13 0x7fe91a768e18 140639148084760
r14 0x2 2
r15 0x5596b0410c00 94105690508288
rip 0x7fe9171a8371 0x7fe9171a8371 <mX25519_Provider_Precomputed_scalarmult+161>
eflags 0x10202 [ IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
st0 0 (raw 0x00000000000000000000)
st1 0 (raw 0x00000000000000000000)
st2 0 (raw 0x00000000000000000000)
st3 0 (raw 0x00000000000000000000)
st4 0 (raw 0x00000000000000000000)
st5 0 (raw 0x00000000000000000000)
st6 0 (raw 0x00000000000000000000)
st7 0 (raw 0x00000000000000000000)
fctrl 0x37f 895
fstat 0x0 0
ftag 0xffff 65535
fiseg 0x0 0
fioff 0x0 0
foseg 0x0 0
fooff 0x0 0
fop 0x0 0
mxcsr 0x1fa0 [ PE IM DM ZM OM UM PM ]
bndcfgu {raw = 0x0, config = {base = 0x0, reserved = 0x0, preserved = 0x0, enabled = 0x0}} {raw = 0x0, config = {base = 0, reserved = 0, preserved = 0, enabled = 0}}
bndstatus {raw = 0x0, status = {bde = 0x0, error = 0x0}} {raw = 0x0, status = {bde = 0, error = 0}}
pkru 0x55555554 1431655764
ymm0 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x90, 0xc3, 0x41, 0xb0, 0x96, 0x55, 0x0 <repeats 26 times>}, v16_int16 = {0xc390, 0xb041, 0x5596, 0x0 <repeats 13 times>}, v8_int32 = {0xb041c390, 0x5596, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x5596b041c390, 0x0, 0x0, 0x0}, v2_int128 = {0x5596b041c390, 0x0}}
ymm1 {v8_float = {0xffffffee, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x7fffffffffffffff, 0x0, 0x0, 0x0}, v32_int8 = {0xdb, 0x35, 0x94, 0xc1, 0xa4, 0x24, 0xb1, 0x5f, 0x7c, 0x72, 0x66, 0x24, 0xec, 0x26, 0xb3, 0x35, 0x0 <repeats 16 times>}, v16_int16 = {0x35db, 0xc194, 0x24a4, 0x5fb1, 0x727c, 0x2466, 0x26ec, 0x35b3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xc19435db, 0x5fb124a4, 0x2466727c, 0x35b326ec, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x5fb124a4c19435db, 0x35b326ec2466727c, 0x0, 0x0}, v2_int128 = {0x35b326ec2466727c5fb124a4c19435db, 0x0}}
ymm2 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0xff, 0x0 <repeats 28 times>}, v16_int16 = {0x0, 0xff00, 0x0 <repeats 14 times>}, v8_int32 = {0xff000000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xff000000, 0x0, 0x0, 0x0}, v2_int128 = {0xff000000, 0x0}}
ymm3 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xf8, 0xa, 0x38, 0xb0, 0x96, 0x55, 0x0 <repeats 18 times>}, v16_int16 = {0x8, 0x0, 0x0, 0x0, 0xaf8, 0xb038, 0x5596, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x8, 0x0, 0xb0380af8, 0x5596, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x8, 0x5596b0380af8, 0x0, 0x0}, v2_int128 = {0x5596b0380af80000000000000008, 0x0}}
ymm4 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x28, 0x59, 0x41, 0xb0, 0x96, 0x55, 0x0, 0x0, 0x0, 0x59, 0x41, 0xb0, 0x96, 0x55, 0x0 <repeats 18 times>}, v16_int16 = {0x5928, 0xb041, 0x5596, 0x0, 0x5900, 0xb041, 0x5596, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xb0415928, 0x5596, 0xb0415900, 0x5596, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x5596b0415928, 0x5596b0415900, 0x0, 0x0}, v2_int128 = {0x5596b041590000005596b0415928, 0x0}}
ymm5 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xc, 0xd1, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc, 0xf1, 0xc, 0x0 <repeats 21 times>}, v16_int16 = {0xd10c, 0xc, 0x0, 0x0, 0xf10c, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0xcd10c, 0x0, 0xcf10c, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xcd10c, 0xcf10c, 0x0, 0x0}, v2_int128 = {0xcf10c00000000000cd10c, 0x0}}
ymm6 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0xc, 0x26, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc, 0x81, 0x6a, 0x0 <repeats 21 times>}, v16_int16 = {0x260c, 0x0, 0x0, 0x0, 0x810c, 0x6a, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x260c, 0x0, 0x6a810c, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x260c, 0x6a810c, 0x0, 0x0}, v2_int128 = {0x6a810c000000000000260c, 0x0}}
ymm7 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x0 <repeats 23 times>}, v16_int16 = {0x8, 0x0, 0x0, 0x0, 0x8, 0x0 <repeats 11 times>}, v8_int32 = {0x8, 0x0, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x8, 0x8, 0x0, 0x0}, v2_int128 = {0x80000000000000008, 0x0}}
ymm8 {v8_float = {0x0, 0xffffffff, 0xffffffff, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x7fffffffffffffff, 0x0, 0x0, 0x0}, v32_int8 = {0x50, 0x4d, 0x6d, 0x0, 0x62, 0x75, 0x66, 0x66, 0x65, 0x72, 0x20, 0x73, 0x69, 0x7a, 0x65, 0x20, 0x0 <repeats 16 times>}, v16_int16 = {0x4d50, 0x6d, 0x7562, 0x6666, 0x7265, 0x7320, 0x7a69, 0x2065, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v8_int32 = {0x6d4d50, 0x66667562, 0x73207265, 0x20657a69, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x66667562006d4d50, 0x20657a6973207265, 0x0, 0x0}, v2_int128 = {0x20657a697320726566667562006d4d50, 0x0}}
ymm9 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm10 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm11 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm12 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm13 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm14 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
ymm15 {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0 <repeats 32 times>}, v16_int16 = {0x0 <repeats 16 times>}, v8_int32 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0x0, 0x0, 0x0, 0x0}, v2_int128 = {0x0, 0x0}}
bnd0 {lbound = 0x0, ubound = 0xffffffffffffffff} {lbound = 0x0, ubound = 0xffffffffffffffff}
bnd1 {lbound = 0x0, ubound = 0xffffffffffffffff} {lbound = 0x0, ubound = 0xffffffffffffffff}
bnd2 {lbound = 0x0, ubound = 0xffffffffffffffff} {lbound = 0x0, ubound = 0xffffffffffffffff}
bnd3 {lbound = 0x0, ubound = 0xffffffffffffffff} {lbound = 0x0, ubound = 0xffffffffffffffff}
Wow, that is definitely not what I was expecting!
At face value that seems to indicate SIGILL is occurring inside of memcpy() as opposed to anywhere in the ECC arithmetic.
It's possible that something like auto-vectorization is using an unsupported instruction beyond what is being detected at runtime. It might have something to do with the CFLAGS:
https://github.com/RubyCrypto/x25519/blob/ba9e0c2/ext/x25519_precomputed/extconf.rb#L7
$CFLAGS << " -Wall -O3 -pedantic -std=c99 -mbmi -mbmi2 -march=native -mtune=native"
Those should probably be changed to match what is being autodetected at runtime, something like -march=haswell.
You could try editing that line to be something like:
$CFLAGS << " -Wall -O3 -pedantic -std=c99 -mbmi -mbmi2"
...and see if that fixes the problem.
OK, after some tests:
$CFLAGS << " -Wall -O3 -pedantic -std=c99 -mbmi -mbmi2 -march=native -mtune=native"
doesn't work.
But all of these settings work:
$CFLAGS << " -Wall -O3 -pedantic -std=c99 -mbmi -mbmi2 -march=native -mtune=native -mno-avx512f"
$CFLAGS << " -Wall -O3 -pedantic -std=c99 -mbmi -mbmi2"
$CFLAGS << " -Wall -O3 -pedantic -std=c99 -mbmi -mbmi2 -march=haswell -mtune=native"
I think we have an issue with AVX-512F.
Yeah, seems like it's been narrowed down to auto-vectorization of memcpy() using AVX-512, when the CPU feature detection is only checking for AVX2.
I'll open a PR to change the flags to ones that the runtime CPU feature detection ensures are available.
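To illustrate the idea (this is not the actual change that went into the gem), here is a small sketch of stripping the host-specific flags from the CFLAGS string quoted above and optionally pinning an explicit baseline. The helper portable_cflags and the HOST_SPECIFIC_FLAGS constant are hypothetical names introduced only for this example.

```ruby
# Flags that tie the compiled binary to the build host's CPU.
HOST_SPECIFIC_FLAGS = /\A-m(?:arch|tune)=native\z/

# Return the flag string with -march=native / -mtune=native removed.
# When a baseline is given, append it as an explicit -march value.
def portable_cflags(cflags, baseline: nil)
  kept = cflags.split.reject { |flag| flag.match?(HOST_SPECIFIC_FLAGS) }
  kept << "-march=#{baseline}" if baseline
  kept.join(' ')
end

original = '-Wall -O3 -pedantic -std=c99 -mbmi -mbmi2 -march=native -mtune=native'

puts portable_cflags(original)
puts portable_cflags(original, baseline: 'haswell')
```

With no baseline the compiler falls back to its default target, so only the explicitly listed -mbmi and -mbmi2 extensions are enabled; with a baseline such as haswell the auto-vectorizer is limited to that architecture's instruction set.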
Thanks for the help! It was interesting for me :)
The fix should now be available in v1.0.9
Thanks, sorry I've been buried in other stuff and haven't been able to help at all.
This is happening to us a bunch in CI on Buildkite on centos-8, oracle-7, ubuntu-20.04, ubuntu-18.04 (but not ubuntu-21.04), debian-10, debian-11 (but not debian-9). I suspect it depends on which worker the job happens to be assigned to and the chipset being used on that worker. Since it is on Buildkite in CI, though, and I don't have a local reproduction, it's difficult to attach gdb or anything like that.
It is also multiple layers of tooling deep in our CI jobs framework. It looks like ultimately one of our gems requires net-ssh, which then requires and finds x25519 and tries to use it since it's in the bundle.
The vendor/bundle is buildkite's gem caching mechanism. A possible thought is that the gem could have been installed+compiled on a later processor architecture randomly and is getting deployed via the cache onto an older one?
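One way to test that hypothesis would be to make the cache key depend on the worker's CPU features, so that a bundle compiled on one kind of CPU is never restored onto another. The sketch below is only an illustration under that assumption: native_ext_cache_key is a hypothetical helper, the flag lists are made-up samples, and how the resulting key is passed to the CI cache is left out.

```ruby
require 'digest'

# Hypothetical helper: derive a cache key that differs between workers
# whose CPUs expose different feature flags.
def native_ext_cache_key(lockfile_contents, cpu_flags)
  lock_digest  = Digest::SHA256.hexdigest(lockfile_contents)[0, 12]
  flags_digest = Digest::SHA256.hexdigest(cpu_flags.sort.join(' '))[0, 12]
  "bundle-#{lock_digest}-cpu-#{flags_digest}"
end

# Sample inputs for illustration only.
lockfile = "GEM\n  specs:\n    x25519 (1.0.8)\n"
newer    = %w[sse2 avx avx2 avx512f]
older    = %w[sse2 avx avx2]

puts native_ext_cache_key(lockfile, newer)
puts native_ext_cache_key(lockfile, older)
```

The two printed keys differ, so a worker without avx512f would rebuild the native extension rather than reuse one compiled where it was available.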