It looks like an i686-specific issue. cpufeatures v0.2.4 should be identical to v0.2.3, so a more likely PR to look into is https://github.com/RustCrypto/utils/pull/792. Also, the master branch CI fails as well.
Interestingly enough, the 1.56 job passes without issues, so it could be a compiler bug.
I tested it a bit on my PC. The error happens spuriously, approximately every 4th run. It does not get triggered if the zeroize feature is not enabled, or if cpufeatures is reverted to v0.2.2.
Huh, that's strange regarding zeroize. I believe there's only a single usage in the entire crate:
https://github.com/RustCrypto/stream-ciphers/blob/400a398/chacha20/src/lib.rs#L310
...where state is a [u32; 16].
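For readers unfamiliar with what that single usage amounts to, here is a minimal sketch of zeroizing a `[u32; 16]` state, written with only the standard library. It assumes the wipe happens on drop, and the names `State` and `wipe` are hypothetical; the real crate calls into zeroize rather than hand-rolling this.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical stand-in for the cipher state: sixteen 32-bit words.
pub struct State {
    pub words: [u32; 16],
}

/// Overwrite every word with zero using volatile writes, so the stores
/// are not optimized away as dead code.
pub fn wipe(words: &mut [u32; 16]) {
    for w in words.iter_mut() {
        // SAFETY: `w` is a valid, aligned, exclusive reference to a u32.
        unsafe { ptr::write_volatile(w, 0) };
    }
    // Prevent the compiler from reordering later accesses before the wipe.
    compiler_fence(Ordering::SeqCst);
}

impl Drop for State {
    // Assumption: the state is wiped when it goes out of scope.
    fn drop(&mut self) {
        wipe(&mut self.words);
    }
}
```

Nothing here touches CPU feature detection, which is why it is surprising that toggling the zeroize feature changes whether the crash appears.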
@newpavlov since you can reproduce it, could you attach gdb or lldb and get a backtrace?
I'll see if I can poke at it if I have some time.
I was unable to trigger the error in gdb, and I cannot attach gdb to already-launched tests since they end too quickly.
Maybe it's worth releasing a temporary hotfix which would pin cpufeatures to v0.2.2 on i686?
For what it's worth, I got it to segfault in GDB after a bunch of tries (it's way rarer than 1 in 4 on my machine; the text scrolled too fast to count the tries).
I've modified the test only insofar as I manually expanded the macro. The segfault occurs on the same line with the original test code.
Kernel: Linux 5.10.16.3-microsoft-standard-WSL2
So the line causing the segfault is
let mut buf = [0u8; MAX_SEEK];
And sure enough, after compiling it with cpufeatures v0.2.2 or with zeroize disabled, the segfault disappeared.
Hm, interesting. I think it makes the compiler bug hypothesis more probable.
Running the test with coredumps enabled (ulimit -c unlimited; where they end up depends on /proc/sys/kernel/core_pattern) is the easy way of getting a gdb session with the crash. This shows the crash as being in the plt jmp for memset, which is spicy:
(gdb) disas
Dump of assembler code for function memset@plt:
=> 0x565b1200 <+0>: jmp *0x80(%ebx)
0x565b1206 <+6>: push $0xe8
0x565b120b <+11>: jmp 0x565b1020
(gdb) print $ebx
$2 = 0
[in the calling function _ZN4core3ops8function6FnOnce9call_once17hc696d536ea7e3abaE]
0x565babc2 <+1618>: sub $0x4,%esp
0x565babc5 <+1621>: mov 0x8(%esp),%ebx
0x565babc9 <+1625>: push $0x200
0x565babce <+1630>: push $0x0
0x565babd0 <+1632>: lea 0xfc(%esp),%eax
0x565babd7 <+1639>: push %eax
0x565babd8 <+1640>: call 0x565b1200 <memset@plt>
=> 0x565babdd <+1645>: add $0x10,%esp
(gdb) x/20 $esp
0xf79ec91c: 0x565babdd 0xf79eca20 0x00000000 0x00000200
0xf79ec92c: 0x565ba57f 0x00000000 0x00000000 0x00000000
This one right here was put in %ebx --^
Looking at the rest of the function I am extremely suspicious of the cpuid calls that might clobber clang's/i686's use of %ebx. This is the stdlib function, and it has a somewhat subtle error, so moving to the stdlib issue.
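For context, a minimal sketch of the kind of call involved, assuming the stdlib function in question is `__cpuid` from `std::arch`. The helper name `max_basic_leaf` is hypothetical. This does not reproduce the bug; it only shows the shape of the query. The CPUID instruction writes its results to EAX, EBX, ECX and EDX, so on i686 the wrapper has to save and restore %ebx around the instruction if the compiler is using that register for something else.

```rust
#[cfg(target_arch = "x86")]
use std::arch::x86::__cpuid;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::__cpuid;

/// Returns the highest basic CPUID leaf the CPU supports.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[allow(unused_unsafe)] // newer toolchains may expose `__cpuid` as a safe fn
pub fn max_basic_leaf() -> Option<u32> {
    // Leaf 0 reports the maximum supported basic leaf in EAX.
    // SAFETY: assumes the CPU implements CPUID, which is always true on
    // x86_64 and on any i686-class CPU.
    let res = unsafe { __cpuid(0) };
    Some(res.eax)
}

/// Fallback for targets without the CPUID instruction.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn max_basic_leaf() -> Option<u32> {
    None
}
```

Calling `max_basic_leaf()` yields `Some(n)` on x86 targets and `None` elsewhere. The register handling the comment above is suspicious of sits inside the stdlib wrapper, not in caller code like this.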
See: https://github.com/RustCrypto/stream-ciphers/runs/7970722385?check_suite_focus=true#step:8:28
It occurred while running the integration tests in the autodetect CI job for this dependabot PR to bump cpufeatures from 0.2.3 to 0.2.4, so that's possibly implicated: https://github.com/RustCrypto/stream-ciphers/pull/303
cc @newpavlov @str4d