Closed rrnewton closed 6 months ago
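A note from investigating this:
AMD hosts do not seem to have this issue, because CPUID interception is unavailable there (the log below reports that the hardware does not support CPUID faulting), so the guest sees the real CPUID values: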
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 166
On-line CPU(s) list: 0-165
Vendor ID: AuthenticAMD
Model name: AMD EPYC-Milan Processor
$ ./hermit run date
2024-04-04T17:55:26.176575Z WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.177386Z WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.178854Z WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.199113Z WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.200272Z WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.204414Z WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.209334Z WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.245555Z WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
Fri Dec 31 03:59:59 PM PST 2021
2024-04-04T17:55:26.283112Z WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (2 tasks). Need to record this for reproducibility.
An Intel host, however, does intercept CPUID and runs into the issue as described:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Vendor ID: GenuineIntel
Model name: Intel Core Processor (Broadwell)
$ ./hermit run date
Fatal glibc error: CPU does not support x86-64-v2
If I run with --no-virtualize-cpuid on the Intel host, I can confirm that it works:
$ ./hermit run --no-virtualize-cpuid date
Fri Dec 31 03:59:59 PM PST 2021
2024-04-04T17:58:23.540958Z WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (3 tasks). Need to record this for reproducibility.
This makes me think that the CPUID bits in cpuid.rs are not set correctly. It could also be something related to intercept_cpuid, though.
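One way to test that hypothesis is to look at the feature word the process actually sees. The sketch below is a hypothetical standalone diagnostic, not part of hermit or reverie: it reads CPUID leaf 1 and lists which of the x86-64-v2 feature bits reported in ECX are missing. The names `V2_BITS`, `missing_v2_features`, and `leaf1_ecx` are made up for illustration. Running it natively and then under `hermit run` on the Intel host should show the difference, assuming interception is active.

```rust
// Hypothetical diagnostic, not part of hermit: reports which x86-64-v2
// feature bits are absent from CPUID leaf 1 ECX as seen by this process.

/// (name, bit position in leaf-1 ECX) for the v2 features reported there.
const V2_BITS: [(&str, u32); 6] = [
    ("sse3", 0),
    ("ssse3", 9),
    ("cmpxchg16b", 13),
    ("sse4.1", 19),
    ("sse4.2", 20),
    ("popcnt", 23),
];

/// Return the names of the v2 features whose bit is clear in `ecx`.
fn missing_v2_features(ecx: u32) -> Vec<&'static str> {
    let mut missing = Vec::new();
    for (name, bit) in V2_BITS {
        if ecx & (1u32 << bit) == 0 {
            missing.push(name);
        }
    }
    missing
}

/// Read ECX of CPUID leaf 1 on x86_64.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
fn leaf1_ecx() -> Option<u32> {
    // The CPUID instruction is always available on x86_64.
    Some(unsafe { std::arch::x86_64::__cpuid(1).ecx })
}

/// On other architectures there is nothing to read.
#[cfg(not(target_arch = "x86_64"))]
fn leaf1_ecx() -> Option<u32> {
    None
}

fn main() {
    match leaf1_ecx() {
        Some(ecx) => {
            println!("leaf 1 ECX = {ecx:#010x}");
            println!("missing v2 bits: {:?}", missing_v2_features(ecx));
        }
        None => println!("not an x86_64 host"),
    }
}
```

Applied to the ECX word currently in cpuid.rs (0x90202001), `missing_v2_features` reports ssse3, sse4.1, sse4.2, and popcnt.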
I think it's just an issue with what CPUID flags we expose. According to https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels, we need to enable flags for SSE3 and SSE4, among other things.
Using this page to find the appropriate bits to toggle (https://www.felixcloutier.com/x86/cpuid), I confirmed that the fix works on an Intel host.
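As a sketch of the bit arithmetic, the x86-64-v2 features reported in CPUID leaf 1 sit in ECX at the positions listed on that reference page. The constant and function names below are illustrative, not identifiers from cpuid.rs. The starting value is the leaf-1 ECX word from the existing table.

```rust
// Illustrative names; not identifiers from detcore's cpuid.rs.
// Bit positions are for CPUID leaf 1, register ECX.
const SSE3: u32 = 1 << 0;
const SSSE3: u32 = 1 << 9;
const CMPXCHG16B: u32 = 1 << 13;
const SSE4_1: u32 = 1 << 19;
const SSE4_2: u32 = 1 << 20;
const POPCNT: u32 = 1 << 23;

/// Leaf-1 ECX feature bits required by the x86-64-v2 level.
const V2_LEAF1_ECX: u32 = SSE3 | SSSE3 | CMPXCHG16B | SSE4_1 | SSE4_2 | POPCNT;

/// OR the v2 bits into an existing leaf-1 ECX word.
fn with_v2_bits(ecx: u32) -> u32 {
    ecx | V2_LEAF1_ECX
}

fn main() {
    // ECX value currently exposed for leaf 1.
    let old = 0x9020_2001u32;
    let new = with_v2_bits(old);
    println!("old ECX = {old:#010x}, new ECX = {new:#010x}");
    // Bits that were clear before and are set now.
    println!("newly set bits = {:#010x}", new & !old);
}
```

Bits 0 and 13 are already set in 0x90202001, so only bits 9, 19, 20, and 23 actually change. x86-64-v2 also requires LAHF/SAHF, which is reported in a different leaf (0x80000001, ECX bit 0) and is not covered by this sketch.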
I can make hermit run with this change:
$ hg diff
diff --git a/hermetic_infra/hermit/detcore/src/cpuid.rs b/hermetic_infra/hermit/detcore/src/cpuid.rs
--- a/hermetic_infra/hermit/detcore/src/cpuid.rs
+++ b/hermetic_infra/hermit/detcore/src/cpuid.rs
@@ -40,7 +40,12 @@
// masked off to prevent non-determinism.
const CPUIDS: &[CpuIdResult] = &[
cpuid_result(0x0000000D, 0x756E6547, 0x6C65746E, 0x49656E69),
- cpuid_result(0x00000663, 0x00000800, 0x90202001, 0x078BFBFD),
+ cpuid_result(
+ 0x00000663,
+ 0x00000800,
+ 0x90202001 | (1 << 0) | (1 << 9) | (1 << 13) | (1 << 19) | (1 << 20) | (1 << 23),
+ 0x078BFBFD,
+ ),
cpuid_result(0x00000001, 0x00000000, 0x0000004D, 0x002C307D),
cpuid_result(0x00000000, 0x00000000, 0x00000000, 0x00000000),
cpuid_result(0x00000120, 0x01C0003F, 0x0000003F, 0x00000001),
$ buck2 run //hermetic_infra/hermit/hermit-cli:hermit -- run date
...
BUILD SUCCEEDED
Fri Dec 31 03:59:59 PM PST 2021
2024-04-08T15:23:28.398395Z WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (3 tasks). Need to record this for reproducibility.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Vendor ID: GenuineIntel
Model name: Intel Core Processor (Broadwell)
...
Describe the bug
On some OS/arch/glibc, we now fail quickly with this message:
To Reproduce
Running examples/rand.py is sufficient.
Environment
Initial Investigation Notes
Running with hermit run --log=info shows that this is an issue with arch_prctl returning EINVAL: