facebookexperimental / hermit

Hermit launches Linux x86_64 programs in a special, hermetically isolated sandbox to control their execution. Hermit translates normal, nondeterministic behavior into deterministic, repeatable behavior. This can be used for various applications, including replay debugging, reproducible artifacts, chaos-mode concurrency testing, and bug analysis.

Fails with CPU does not support x86-64-v2 #49

Closed rrnewton closed 6 months ago

rrnewton commented 6 months ago

Describe the bug

On some OS/arch/glibc combinations, hermit now fails immediately with this message:

Fatal glibc error: CPU does not support x86-64-v2

To Reproduce

Running examples/rand.py is sufficient.

Environment

Initial Investigation Notes

Running with hermit run --log=info shows that this is an issue with arch_prctl returning EINVAL:

2024-04-04T15:56:54.694800Z  INFO detcore: DETLOG [syscall][detcore, dtid 3] finish syscall #3: arch_prctl(12289, 0x7fffffffd340) = Err(Errno(EINVAL))

CookieComputing commented 6 months ago

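Just a note from when I was investigating this:

AMD hosts do not seem to have this issue, because CPUID interception is unavailable there (the hardware does not support CPUID faulting), so the virtualized CPUID values are never applied: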

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  166
  On-line CPU(s) list:   0-165
Vendor ID:               AuthenticAMD
  Model name:            AMD EPYC-Milan Processor
$ ./hermit run date
2024-04-04T17:55:26.176575Z  WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.177386Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.178854Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.199113Z  WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.200272Z  WARN reverie_ptrace::perf: Pmu bugs detected: AmdSpecLockMapShouldBeDisabled
2024-04-04T17:55:26.204414Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.209334Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
2024-04-04T17:55:26.245555Z  WARN reverie_ptrace::task: Unable to intercept CPUID: Underlying hardware does not support CPUID faulting
Fri Dec 31 03:59:59 PM PST 2021
2024-04-04T17:55:26.283112Z  WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (2 tasks). Need to record this for reproducibility.

An Intel host, however, seems to intercept CPUID properly and runs into the issue as described:

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  72
  On-line CPU(s) list:   0-71
Vendor ID:               GenuineIntel
  Model name:            Intel Core Processor (Broadwell)
$ ./hermit run date
Fatal glibc error: CPU does not support x86-64-v2

If I run with --no-virtualize-cpuid on the Intel host, I can confirm that it works:

$ ./hermit run --no-virtualize-cpuid date
Fri Dec 31 03:59:59 PM PST 2021
2024-04-04T17:58:23.540958Z  WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (3 tasks). Need to record this for reproducibility.

This makes me think that the CPUID bits in cpuid.rs are not set properly. It could, however, also be something related to intercept_cpuid.
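
The difference between the two hosts above comes down to whether the hardware supports CPUID faulting. As a rough way to check a host without running hermit, here is a minimal sketch that looks for a `cpuid_fault` entry in the flags line of /proc/cpuinfo. It assumes the kernel reports the capability under that flag name, which is not stated anywhere in this thread, and it is not how reverie itself performs the check. The helper name cpuinfo_has_flag is hypothetical.

```rust
use std::fs;

/// Returns true if the first "flags" line of cpuinfo-style text lists `flag`.
/// Hypothetical helper for illustration; not part of hermit or reverie.
fn cpuinfo_has_flag(cpuinfo: &str, flag: &str) -> bool {
    cpuinfo
        .lines()
        .find(|line| line.starts_with("flags"))
        .and_then(|line| line.split_once(':'))
        .map(|(_, flags)| flags.split_whitespace().any(|f| f == flag))
        .unwrap_or(false)
}

fn main() {
    // Assumption: the kernel exposes CPUID faulting support as "cpuid_fault".
    match fs::read_to_string("/proc/cpuinfo") {
        Ok(text) => println!("cpuid_fault: {}", cpuinfo_has_flag(&text, "cpuid_fault")),
        Err(err) => eprintln!("could not read /proc/cpuinfo: {err}"),
    }
}
```

If the flag is absent, the "Unable to intercept CPUID" warnings shown in the AMD log would be the expected outcome, and the virtualized CPUID table would not come into play.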

CookieComputing commented 6 months ago

I think it's just an issue with what CPUID flags we expose. According to https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels, we need to enable flags for SSE3 and SSE4, among other things.

Using this page to find the appropriate bits to toggle (https://www.felixcloutier.com/x86/cpuid), I was able to verify that this works on an Intel host.
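
To make the bit choices easier to follow, here is a small sketch that names the CPUID leaf 1 ECX feature bits relevant to x86-64-v2 and reports which of them a given ECX value fails to advertise. The bit positions follow the CPUID reference linked above, and the ECX value 0x90202001 is the one that appears in the diff in this comment. The table and helper names (V2_ECX_FEATURES, v2_ecx_mask, missing_v2_features) are hypothetical and not part of cpuid.rs. The table only covers leaf 1 ECX; LAHF/SAHF, which x86-64-v2 also requires, is reported in a different leaf and is not checked here.

```rust
/// CPUID leaf 1, ECX feature bits that x86-64-v2 requires.
/// Hypothetical table for illustration; not taken from cpuid.rs.
const V2_ECX_FEATURES: &[(u32, &str)] = &[
    (0, "SSE3"),
    (9, "SSSE3"),
    (13, "CMPXCHG16B"),
    (19, "SSE4.1"),
    (20, "SSE4.2"),
    (23, "POPCNT"),
];

/// Bitwise OR of every feature bit in the table.
fn v2_ecx_mask() -> u32 {
    V2_ECX_FEATURES
        .iter()
        .fold(0, |mask, &(bit, _)| mask | (1u32 << bit))
}

/// Names of the table's features that `ecx` does not advertise.
fn missing_v2_features(ecx: u32) -> Vec<&'static str> {
    V2_ECX_FEATURES
        .iter()
        .filter(|&&(bit, _)| ecx & (1u32 << bit) == 0)
        .map(|&(_, name)| name)
        .collect()
}

fn main() {
    // ECX value from the CPUIDS table before the change.
    let old_ecx: u32 = 0x9020_2001;
    let new_ecx = old_ecx | v2_ecx_mask();

    println!("mask           = {:#010x}", v2_ecx_mask());
    println!("missing before = {:?}", missing_v2_features(old_ecx));
    println!("new ECX        = {:#010x}", new_ecx);
    println!("missing after  = {:?}", missing_v2_features(new_ecx));
}
```

Under these assumptions, the old value already has the SSE3 and CMPXCHG16B bits set, so the bits that actually change are SSSE3, SSE4.1, SSE4.2, and POPCNT.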

I can make hermit run with this change:

$ hg diff
diff --git a/hermetic_infra/hermit/detcore/src/cpuid.rs b/hermetic_infra/hermit/detcore/src/cpuid.rs
--- a/hermetic_infra/hermit/detcore/src/cpuid.rs
+++ b/hermetic_infra/hermit/detcore/src/cpuid.rs
@@ -40,7 +40,12 @@
 // masked off to prevent non-determinism.
 const CPUIDS: &[CpuIdResult] = &[
     cpuid_result(0x0000000D, 0x756E6547, 0x6C65746E, 0x49656E69),
-    cpuid_result(0x00000663, 0x00000800, 0x90202001, 0x078BFBFD),
+    cpuid_result(
+        0x00000663,
+        0x00000800,
+        0x90202001 | (1 << 0) | (1 << 9) | (1 << 13) | (1 << 19) | (1 << 20) | (1 << 23),
+        0x078BFBFD,
+    ),
     cpuid_result(0x00000001, 0x00000000, 0x0000004D, 0x002C307D),
     cpuid_result(0x00000000, 0x00000000, 0x00000000, 0x00000000),
     cpuid_result(0x00000120, 0x01C0003F, 0x0000003F, 0x00000001),

$ buck2 run //hermetic_infra/hermit/hermit-cli:hermit -- run date
...
BUILD SUCCEEDED
Fri Dec 31 03:59:59 PM PST 2021
2024-04-08T15:23:28.398395Z  WARN detcore::scheduler: Nondeterministic external actions [DetPid(7)] jumped in the middle of runnable work (3 tasks). Need to record this for reproducibility.

$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         40 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  72
  On-line CPU(s) list:   0-71
Vendor ID:               GenuineIntel
  Model name:            Intel Core Processor (Broadwell)
...

CookieComputing commented 6 months ago

https://github.com/facebookexperimental/hermit/commit/bd3153b4bd311831b33571d523228b2d16ff039a should close this