golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.65k stars 17.41k forks source link

runtime: AVX512 register state clobbered by signal on macOS #49233

Open vsivsi opened 2 years ago

vsivsi commented 2 years ago

What version of Go are you using (go version)?

% go version
go version go1.17.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
 % go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/vsi/Library/Caches/go-build"
GOENV="/Users/vsi/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/vsi/go/pkg/mod"
GONOPROXY="github.com/vsivsi"
GONOSUMDB="github.com/vsivsi"
GOOS="darwin"
GOPATH="/Users/vsi/go"
GOPRIVATE="github.com/vsivsi"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.2/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.2/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.2"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/kp/kjdr0ytx5z9djnq4ysl15x0h0000gn/T/go-build4285878374=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

In writing AVX-512 assembly functions using the Go assembler (via Avo), I began to notice unexplained intermittent test failures. A lot of painful investigation revealed that golang async preemption does not seem to be properly save/restoring AVX512 state when assembly functions are interrupted. From visual inspection of the runtime/preempt_amd64.s source, it also seems likely that AVX/AVX2 state (YMM upper 128 bits) may not be properly save/restored either, but I have not encountered/tested that case in my own code.

A complete minimal reproduction illustrating the clobbering of the AVX512 K1 opmask register is here: https://gist.github.com/vsivsi/fff8618ace4b02eb410dd8792779bf32

Note, in my testing running with GODEBUG=asyncpreemptoff=1 rescues every failure case I have identified.

What did you expect to see?

I expected all relevant processor state to be properly restored in an assembly function following an async preemption.

What did you see instead?

Critical state is being clobbered, leading to intermittent undefined behavior.

dmitshur commented 2 years ago

CC @ianlancetaylor, @prattmic, @FiloSottile.

randall77 commented 2 years ago

Hmm, this may be the VZEROUPPER we introduced in CL 219131 to fix #37174. Or it may be worse than that, I'm not sure. @cherrymui

@vsivsi Can you try patching out #ifdef GOOS_darwin clause in runtime/preempt_amd64.s and see if that helps?

cherrymui commented 2 years ago

The runtime doesn't async preempt assembly functions. asyncPreempt (the function in runtime/preempt_amd64.s) should not be executed if it is running in an assembly function. The runtime sends a signal, but if the signal lands in an assembly function the signal handler will return immediately without actually preempt.

I suspect that the darwin kernel doesn't preserve AVX512 state when handling signals. Then the problem would not be preemption, but any asynchronous signal could cause it to fail, including profiling signals or user signals.

What version of macOS are you running on?

I have a vague memory that Apple suggests that before using AVX512 one must check its availability with some sysctl (but I'm not really sure). Do we do that?

vsivsi commented 2 years ago

@randall77 I can try that, but the initial issue I encountered (as reflected in the repro case) was clobbering of the K0-K7 opmask registers, which are distinct from the vector registers affected by VZEROUPPER, etc.

@cherrymui Yes, Darwin requires special checking to identify AVX512 support. I submitted a fix for this last year. See here for details on that saga: https://github.com/golang/go/issues/43089

It is quite possible that this issue I'm seeing is new with golang 1.17. Is it possible that the new assembly adapter funcs required for inter-operation with the new function parameter passing spec could be interfering with the assembly code detection during preemption?

To maintain compatibility with existing assembly functions, the compiler generates adapter functions that convert between the new register-based calling convention and the previous stack-based calling convention. These adapters are typically invisible to users, except that taking the address of a Go function in assembly code or taking the address of an assembly function in Go code using reflect.ValueOf(fn).Pointer() or unsafe.Pointer will now return the address of the adapter. Code that depends on the value of these code pointers may no longer behave as expected. Adapters also may cause a very small performance overhead in two cases: calling an assembly function indirectly from Go via a func value, and calling Go functions from assembly.

josharian commented 2 years ago

How we detect asm routines during preempt changed in Go 1.18. (https://github.com/golang/go/commit/c2483a5c034152fcdfbb2e6dbcf48b0103d8db6a#diff-47fd68949147d260b91998c0d6eabffcdb74c58991fc386b40e14a6ed710b17d)

Does this reproduce at tip? (I'm AFK.)

leitzler commented 2 years ago

It does, with:

% go version
go version devel go1.18-6113dacf32 Sat Oct 30 18:30:34 2021 +0000 darwin/amd64
% sysctl machdep.cpu.brand_string
machdep.cpu.brand_string: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
% sw_vers
ProductName:    macOS
ProductVersion: 11.6
BuildVersion:   20G165
``` % go test . --- FAIL: TestMask (0.43s) maskcheck_test.go:16: Failed for iteration: 0 with return value: 88007197 maskcheck_test.go:16: Failed for iteration: 1 with return value: 61939268 maskcheck_test.go:16: Failed for iteration: 2 with return value: 50093958 maskcheck_test.go:16: Failed for iteration: 3 with return value: 57687622 maskcheck_test.go:16: Failed for iteration: 4 with return value: 55302035 maskcheck_test.go:16: Failed for iteration: 5 with return value: 50302791 maskcheck_test.go:16: Failed for iteration: 6 with return value: 55661062 maskcheck_test.go:16: Failed for iteration: 7 with return value: 54177345 maskcheck_test.go:16: Failed for iteration: 8 with return value: 58187395 maskcheck_test.go:16: Failed for iteration: 9 with return value: 51991146 maskcheck_test.go:16: Failed for iteration: 10 with return value: 56438664 maskcheck_test.go:16: Failed for iteration: 11 with return value: 54676159 maskcheck_test.go:16: Failed for iteration: 12 with return value: 50955817 maskcheck_test.go:16: Failed for iteration: 13 with return value: 52912278 maskcheck_test.go:16: Failed for iteration: 14 with return value: 57776646 maskcheck_test.go:16: Failed for iteration: 15 with return value: 54628233 maskcheck_test.go:16: Failed for iteration: 16 with return value: 55309645 maskcheck_test.go:16: Failed for iteration: 17 with return value: 52719044 maskcheck_test.go:16: Failed for iteration: 18 with return value: 57165517 maskcheck_test.go:16: Failed for iteration: 19 with return value: 57336883 FAIL FAIL x 0.570s FAIL ```
cherrymui commented 2 years ago

It is quite possible that this issue I'm seeing is new with golang 1.17. Is it possible that the new assembly adapter funcs required for inter-operation with the new function parameter passing spec could be interfering with the assembly code detection during preemption?

It seems unlikely. You can try building with Go 1.17 with GOEXPERIMENT=noregabi environment variable set (at go run/build/test time).

Also, you can try running the program with preemption disabled but CPU profiling enabled and see if profiling signal changes anything. You can also try running the program with preemption disabled while having a separate process sending SIGURG signals to it. Thanks.

vsivsi commented 2 years ago

@cherrymui Good suggestions. Running the repro test case as:

 % GODEBUG=asyncpreemptoff=1 go test -cpuprofile cpu.out -count 1 -timeout 15m -run '^TestMask$' gist.github.com/vsivsi/fff8618ace4b02eb410dd8792779bf32
--- FAIL: TestMask (0.26s)
    maskcheck_test.go:21: Failed for iteration: 0 with return value: 47429074
    maskcheck_test.go:21: Failed for iteration: 1 with return value: 45358536
    maskcheck_test.go:21: Failed for iteration: 2 with return value: 52675366
    maskcheck_test.go:21: Failed for iteration: 3 with return value: 44456167
    maskcheck_test.go:21: Failed for iteration: 4 with return value: 72420817
    maskcheck_test.go:21: Failed for iteration: 5 with return value: 72654067
    maskcheck_test.go:21: Failed for iteration: 6 with return value: 72091434
    maskcheck_test.go:21: Failed for iteration: 7 with return value: 73369163
    maskcheck_test.go:21: Failed for iteration: 8 with return value: 72828760
    maskcheck_test.go:21: Failed for iteration: 9 with return value: 73065114
    maskcheck_test.go:21: Failed for iteration: 10 with return value: 73038149
    maskcheck_test.go:21: Failed for iteration: 11 with return value: 72684990
    maskcheck_test.go:21: Failed for iteration: 12 with return value: 72653599
    maskcheck_test.go:21: Failed for iteration: 13 with return value: 72440260
    maskcheck_test.go:21: Failed for iteration: 14 with return value: 72201890
    maskcheck_test.go:21: Failed for iteration: 15 with return value: 45021003
    maskcheck_test.go:21: Failed for iteration: 16 with return value: 72012922
    maskcheck_test.go:21: Failed for iteration: 17 with return value: 44692569
    maskcheck_test.go:21: Failed for iteration: 18 with return value: 72992403
    maskcheck_test.go:21: Failed for iteration: 19 with return value: 73178568
FAIL
FAIL    gist.github.com/vsivsi/fff8618ace4b02eb410dd8792779bf32 0.601s
FAIL

This leads to the test failing, and with different looking stats (each iteration fails earlier in the loop on average, probably due to different signaling rates between profiling and preemption).

Two other data points:

  1. I have not yet observed corruption of actual ZMM vector state in my testing. From what I've seen this appears to be limited to the K0-K7 opmask registers.
  2. The above observation is curious in light of the fact that the Darwin kernel code I've been looking at is at least not obviously broken w.r.t to handling the opmask registers and AVX512 thread state in signal handling. Opmask data is present in all of the relevant state structures, there appear to be tests that check that it is properly saved and restored, etc. So whatever is happening is more subtle than mere omission of this detail in the kernel. See e.g.: https://github.com/apple/darwin-xnu/search?q=opmask
cherrymui commented 2 years ago

@vsivsi thanks for the repro! So it seems it is not related preemption but signals in general. The observation about K0-K7 also suggests VZEROUPPER is not the cause.

Would it be possible to write a similar test in C and assembly (GNU syntax) and see if it is failing?

Maybe we need a different CPU feature probe to tell the kernel we're using K0-K7? (I haven't looked at the code carefully. Just a guess.)

vsivsi commented 2 years ago

@cherrymui The mechanism for using AVX-512 in Darwin doesn't involve telling the kernel which features to enable. The kernel simply advertises the available AVX-512 features in the process commpage data area. Once you've verified that a given required set of AVX-512 features is available, you simply use them. Darwin catches the UD interrupt for any AVX-512 instruction on first use, and promotes that thread to use the full AVX512 thread state, enabling every available feature.

Specifically, the three XCR0[5:7] bits (from XGETBV) indicate whether XSAVE will preserve:

Upon thread promotion following the UD interrupt from a supported AVX512 instruction, Darwin sets XCR0[5:7] = 111b, for the thread, enabling the full "AVX512 state" in XSAVE.

vsivsi commented 2 years ago

The kernel definitions of these XCR0 bits are here: https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/osfmk/i386/proc_reg.h#L168-L188

The full AVX512 enabled mask is composed here: https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/osfmk/i386/fpu.h#L71

And this is the function that runs when a UD for an AVX512 instruction is trapped, triggering the XSAVE state promotion: https://github.com/apple/darwin-xnu/blob/8f02f2a044b9bb1ad951987ef5bab20ec9486310/osfmk/i386/fpu.c#L1472

vsivsi commented 2 years ago

@cherrymui Here is an attempted reproduction of this issue in GNU C and Asm. In my testing it does not fail.

https://gist.github.com/vsivsi/8511aca1bac528f49fbb45a636afa4b5

% gcc testmask.c testmask.s && ./a.out &
[1] 27524
% for ((x=0;x<1000000;x++)); do kill -s URG 27524; done
Test Passed!
[1]  + done       ./a.out
cherrymui commented 2 years ago

Thanks for trying that. In your C code, could you install a signal handler for SIGURG (which could simply return immediately)? Thanks.

vsivsi commented 2 years ago

@cherrymui Gist updated with empty SIGURG handler. Now it fails as with go runtime:

% gcc testmask.c testmask.s && ./a.out &                        
[1] 28786
% for ((x=0;x<20;x++)); do kill -s URG 28786; sleep 0.01; done
Failed for iteration: 133 with return value: 90069936
Failed for iteration: 134 with return value: 79705026
Failed for iteration: 135 with return value: 79757341
Failed for iteration: 136 with return value: 79875856
Failed for iteration: 137 with return value: 82261267
Failed for iteration: 138 with return value: 83127231
Failed for iteration: 139 with return value: 83641375
Failed for iteration: 140 with return value: 83352568
Failed for iteration: 141 with return value: 82555528
Failed for iteration: 142 with return value: 79494222
Failed for iteration: 143 with return value: 78981164
Failed for iteration: 144 with return value: 81261765
Failed for iteration: 145 with return value: 80885813
Failed for iteration: 146 with return value: 79706999
Failed for iteration: 147 with return value: 83441689
Failed for iteration: 148 with return value: 80171870
Failed for iteration: 149 with return value: 79769830
Failed for iteration: 150 with return value: 81361308
Failed for iteration: 151 with return value: 81237987
Failed for iteration: 152 with return value: 79568372
Test failed 20 times!
[1]  + done       ./a.out
randall77 commented 2 years ago

This reminds me of https://github.com/golang/go/issues/37174#issuecomment-584914106 Sounds like Apple needs some more regression tests for their sigaction code path.

vsivsi commented 2 years ago

I'm sure you all are happy that this doesn't appear to be a problem in golang, but it still seems like a problem for golang, no?

cherrymui commented 2 years ago

Thanks @vsivsi ! That indicates the kernel's signal handling code doesn't seem to do the right thing.

vsivsi commented 2 years ago

So what's the next step here? Throwing a process into an indeterminate state via external signal feels like a CVE-level issue.

cherrymui commented 2 years ago

If there is a simple workaround that we could do it in the runtime, we'll do it, like the VZEROUPPER for #37174 . I don't know how to work around this one, though. One probably has to either not use K0-K7 registers or block all signals.

I guess the next step would be reporting to Apple and have them fix the kernel.

vsivsi commented 2 years ago

While they're at it, they should fix this one too: https://github.com/golang/go/issues/42649

Also, not using the K0-K7 opmasks == not using AVX-512 for all practical purposes.

randall77 commented 2 years ago

We need to report a bug to Apple. I can do that - I've done it before, and supposedly a report from Google has a bit more visibility than one from random internet user. Any idea what range of OS versions this causes a problem on? I don't see a OS version in the OP.

vsivsi commented 2 years ago

@randall77 I've seen it on Catalina (10.15.x) and Big Sur (11.x), but I bet it goes all the way back to the first AVX-512 support, which I believe was High Sierra (10.13.x) when the first iMac Pro released with a Xeon supporting AVX-512.

cherrymui commented 2 years ago

As it is not safe to use, maybe we could consider hardwire AVX512 to false on Darwin until the kernel fix?

randall77 commented 2 years ago

We could hardwire internal/cpu (and x/sys/cpu) flags to false for avx512. But if the asm uses cpuid itself to figure out what is available, I don't think we'd be able to prevent avx512 use in that case.

vsivsi commented 2 years ago

The only influence golang has over this is the value set for x/sys/cpu.X86.HasAVX512 which anyone is free to ignore (at their potential peril). But temporarily disabling that won't solve the problem of existing go binaries, or running on unpatched Darwin after Apple releases a fix. The robust solution would be to actually test for the live vuln when setting x/sys/cpu.X86.HasAVX512 at runtime.

randall77 commented 2 years ago

Here's a simplified reproduction, just run it and see if it prints anything:

testmask.c

#include <stdio.h>
#include <inttypes.h>
#include <signal.h>
#include <unistd.h>
#include <pthread.h>

volatile int8_t done = 0;

// in testmask.s
uint32_t masktest(uint32_t x);

void sig_handler(int signum) {
  // Tell the assembly loop to return.
  done = 1;
}

void *tester(void* unused) {
  uint32_t x = 0xabcd1234;
  uint32_t y = masktest(x);
  if (x != y) {
    printf("bad: %x -> %x\n", x, y);
  }
  return NULL;
}

int main(int argc, char *argv[]) {
  // Set up simple signal handler.
  signal(SIGURG, sig_handler);

  // Start worker.
  pthread_t worker;
  pthread_create(&worker, NULL, tester, NULL);

  // Wait until the worker is in its assembly loop.
  usleep(1000);

  // Send a signal to the worker.
  pthread_kill(worker, SIGURG);

  // Wait until worker is done.
  pthread_join(worker, NULL);
}

testmask.s

    .globl _done
        .globl _masktest
_masktest:
    // Put a value in the k1 avx512 register.
    kmovd   %edi, %k1

    // Wait until we're told to return.
loop:   
    cmpb    $0, _done(%rip)
    je  loop

    // Return the value in the k1 register.
    kmovd   %k1, %eax
    ret

Build with gcc testmask.c testmask.s and run with ./a.out.

randall77 commented 2 years ago

Mac OSX bug report here: https://feedbackassistant.apple.com/feedback/9736652 (If you can see it, not sure how they deal with permissions.)

randall77 commented 2 years ago

There doesn't seem to be an easy way to let anyone else see my bug report. So here's a screenshot of it: feedbackassistant apple com_feedback_9736652 feedbackassistant apple com_feedback_9736652 (1)

gopherbot commented 2 years ago

Change https://golang.org/cl/361255 mentions this issue: cpu: pretend AVX-512 is disabled on Darwin

vsivsi commented 2 years ago

Posted to Intel software dev forum:

https://community.intel.com/t5/Software-Tuning-Performance/MacOS-Darwin-kernel-bug-clobbers-AVX-512-opmask-register-state/m-p/1327259#M7970

randall77 commented 2 years ago

As far as I can tell, the opmask register state is saved on entry to the signal handler, as the value from the opmask register appears in a dump of the signal context from within the signal handler. The problem must be on the return side.

vsivsi commented 2 years ago

Seemed like a long-shot, but I just re-tested after updating MacOS with this morning's security updates. This issue persists in the latest builds:

  System Version:   macOS 11.6.1 (20G224)
  Kernel Version:   Darwin 20.6.0
System Version: macOS 10.15.7 (19H1519)
Kernel Version: Darwin 19.6.0
vsivsi commented 2 years ago

@randall77 @cherrymui

I'm nearly certain I've found the kernel bug responsible for this issue by inspection.

It looks like it was introduced in Catalina 10.15.6, and has always existed in Big Sur 11.x. I'm presently installing Monterey on one of my MacBook Pros to see what the status there is, since Apple hasn't yet released the Darwin source for MacOS 12.x. Edit to confirm: The issue also exists in Monterey 12.0.1 / Darwin 21.1.0 (21A559).

It seems not to exist in Mojave 10.14.x or before, although that would only impact the iMac Pro, since that was the only pre-10.15 machine with AVX-512.

The issue is a set of changes introduced in 10.15.6 that are aimed at improving performance when AVX-512 is on, but the thread isn't using the ZMM extended state:

This is analogous to your VZEROUPPER optimization for eliminating phantom SSE4 dependencies. But, when the code detects that ZMM_Hi256 state and Hi16_ZMM state are all zeros, it disables XSAVE restore of the "opmask state" as well. This is not a safe assumption, as it is perfectly valid to have non-zero values in K0-K7 while having all zeroed vector registers (as our various reproductions likely illustrate). With AVX-512VL, it is also valid to use EVEX encoded AVX-512 instructions, with opmasks, while only touching the AVX YMM vector state.

My hypothesis above is confirmed by adding a single instruction to the reproduction assembly code sent to Apple:

testmask.s

        .globl _done
        .globl _masktest
_masktest:

        // Storing a non-zero value in ZMM31 suppresses the kernel bug.
        vpbroadcastd  %edi, %zmm31

        // Put a value in the k1 avx512 register.
        kmovd   %edi, %k1

        // Wait until we're told to return.
loop:
        cmpb    $0, _done(%rip)
        je      loop

        // Return the value in the k1 register.
        kmovd   %k1, %eax
        ret

So, here's the offending code in 10.15.6:

if (fpu_allzeroes((uint64_t *)(void *)iavx->x_Hi16_ZMM, 16 * sizeof(_STRUCT_ZMM_REG)) == TRUE &&
    fpu_allzeroes((uint64_t *)(void *)iavx->x_ZMM_Hi256, 16 * sizeof(_STRUCT_YMM_REG)) == TRUE) {
    iavx->_xh.xstate_bv &= ~XFEM_ZMM;
}

From: https://github.com/apple/darwin-xnu/blob/xnu-6153.141.1/osfmk/i386/fpu.c#L954

Where:

#define XFEM_ZMM        (XFEM_ZMM_HI256 | XFEM_HI16_ZMM | XFEM_OPMASK)

From: https://github.com/apple/darwin-xnu/blob/xnu-6153.141.1/osfmk/i386/proc_reg.h#L187

This change was introduced in Darwin xnu-6153.141.1, which corresponds with MacOS10.15.6.

Here's the diff: https://github.com/apple/darwin-xnu/commit/2a1b27122957ed49e0243f383563a369ae3c910c#diff-eebcfce045ea71301ebd7644a71e52b7836189410512149ef8e8e6460a741446R952-R955

So, the fix is to probably replace a single line of code in two locations (for the 32 and 64 bit cases):

    // iavx->_xh.xstate_bv &= ~XFEM_ZMM;
    iavx->_xh.xstate_bv &= ~(XFEM_ZMM_HI256 | XFEM_HI16_ZMM);

Here: https://github.com/apple/darwin-xnu/blob/xnu-6153.141.1/osfmk/i386/fpu.c#L928 and Here: https://github.com/apple/darwin-xnu/blob/xnu-6153.141.1/osfmk/i386/fpu.c#L954

Now we just need to get Apple to do it. In the meantime there's a weak-ish workaround... Ensure that there's always non-zero values in some part of the AVX-512 extended ZMM state.

randall77 commented 2 years ago

Weird. When debugging I thought a use of a zmm register might help, so I added:

    vpxor   %zmm1, %zmm1, %zmm1

and it didn't help. Apparently I needed to set zmm1 to a nonzero value to really "use" that register :(

I'll update my Apple feedback submission.

vsivsi commented 2 years ago

@randall77 Yeah, I went back and checked the original rarely failing unit test case that kicked this all off... It was Avo generated code that only used the YMM registers with opmasks in a 4x64 bit special case. I didn't pick up on the significance of that while performing the debugging that ultimately led to the repro I brought to this issue.

In thinking about the kernel bug presented above overnight, it clarified three things:

On reflection, I'd also like to add that the "one line fix" I presented above may not simultaneously resolve this issue while still satisfying the aims of the Darwin kernel engineers in making the revisions that appeared in Catalina 10.15.6. Another possible fix would be to leave the masking code the same, but add an additional check to ensure that the opmask state also has all zeroed bits. Or perhaps the opmask save/restore should be broken out as a separate case (combining the two approaches).

In any case, I hope Apple carefully audits all uses of XFEM_ZMM (perhaps renaming it in the process to make clear it encompasses the opmask bit), and adds regression testing to ensure this type of confusion doesn't reoccur.

vsivsi commented 2 years ago

Just a quick update on point 3) in the post above. I've run the reproduction C/Asm code here: https://gist.github.com/vsivsi/8511aca1bac528f49fbb45a636afa4b5

As:

for ((x=0;x<250;x++)); do ./a.out & done

When run on a Mac Pro with 48 hyperthreads, each of the 250 processes created above is preemptively multitasked onto ~20% of a hyperthread with a large number of kernel scheduler preemptions per second.

In over an hour of CPU time, I've seen zero opmask corruptions. So I think it's fair to conclude that Darwin's preemptive multitasking does not use the faulty thread state resetting code spotted above.

😅

gopherbot commented 2 years ago

Change https://golang.org/cl/368994 mentions this issue: x/sys/cpu: remove the declaration of function darwinSupportsAVX512

vsivsi commented 2 years ago

Yesterday's security update to MacOS Catalina appears not to have updated the kernel, and my testing indicates this bug remains.

  System Version:   macOS 10.15.7 (19H1615)
  Kernel Version:   Darwin 19.6.0

Likewise, the upgrade to MacOS Monterey 12.1 also does not fix this issue.

  System Version:   macOS 12.1 (21C52)
  Kernel Version:   Darwin 21.2.0

I haven't tested the security update for Big Sur, but it would be very surprising if this had been fixed there and only there.

vsivsi commented 2 years ago

@randall77 @cherrymui

Apple appears to have fixed this kernel bug in Monterey 12.2, but frustratingly did not backport the fix into the corresponding security fixes for Catalina and Big Sur.

So per my testing this morning:

Fixed:

System Version: macOS 12.2 (21D49)
Kernel Version: Darwin 21.3.0

Still Broken:

System Version: macOS 11.6.3 (20G415)
Kernel Version: Darwin 20.6.0
System Version: macOS 10.15.7 (19H1713)
Kernel Version: Darwin 19.6.0

Can someone please verify the above results?

Pending confirmation of the Monterey 12.2 fix, we should put together a patch for x/sys/cpu.X86 to re-enable AVX-512 detection for Darwin 21.3 or later.

vsivsi commented 2 years ago

In preparation for re-enabling AVX-512 for fully patched MacOS >= Monterey 12.2, I've done a little research on the best way to detect the kernel version in Golang on Darwin.

The Darwin kernel version is not present in the process commpage where the AVX-512 capabilities are advertised. So it appears that the path of least resistance is going to be a call to x/sys/unix.Uname() which I have already verified works as expected on Darwin. The Release field of the unix.Utsname struct contains the dotted kernel version number in a \x00 terminated [256]byte. E.g. 21.3.0 followed by 250 \x00s.

This function will return true if the runtime kernel version is >= to the passed major/minor/patch values.

func darwinKernelVersionCheck(major, minor, patch int) bool {
    var un unix.Utsname

    err := unix.Uname(&un)
    if err != nil {
        return false
    }

    ts := bytes.Split(bytes.TrimRight(un.Release[:], "\x00"), []byte{'.'})
    if len(ts) != 3 {
        return false
    }

    var mmp [3]int
    for i := 0; i < 3; i++ {
        tmp, err := strconv.Atoi(string(ts[i]))
        if err != nil {
            return false
        }
        mmp[i] = tmp
    }

    return mmp[0] > major || mmp[0] == major && (mmp[1] > minor || mmp[1] == minor && mmp[2] >= patch)
}

Re-enabling AVX-512 support for fully patched MacOS Monterey machines should simply involve:

Revert all changes in these two revisions:

Add the above function darwinKernelVersionCheck() to x/sys/cpu/cpu_X86.go, and modify the Darwin initialization of osSupportsAVX512 to:

if runtime.GOOS == "darwin" {
    // Check for minimum darwin version and commpage for AVX512 support. Necessary because:
    // https://github.com/apple/darwin-xnu/blob/0a798f6738bc1db01281fc08ae024145e84df927/osfmk/i386/fpu.c#L175-L201
    // See also issue: 49233
    osSupportsAVX512 = osSupportsAVX && darwinSupportsAVX512() && darwinKernelVersionCheck(21, 3, 0)
} else {
// ... non-darwin case

Edited to move the call to darwinKernelVersionCheck(21, 3, 0) to the last position, since it is much more expensive than darwinSupportsAVX512() which will be false on most Macs and short circuit the test, avoiding the final call.

vsivsi commented 2 years ago

BTW, if we prefer a version of darwinKernelVersionCheck() without dependencies on bytes and strconv, the following function is equivalent:

func darwinKernelVersionCheck(major, minor, patch int) bool {

    var un unix.Utsname
    err := unix.Uname(&un)
    if err != nil {
        return false
    }

    var mmp [3]int
    c := 0
Loop:
    for _, b := range un.Release {
        switch {
        case b >= '0' && b <= '9':
            mmp[c] = 10*mmp[c] + int(b-'0')
        case b == '.':
            c++
            if c > 2 {
                return false
            }
        case b == 0:
            break Loop
        default:
            return false
        }
    }
    if c != 2 {
        return false
    }
    return mmp[0] > major || mmp[0] == major && (mmp[1] > minor || mmp[1] == minor && mmp[2] >= patch)
}
vsivsi commented 2 years ago

I've submitted a fix for this issue here: https://go-review.googlesource.com/c/sys/+/382314

gopherbot commented 5 months ago

Change https://go.dev/cl/560955 mentions this issue: runtime: remove VZEROUPPER in asyncPreempt on darwin/amd64