Open vsivsi opened 4 years ago
Strange.
I get the same behavior from C.
main.c:
void foo(void);

int main(int argc, char *argv[]) {
    foo();
    return 0;
}
main.s:
.globl _foo
_foo:
vpopcntw %zmm1, %zmm0
ret
This program also hangs. Compile with gcc main.c main.s, run with ./a.out.
So I think this is an OSX bug, not a Go bug.
Nothing obvious shows up when run under a debugger. The debugger runs it forever, and every time I interrupt it, it is at the vpopcntw instruction.
The same C code generates an illegal instruction fault on Linux, so chances are it isn't the chip (although my Mac and Linux boxes don't have exactly the same chip).
The Darwin kernel has a semi-spooky 2-tier AVX512 thread "promotion" mechanism that involves trapping AVX512 instruction faults, changing thread status to support AVX512, and then rerunning the offending instruction. In theory this scheme should only happen once per process thread upon encountering the first AVX512 instruction. The purpose is to avoid the large additional thread state required for AVX512 (around 2KB) when it is not needed. I would assume that it would only try this promotion procedure once per thread, such that if the AVX512 instruction causing the fault still isn't supported after enabling AVX512 in the thread state, that fault should revert to the process. But I'm way out over my skis on this kind of stuff... Here's the Darwin reference:
I've submitted a bug to Apple, reference number FB8902463. Their bug reporting tool isn't really public, so I'll report back here if they say anything (which they usually don't; reports tend to be silently ignored).
Related to this issue, it appears that on MacOS, the golang.org/x/sys/cpu package does not properly recognize Macs that support AVX512 instructions, due to the Darwin "AVX512 thread promotion" mechanism I mentioned above. Specifically, this code incorrectly assumes that an OS-disabled XSAVE AVX512 thread state can't be changed.
https://github.com/golang/sys/blob/master/cpu/cpu_x86.go#L90
I'm working on a separate issue for this that I'll link here as well.
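To make the detection problem concrete, here is a minimal sketch of the kind of XCR0 test an OS-support check performs. It is not the actual golang.org/x/sys/cpu code; the function name osEnabledAVX512 and the two sample register values are assumptions chosen for illustration, and only the bit positions come from the Intel documentation.

```go
package main

import "fmt"

// XCR0 state-component bits as documented by Intel.
const (
	xcr0SSE      = 1 << 1 // XMM registers
	xcr0AVX      = 1 << 2 // upper halves of the YMM registers
	xcr0Opmask   = 1 << 5 // AVX-512 mask registers k0-k7
	xcr0ZMMHi256 = 1 << 6 // upper halves of ZMM0-ZMM15
	xcr0Hi16ZMM  = 1 << 7 // ZMM16-ZMM31
)

// osEnabledAVX512 (hypothetical name) reports whether an XCR0 value read
// via XGETBV has every state component that AVX-512 needs enabled.
func osEnabledAVX512(xcr0 uint32) bool {
	const need = xcr0SSE | xcr0AVX | xcr0Opmask | xcr0ZMMHi256 | xcr0Hi16ZMM
	return xcr0&need == need
}

func main() {
	// Illustrative values only, not measured on Darwin:
	// 0x07 = x87|SSE|AVX, 0xe7 adds the three AVX-512 components.
	beforePromotion := uint32(0x07)
	afterPromotion := uint32(0xe7)

	fmt.Println("before promotion:", osEnabledAVX512(beforePromotion))
	fmt.Println("after promotion: ", osEnabledAVX512(afterPromotion))
}
```

If a thread that has not yet been promoted really does report a value like the first one, a check of this shape run once at startup would conclude that AVX512 is unavailable, even though the kernel would enable it on first use.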
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
MacOS 10.15.7
go env Output
What did you do?
Attempt to use the Intel AVX-512 VPOPCNT family of instructions in Go assembler.
What did you expect to see?
Assembly code using these instructions should run properly on processors supporting them, and should generate a #UD fault (SIGILL) and terminate when invoked on a CPU without support.
What did you see instead?
The Go runtime hangs forever with 100% CPU utilization upon executing a VPOPCNT(B/W/D/Q) instruction on hardware that doesn't support it. Tested on a Mac Pro (2019) with a 2.7 GHz 24-core Intel Xeon W CPU (Xeon W-3265M).
Note, this processor does not include AVX512_BITALG or AVX512_VPOPCNTDQ, which are required for VPOPCNT(B/W) and VPOPCNT(D/Q) respectively. For a summary of the VPOPCNT support matrix, see: https://github.com/HJLebbink/asm-dude/wiki/VPOPCNT
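The feature requirements above can be sketched as a small decoder over CPUID leaf 7 (subleaf 0) register values. This is only an illustration: the function name vpopcntAdvertised is hypothetical, the register values in main are made up to resemble a CPU with AVX512F but neither VPOPCNT extension, and the check covers only the ZMM-register forms (the 128/256-bit forms additionally need AVX512VL).

```go
package main

import "fmt"

// CPUID.(EAX=7,ECX=0) feature bits as documented by Intel.
const (
	ebxAVX512F         = 1 << 16 // EBX bit 16: AVX-512 Foundation
	ecxAVX512BITALG    = 1 << 12 // ECX bit 12: needed for VPOPCNTB/VPOPCNTW
	ecxAVX512VPOPCNTDQ = 1 << 14 // ECX bit 14: needed for VPOPCNTD/VPOPCNTQ
)

// vpopcntAdvertised (hypothetical name) reports whether the ZMM form of
// VPOPCNT with the given element suffix ('B', 'W', 'D' or 'Q') is
// advertised by the supplied leaf 7 EBX and ECX values.
func vpopcntAdvertised(suffix byte, ebx, ecx uint32) bool {
	if ebx&ebxAVX512F == 0 {
		return false
	}
	switch suffix {
	case 'B', 'W':
		return ecx&ecxAVX512BITALG != 0
	case 'D', 'Q':
		return ecx&ecxAVX512VPOPCNTDQ != 0
	}
	return false
}

func main() {
	// Assumed sample values: AVX512F set, no BITALG, no VPOPCNTDQ.
	ebx, ecx := uint32(ebxAVX512F), uint32(0)
	for _, s := range []byte("BWDQ") {
		fmt.Printf("VPOPCNT%c advertised: %v\n", s, vpopcntAdvertised(s, ebx, ecx))
	}
}
```

Reading the real register values requires executing CPUID from assembly, which is omitted here; a guard like this only helps if it runs before the instruction is issued.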
The Intel processor documentation says that attempting to run such AVX512 instructions when the supporting feature CPUID flags are not set should result in raising a #UD exception. As expected, directly executing the amd64 UD2 instruction causes the Go runtime to abort with SIGILL: illegal instruction. But when unsupported, these AVX512 instructions cause the runtime to hang in a tight loop of some kind, which doesn't seem to be consistent or correct behavior.
Here is a dump from a process sample of the hung Go runtime process resulting from the repro below.
/usr/bin/sample Output
Minimal Reproduction
The code below, when compiled with go build and then executed, should immediately hang with 100% processor utilization on a single process thread when run on a CPU missing either the AVX512_BITALG or AVX512_VPOPCNTDQ CPUID feature flag (which I believe at the time of this writing is all Apple Macs. Edited to add: except the top-of-the-line 10th Gen Core (Ice Lake) powered 13" MacBook Pros).
main.go
popcnt_amd64.s