archspec / archspec-json


Icelake: investigate absence of 'clwb' instruction on some platforms #15

alalazo closed this issue 3 years ago

alalazo commented 4 years ago

See https://github.com/spack/spack/pull/18151 and #12. Some processors whose microarchitecture should be icelake don't report the clwb instruction in /proc/cpuinfo, even though that instruction seems to be required for icelake.
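A minimal sketch of the check being discussed: pull the "flags" line out of /proc/cpuinfo text and test whether clwb is listed. The function name cpuinfo_flags and the sample text are hypothetical; on a Linux machine you would pass it the contents of /proc/cpuinfo instead of the sample string.

```python
def cpuinfo_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Exact match, so lines such as 'vmx flags' are not picked up by mistake
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Hypothetical excerpt from an icelake-like machine that does not report clwb
sample = """processor\t: 0
vendor_id\t: GenuineIntel
flags\t\t: fpu sse4_2 avx2 avx512f avx512vbmi clflushopt sha_ni
"""

flags = cpuinfo_flags(sample)
print("clwb" in flags)  # False
# On Linux: with open("/proc/cpuinfo") as f: flags = cpuinfo_flags(f.read())
```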

alalazo commented 4 years ago

@boegel

tgamblin commented 4 years ago

@alalazo the question is really whether the compiler is allowed to emit that instruction when compiling with the flags we give it for icelake. Basically, if archspec says something is icelake and has certain unique instructions used to detect that architecture, we should be sure that if we detect it, we never emit more instructions than we expect.
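The detection rule described here can be sketched as a subset check: a microarchitecture is a valid answer only if every feature it implies is present on the host, so code built for it never uses more instructions than the machine provides. The feature sets below are hypothetical, heavily trimmed stand-ins, not archspec's real definitions.

```python
# Hypothetical, trimmed feature sets for illustration only;
# these are NOT archspec's real microarchitecture definitions.
UARCH_FEATURES = {
    "haswell": {"avx2"},
    "skylake_avx512": {"avx2", "avx512f", "clwb"},
    "icelake": {"avx2", "avx512f", "avx512vbmi", "clwb"},
}


def compatible(uarch, host_flags):
    """A uarch is acceptable only if every feature it implies is on the host."""
    return UARCH_FEATURES[uarch] <= host_flags


host = {"avx2", "avx512f", "avx512vbmi"}  # icelake-like host that lacks clwb
print([u for u in UARCH_FEATURES if compatible(u, host)])  # ['haswell']
```

With only this rule, an icelake-like host that lacks clwb falls back to an older uarch (here haswell), because both candidates that imply clwb are rejected.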

For clwb, it looks like the instruction was introduced in Skylake server and Icelake client:

The Cascade Lake docs also seem to imply that, even though these instructions were introduced in Skylake, they're actually put to use in Cascade Lake:

CLWB is part of Intel's PMEM extensions. I don't know for sure, but I don't think any current compilers emit these instructions automatically when you tell them to generate code for icelake or cascadelake -- @jeffhammond tells me these are just for intrinsics and libraries, so I think we are safe if we detect icelake based on other features -- compilers won't actually generate clwb if told to build for icelake.

There's another question, though: should a user be able to see that a machine is icelake and use clwb via intrinsics (basically, what's the currently undocumented contract we're providing?). I think the answer is that they should be able to, and we can't do much if it's locally disabled on the machine.

This is tricky because we detect based on features: if we take the instruction out, asking whether icelake supports clwb will currently come back with False. We mostly care about vector instructions here, so I don't think this is a big deal for most users, but it would be nice to have a sensible way to handle cases where the OS has disabled some instruction. If we go down this path, it seems like we will need to document which features are used for detection and which for queries, and issue warnings or something similar if someone asks for a disabled instruction. Basically, we'd say the arch supports it, but that they shouldn't try to build for it on this machine if they really need it.
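One way to make the proposed split concrete, sketched under the assumption that each uarch carries two hypothetical feature sets: one used for detection and one that is only reported when queried. Asking about a feature the uarch supports but the host lacks still returns True, but emits a warning. The names ICELAKE_DETECTION, ICELAKE_QUERY_ONLY, is_icelake and supports are illustrative only and are not part of archspec.

```python
import warnings

# Hypothetical split of icelake's features into the ones used to *detect* it
# and extra ones only *advertised* for queries; not an existing archspec API.
ICELAKE_DETECTION = {"avx512f", "avx512vbmi", "avx512vnni"}
ICELAKE_QUERY_ONLY = {"clwb"}


def is_icelake(host_flags):
    """Detect icelake from the detection features alone."""
    return ICELAKE_DETECTION <= host_flags


def supports(feature, host_flags):
    """Report whether icelake supports a feature; warn if this host lacks it."""
    advertised = feature in (ICELAKE_DETECTION | ICELAKE_QUERY_ONLY)
    if advertised and feature not in host_flags:
        warnings.warn(
            f"icelake supports {feature!r}, but it is not available on this machine"
        )
    return advertised


host = {"avx512f", "avx512vbmi", "avx512vnni"}  # icelake-like host missing clwb
print(is_icelake(host))        # True
print(supports("clwb", host))  # True, and a UserWarning is emitted
```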

Thoughts?

alalazo commented 4 years ago

so I think we are safe if we detect icelake based on other features -- compilers won't actually generate clwb if told to build for icelake.

I didn't understand it this way from reading the GCC man page:

    Intel Icelake Client CPU with 64-bit extensions, MOVBE, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, PKU, AVX, AVX2,
    AES, PCLMUL, FSGSBASE, RDRND, FMA, BMI, BMI2, F16C, RDSEED, ADCX, PREFETCHW, CLFLUSHOPT, XSAVEC, XSAVES, AVX512F, AVX512VL,
    AVX512BW, AVX512DQ, AVX512CD, AVX512VBMI, AVX512IFMA, SHA, CLWB, UMIP, RDPID, GFNI, AVX512VBMI2, AVX512VPOPCNTDQ, AVX512BITALG,
    AVX512VNNI, VPCLMULQDQ, VAES instruction set support.

and seeing that there's a corresponding -mclwb flag that enables the use of the instruction. My naive feeling is that GCC is allowed to emit that instruction if we optimize for icelake-client.

If we go down this path [ ... ] Thoughts?

I wonder if we want to maintain information on the flags that enable/disable individual instructions, so that we could optimize for icelake~clwb, which would mean using -march=icelake-client -mtune=icelake-client -mno-clwb. Managing DAGs and such might soon get complicated if we go this route, though.
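The icelake~clwb idea could be sketched as a small translation from a spec string to GCC flags. The spec parsing, the GCC_MARCH table and the function names are hypothetical; the sketch also assumes the disabled feature name matches GCC's option suffix, which holds for clwb (-mclwb / -mno-clwb) but not for every /proc/cpuinfo flag.

```python
# Hypothetical table from archspec-style names to GCC -march/-mtune values.
GCC_MARCH = {"icelake": "icelake-client"}


def parse(spec):
    """Split e.g. 'icelake~clwb' into the uarch and the list of disabled features."""
    uarch, *disabled = spec.split("~")
    return uarch, disabled


def gcc_flags(uarch, disabled=()):
    """Build GCC flags for a uarch, turning off each disabled ISA extension."""
    name = GCC_MARCH[uarch]
    flags = [f"-march={name}", f"-mtune={name}"]
    # Assumes the feature name doubles as GCC's -m<feature> suffix (true for clwb).
    flags += [f"-mno-{feature}" for feature in sorted(disabled)]
    return " ".join(flags)


print(gcc_flags(*parse("icelake~clwb")))
# -march=icelake-client -mtune=icelake-client -mno-clwb
```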

alalazo commented 3 years ago

I think this can be closed. Now that we have "levels" for generic x86_64 microarchitectures, we'll fall back to the best generic level instead of the first uarch that doesn't need the missing instruction.
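A rough sketch of the level-based fallback, assuming a simplified selection rule: first pick the best compatible generic level, then accept a specific uarch only if it strictly extends that level. This is modeled loosely on the idea described in this comment, not copied from archspec; the feature sets and the function name detect are hypothetical.

```python
# Hypothetical, trimmed feature sets; NOT archspec's real data.
V2 = {"sse4_2", "popcnt"}
V3 = V2 | {"avx2", "fma", "bmi2"}
V4 = V3 | {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}

GENERIC = {"x86_64": set(), "x86_64_v2": V2, "x86_64_v3": V3, "x86_64_v4": V4}
SPECIFIC = {
    "haswell": V3 | {"movbe"},
    "skylake_avx512": V4 | {"clwb"},
    "icelake": V4 | {"clwb", "avx512vbmi"},
}


def detect(host_flags):
    """Pick the best generic level, then any specific uarch that refines it."""
    def fits(features):
        return features <= host_flags

    # Best generic level = the compatible level with the most features
    # ("x86_64" has no requirements, so at least one level always fits).
    best_generic = max(
        (n for n, f in GENERIC.items() if fits(f)), key=lambda n: len(GENERIC[n])
    )
    base = GENERIC[best_generic]
    # Keep only specific uarchs that fit the host AND strictly extend that level,
    # so a missing niche flag cannot drop detection to an older specific uarch.
    refined = [n for n, f in SPECIFIC.items() if fits(f) and base < f]
    if not refined:
        return best_generic
    return max(refined, key=lambda n: len(SPECIFIC[n]))


icelake_without_clwb = V4 | {"avx512vbmi", "movbe"}
print(detect(icelake_without_clwb))             # x86_64_v4
print(detect(icelake_without_clwb | {"clwb"}))  # icelake
```

Without the level filter, the same clwb-less host would have matched haswell, the newest specific uarch in this table whose features are all present.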