Suggestion: Adding a CPU Database

davschneller commented 1 year ago

Hi everyone, this is more or less meant as a suggestion issue... So: we're getting to a point where we have many CPU architectures, so Yateto is getting cluttered with a lot of case distinctions. Therefore, the following suggestion: we move everything concerning a CPU architecture in Yateto and SeisSol into an extra file—at the caveat of having to parse it during configuration (or maybe generate a CMake file out of it before that). The following suggestion, as an example, we have Haswell here. The respective file would contain more archs in the end.

- name: Haswell
  identifier: [haswell, hsw]
  vendor: intel
  isa: x86-64
    - avx2
    - fma
  vectorsize: 32
  cachelinesize: 64
  flags:
    - gcc:
        - flags: -mavx2 -mfma
    - icc:
        - flags: -xCORE-AVX2 -fma
  gemmgen:
    - libxsmm_jit:
        - arch: hsw
    - libxsmm:
        - arch: hsw
    - pspamm:
        - arch: hsw
    - mkl:
    - openblas:
    - blis:
    - eigen:

Here, we have:

name denotes a text for pretty-printing (required)
identifier contains a list of unique strings which are used to select the arch (required)
vendor, as far as known, contains a vendor name (optional)
isa contains the basic ISA, and instruction set extensions as bullet points. (all optional)
vectorsize contains the size of the vector register (required? optional?)
cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)
flags contains compile flags, as currently used in SeisSol. (optional)
- If any compiler is explicitly given, the specified flags are used. TODO: are there other flags, e.g. LDFLAGS we want to use?
- We may implicitly or explicitly need to add an order to deal with unknown compilers: suggestion would be to default to clang if provided, and to gcc if clang is not given.
gemmgen: the list of GEMM generators that Yateto uses, in preferred order (optional)
- if no GEMM generator is given, Yateto will only generate some for loops
- if we just give a GEMM to Yateto, we think it will go through the list from top to bottom and select the first working GEMM generator for the problem. As it is specified right now in the DefaultGeneratorCollection already.
- for each generator, we may also provide some arguments: for libxsmm_jit, libxsmm and pspamm how the architecture is called internally. (maybe that's unnecessary for libxsmm_jit?) That is useful e.g. for using Zen 1,2,3 with libxsmm: here we would just set arch: hsw.
base: a list of archs to inherit the options from. Archs that are mentioned later in the list override those that are in it earlier. (optional, by default assumed to be an empty list) Not shown in this example.

Suggestions/comments? Did I forget something? Or does a database like that exist already and we only need one for the gemmgen options?

If you like this approach, we may also do sort of the same for GPUs in the end. That might lessen the work in the hw_descr.py as provided in gemmforge/chainforge.

krenzland commented 1 year ago

The idea itself is good. Parsing in cmake might be super painful, but could be converted with a small python script. BTW, if we'd use JSON, we could parse it directly in Cmake (3.19): https://cmake.org/cmake/help/latest/command/string.html#json

cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)

Should be separate. E.g. on A64FX cachline size is 256 bytes vs 64 byte vector instruction length

redzone modifier (assuming it is required with newer libxsmm versions?)

davschneller commented 1 year ago

The idea itself is good. Parsing in cmake might be super painful, but could be converted with a small python script. BTW, if we'd use JSON, we could parse it directly in Cmake (3.19): https://cmake.org/cmake/help/latest/command/string.html#json

I may guess that a Python script may be the most convenient option—we already depend on Python during build due to Yateto itself. Otherwise, we could of course use JSON, or convert from YAML to JSON (if YAML is easier to read for this which I'd assume).

cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)

Should be separate. E.g. on A64FX cachline size is 256 bytes vs 64 byte vector instruction length

Ok, then it shall be so.

* redzone modifier (assuming it is required with newer libxsmm versions?)

Is it? If yes, where? (couldn't find it on main when grepping redzone, ZONE, or zone at least)

uphoffc commented 1 year ago

https://github.com/SeisSol/SeisSol/blob/e6ef66194c9b4aecc62bd8d5e808427f1376a8d3/CMakeLists.txt#L473

-mno-red-zone should not be required for libxsmm_jit but only for the libxsmm generator. The flag is necessary as the offline generator wraps inline assembly inside a function, but the compiler does not inspect the inline assembly. Now if the inline assembly modifies the stack (pushq / popq) then it overwrites the "red zone" that the compiler thinks it's safe to use.

krenzland commented 1 year ago

IMO Json would be easier, you don't modify these settings that often anyways. Assuming this makes the CMake stuff easier.

-mno-red-zone should not be required for libxsmm_jit but only for the libxsmm generator. The flag is necessary as the offline generator wraps inline assembly inside a function, but the compiler does not inspect the inline assembly. Now if the inline assembly modifies the stack (pushq / popq) then it overwrites the "red zone" that the compiler thinks it's safe to use.

Time to deprecate the old libxsmm generator? ;)

uphoffc commented 1 year ago

Time to deprecate the old libxsmm generator? ;)

Afaik the offline generator is deprecated for years...

krenzland commented 1 year ago

Yes, it is. I meant deprecating it in SeisSol as well and making the jit one standard

davschneller commented 1 year ago

Ok, found a CPU database which contains compiler flags and ISA info at least: https://github.com/archspec/archspec-json/blob/master/cpu/microarchitectures.json

But, as far as I could see, no info on the cacheline etc. ... We could include something like the CPUID database here still. That one should have info there.

With that, we would at least have the vector length to specify—i.e. maybe we'll have one template per vector instruction set in the end? As the gemmgen can usually be bound to a vector instruction set at least. And maybe we may add processors, if any of them have specialized gemmgen options.

(long-term, we may have also to consider different floating point/number data types, and so on)

To summarize, here are some more ideas for the format, now splitting between vector arch and CPU:

- identifier: sve512
  features: [sve]
  vectorsize: 64
  type: isa
  gemmgen-priority: [libxsmm_jit, pspamm]
  gemmgen:
    pspamm:
      arch: arm_sve512
    libxsmm_jit:
- identifier: a64fx # which features are implemented is taken from the database above... Or we specify explicitly
  type: cpu
  cachelinesize: 256

gemmgen and its priority have been split, to allow easier subtyping for CPUs. (e.g. if libxsmm has special options for a certain processor, then we can re-specify a gemmgen option there)

krenzland commented 1 year ago

The idea is nice, but aren't we overthinking this a bit at this point? Compiler flags might be a good idea to include in this format. I like the archspec thing (at a first glance) but I haven't closely checked whether it's a good fit for us.

SeisSol / yateto

Suggestion: Adding a CPU Database #60