SeisSol / yateto

BSD 3-Clause "New" or "Revised" License
6 stars 5 forks source link

Suggestion: Adding a CPU Database #60

Open davschneller opened 1 year ago

davschneller commented 1 year ago

Hi everyone, this is more or less meant as a suggestion issue... So: we're getting to a point where we have many CPU architectures, so Yateto is getting cluttered with a lot of case distinctions. Therefore, the following suggestion: we move everything concerning a CPU architecture in Yateto and SeisSol into an extra file—at the caveat of having to parse it during configuration (or maybe generate a CMake file out of it before that). The following suggestion, as an example, we have Haswell here. The respective file would contain more archs in the end.

- name: Haswell
  identifier: [haswell, hsw]
  vendor: intel
  isa: x86-64
    - avx2
    - fma
  vectorsize: 32
  cachelinesize: 64
  flags:
    - gcc:
        - flags: -mavx2 -mfma
    - icc:
        - flags: -xCORE-AVX2 -fma
  gemmgen:
    - libxsmm_jit:
        - arch: hsw
    - libxsmm:
        - arch: hsw
    - pspamm:
        - arch: hsw
    - mkl:
    - openblas:
    - blis:
    - eigen:

Here, we have:

Suggestions/comments? Did I forget something? Or does a database like that exist already and we only need one for the gemmgen options?

If you like this approach, we may also do sort of the same for GPUs in the end. That might lessen the work in the hw_descr.py as provided in gemmforge/chainforge.

krenzland commented 1 year ago

The idea itself is good. Parsing in cmake might be super painful, but could be converted with a small python script. BTW, if we'd use JSON, we could parse it directly in Cmake (3.19): https://cmake.org/cmake/help/latest/command/string.html#json

cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)

Should be separate. E.g. on A64FX cachline size is 256 bytes vs 64 byte vector instruction length

davschneller commented 1 year ago

The idea itself is good. Parsing in cmake might be super painful, but could be converted with a small python script. BTW, if we'd use JSON, we could parse it directly in Cmake (3.19): https://cmake.org/cmake/help/latest/command/string.html#json

I may guess that a Python script may be the most convenient option—we already depend on Python during build due to Yateto itself. Otherwise, we could of course use JSON, or convert from YAML to JSON (if YAML is easier to read for this which I'd assume).

cachelinesize contains the size of a cache line (maybe optional, maybe make it default to vectorsize)

Should be separate. E.g. on A64FX cachline size is 256 bytes vs 64 byte vector instruction length

Ok, then it shall be so.

* redzone modifier (assuming it is required with newer libxsmm versions?)

Is it? If yes, where? (couldn't find it on main when grepping redzone, ZONE, or zone at least)

uphoffc commented 1 year ago

https://github.com/SeisSol/SeisSol/blob/e6ef66194c9b4aecc62bd8d5e808427f1376a8d3/CMakeLists.txt#L473

-mno-red-zone should not be required for libxsmm_jit but only for the libxsmm generator. The flag is necessary as the offline generator wraps inline assembly inside a function, but the compiler does not inspect the inline assembly. Now if the inline assembly modifies the stack (pushq / popq) then it overwrites the "red zone" that the compiler thinks it's safe to use.

krenzland commented 1 year ago

IMO Json would be easier, you don't modify these settings that often anyways. Assuming this makes the CMake stuff easier.

-mno-red-zone should not be required for libxsmm_jit but only for the libxsmm generator. The flag is necessary as the offline generator wraps inline assembly inside a function, but the compiler does not inspect the inline assembly. Now if the inline assembly modifies the stack (pushq / popq) then it overwrites the "red zone" that the compiler thinks it's safe to use.

Time to deprecate the old libxsmm generator? ;)

uphoffc commented 1 year ago

Time to deprecate the old libxsmm generator? ;)

Afaik the offline generator is deprecated for years...

krenzland commented 1 year ago

Yes, it is. I meant deprecating it in SeisSol as well and making the jit one standard

davschneller commented 1 year ago

Ok, found a CPU database which contains compiler flags and ISA info at least: https://github.com/archspec/archspec-json/blob/master/cpu/microarchitectures.json

But, as far as I could see, no info on the cacheline etc. ... We could include something like the CPUID database here still. That one should have info there.

With that, we would at least have the vector length to specify—i.e. maybe we'll have one template per vector instruction set in the end? As the gemmgen can usually be bound to a vector instruction set at least. And maybe we may add processors, if any of them have specialized gemmgen options.

(long-term, we may have also to consider different floating point/number data types, and so on)

To summarize, here are some more ideas for the format, now splitting between vector arch and CPU:

- identifier: sve512
  features: [sve]
  vectorsize: 64
  type: isa
  gemmgen-priority: [libxsmm_jit, pspamm]
  gemmgen:
    pspamm:
      arch: arm_sve512
    libxsmm_jit:
- identifier: a64fx # which features are implemented is taken from the database above... Or we specify explicitly
  type: cpu
  cachelinesize: 256

gemmgen and its priority have been split, to allow easier subtyping for CPUs. (e.g. if libxsmm has special options for a certain processor, then we can re-specify a gemmgen option there)

krenzland commented 1 year ago

The idea is nice, but aren't we overthinking this a bit at this point? Compiler flags might be a good idea to include in this format. I like the archspec thing (at a first glance) but I haven't closely checked whether it's a good fit for us.