LLNL / benchpark

An open collaborative repository for reproducible specifications of HPC benchmarks and cross site benchmarking environments
https://software.llnl.gov/benchpark/
Apache License 2.0

System definition #92

Open pearce8 opened 9 months ago

pearce8 commented 9 months ago

Updated:

Below the dashed line, we have the original system config design. Things have changed since then, and we need to:

What should the user use as a starter for their system definition?


System configs currently contain different types of information, which serve different purposes:

  1. Hardware specification
    • where defined: system_definition.yaml
    • systems it applies to: a class of systems at different sites
    • longevity: duration of the system (or class of systems) lifetime
    • purpose of record: find a system with the same hardware as my system. May want to record with the experiment.
  2. Software stack: compiler and MPI locations
    • where defined: Optional?!? compilers.yaml
    • systems it applies to: just ours? can we autodetect?
    • longevity: ?
    • purpose of record: give the users a starting point for running on their system. What errors and guidance for mitigation should we give? Do we want these upstreamed back? Do we want these recorded in the experiment?
  3. Software stack: compiler and MPI versions
    • where defined: compilers.yaml
    • systems it applies to: different machines could be at different versions
    • longevity: new versions can appear any time
    • purpose: give the users a starting point; also need to record as part of the experiment, and use to debug or compare performance. Probably want to let users parameterize, and set up versions to use as part of their suites.
  4. Scheduler, launcher:
    • where defined: variables.yaml
    • systems it applies to: many. Probably need a Slurm and a Flux scheduler definition, auto-generated for the user when they tell us which one they have (can we autodetect?). Probably need to define a few launchers and pick one (mpirun, srun, ...)
    • longevity: static, except the queue info is baked in here unfortunately.
    • purpose: give the users a starting point. Probably don't want upstreamed, may not need to record.
  5. Software packages we don't want to keep building
    • where defined: Optional! packages.yaml
    • systems it applies to: probably just ours. May be able to find using spack external.
    • longevity: yeah may want to update versions over time.
    • purpose: shorten build time. We do not want these upstreamed, but we want to be able to record for our own experiments/CI etc.
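
As a rough illustration of the "can we autodetect?" question in item 4, here is a minimal Python sketch that probes `PATH` for a scheduler and emits a starter `variables` mapping. The template keys (`batch_submit`, `mpi_command`), the `{n_nodes}`-style placeholders, and the function names are assumptions made for this example, not the actual benchpark schema.

```python
import shutil

# Hypothetical starter templates; keys and command lines are illustrative only.
SCHEDULER_TEMPLATES = {
    "slurm": {
        "probe": "sbatch",  # executable whose presence suggests Slurm
        "variables": {
            "batch_submit": "sbatch {execute_experiment}",
            "mpi_command": "srun -N {n_nodes} -n {n_ranks}",
        },
    },
    "flux": {
        "probe": "flux",  # executable whose presence suggests Flux
        "variables": {
            "batch_submit": "flux batch {execute_experiment}",
            "mpi_command": "flux run -N {n_nodes} -n {n_ranks}",
        },
    },
}


def detect_scheduler(which=shutil.which):
    """Return the first scheduler whose probe executable is on PATH, else None."""
    for name, template in SCHEDULER_TEMPLATES.items():
        if which(template["probe"]) is not None:
            return name
    return None


def starter_variables(scheduler, launcher=None):
    """Build a starter 'variables' mapping, optionally overriding the launcher."""
    if scheduler not in SCHEDULER_TEMPLATES:
        raise ValueError(f"no template for scheduler {scheduler!r}")
    variables = dict(SCHEDULER_TEMPLATES[scheduler]["variables"])
    if launcher is not None:
        variables["mpi_command"] = launcher
    return {"variables": variables}
```

If both probes succeed, this sketch picks Slurm simply because it is listed first; a real tool would probably ask the user to confirm. Queue information is deliberately left out, since it is site-specific.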

We should probably define a graded approach for generating these:

scheibelp commented 9 months ago

"We should probably define a graded approach for generating these: only introduce a new hardware specification if one like it indeed does not exist."

To be clear, are you saying we shouldn't define a new file format like foo.yaml if foo.yaml includes details already in other yaml files?

system_definition.yaml contains duplicate entries, but that's "by design" since it's supposed to be a human-readable aggregation of other details.

"Software stack: compiler and MPI locations; where defined: Optional?!? compilers.yaml"

Spack auto-detects compilers as needed. All systems except x86 probably want to explicitly define a sensible default though.

"What errors and guidance for mitigation should we give?"

I do not have much experience anticipating and pre-handling issues with the wrong compilers being used. The more a user's system differs from ours, the more they should consider defining this themselves. This generally isn't an issue until Spack gets to the build phase (e.g. the wrong compiler can't generate build artifacts). The exact error message can depend on the compilers, but the problem can also propagate to higher-level issues (e.g. building a different version of a package may change the required C++ standard, which generates its own set of errors depending on whether particular compiler versions support that standard).

"Software stack: compiler and MPI versions; where defined: compilers.yaml"

MPI versions are not defined in compilers.yaml; they might be defined in packages.yaml.
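
For illustration, a small Python sketch of what an external MPI entry in packages.yaml might look like, built as a plain mapping using Spack's `externals` / `buildable` layout. The helper name, the `openmpi` version, and the install prefix are placeholders invented for this example.

```python
import json


def external_package_entry(name, version, prefix, buildable=False):
    """Return a packages.yaml-style mapping that marks `name` as an external."""
    return {
        "packages": {
            name: {
                # Point Spack at an existing install instead of building it.
                "externals": [{"spec": f"{name}@{version}", "prefix": prefix}],
                "buildable": buildable,
            }
        }
    }


# Hypothetical MPI install; the version and prefix are placeholders.
entry = external_package_entry("openmpi", "4.1.2", "/opt/openmpi-4.1.2")

# JSON is also valid YAML 1.2, so this output should be readable by a YAML parser.
print(json.dumps(entry, indent=2))
```

Pinning the MPI version here rather than in compilers.yaml also makes it straightforward to record alongside an experiment.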

"longevity: new versions can appear any time"

When you say they can appear at any time, do you mean that the user could add a compiler definition at any time? Spack won't search for compilers if any are already defined, and it doesn't search for external packages without prompting.
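
As a sketch of what "prompting" could look like in a setup script, the Python below wraps the `spack compiler find` and `spack external find` commands. The wrapper functions and the dry-run default are assumptions for this example; whether benchpark should run these on the user's behalf is exactly the open question.

```python
import shutil
import subprocess


def detection_commands(scope=None):
    """Return the Spack detection commands, optionally targeting a config scope."""
    cmds = [["spack", "compiler", "find"], ["spack", "external", "find"]]
    if scope is not None:
        cmds = [cmd + ["--scope", scope] for cmd in cmds]
    return cmds


def run_detection(scope=None, dry_run=True):
    """Run the detection commands, or just return them when dry_run is set
    or when no `spack` executable is on PATH."""
    cmds = detection_commands(scope)
    if dry_run or shutil.which("spack") is None:
        return cmds
    for cmd in cmds:
        # check=True raises CalledProcessError if Spack exits non-zero.
        subprocess.run(cmd, check=True)
    return cmds
```

Calling `run_detection(dry_run=False)` executes the commands, which write the detected compilers and externals into the selected configuration scope.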