As part of this work we should also do projections of requirements for running bigger simulations, now and for every year going forward.
Per Tsengdar:
Per Laura:
Working on it as part of the SC24 presentation.
- Resolution: required resolution to be run.
- Model skill: physical processes to be resolved.
- Throughput: wall time for the target simulation.
- Features: required capabilities of the technology to express the science.
- Maintainability: tools to ensure enduring good science code.
- Technical debt: managing the inevitable growth in code.
- Time to solution: required wall time on a given hardware.
- Energy use: per-hardware energy use (in kW).
- Hardware optimization: per-hardware memory bandwidth usage (in % of the theoretical maximum); see the sketch after this list.
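
For the hardware optimization metric, a minimal sketch of how it could be reported, assuming we get the measured memory traffic and the node's theoretical peak bandwidth from a profiler (all numbers below are placeholders):

```python
# Minimal sketch: report achieved memory bandwidth as a fraction of the
# node's theoretical peak. Inputs (traffic, wall time, peak bandwidth) are
# placeholders that would come from profiling a real run.

def bandwidth_utilization(bytes_moved: float, wall_time_s: float,
                          peak_bw_gb_s: float) -> float:
    """Achieved bandwidth in % of the theoretical maximum."""
    achieved_gb_s = bytes_moved / 1e9 / wall_time_s
    return 100.0 * achieved_gb_s / peak_bw_gb_s

# Hypothetical run: 1.2 PB of DRAM traffic in 600 s on a node with a
# 3.2 TB/s aggregate peak.
print(f"{bandwidth_utilization(1.2e15, 600.0, 3200.0):.1f}% of peak bandwidth")
```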
Previous benchmarks have been done with the "node-to-node" metric, to answer the question "can we replace a CPU node with a GPU node?".
As we gear toward operations, this metric is no longer enough; it should be backed by more scientifically relevant metrics (gridpoints per second, SYPD, SDPD (which seems to be the GMAO's preferred metric), etc.).
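
A minimal sketch of those throughput metrics, assuming we have a measured wall time for a known simulated window (the example numbers are hypothetical):

```python
# Minimal sketch of the throughput metrics: SDPD (simulated days per day)
# and SYPD (simulated years per day) from a measured wall time.

SECONDS_PER_DAY = 86400.0

def sdpd(simulated_days: float, wall_time_s: float) -> float:
    """Simulated days per wall-clock day."""
    return simulated_days / (wall_time_s / SECONDS_PER_DAY)

def sypd(simulated_days: float, wall_time_s: float) -> float:
    """Simulated years per wall-clock day."""
    return sdpd(simulated_days, wall_time_s) / 365.0

# Hypothetical run: a 5-day simulation that takes 2 hours of wall time.
print(f"SDPD = {sdpd(5.0, 7200.0):.1f}, SYPD = {sypd(5.0, 7200.0):.3f}")
```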
We should also start measuring ourselves against the SCU17/18 Milan nodes and their 128 cores.
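
For the node-to-node angle against that Milan reference, a sketch of a comparison that normalizes gridpoint throughput by node count so the comparison stays fair; the grid size, step count, node counts and timings below are all hypothetical:

```python
# Minimal sketch of a node-normalized comparison against a Milan reference:
# gridpoint-updates per second per node, and how many reference nodes one
# candidate node replaces. All numbers are hypothetical placeholders.

def gridpoints_per_second_per_node(n_gridpoints: int, n_steps: int,
                                   wall_time_s: float, n_nodes: int) -> float:
    """Gridpoint-updates per wall-clock second, normalized per node."""
    return n_gridpoints * n_steps / wall_time_s / n_nodes

milan = gridpoints_per_second_per_node(100_000_000, 960, 5400.0, n_nodes=16)
gpu = gridpoints_per_second_per_node(100_000_000, 960, 3600.0, n_nodes=4)
print(f"One GPU node ~ {gpu / milan:.1f} Milan nodes on this configuration")
```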
Electricity consumption and price are also previous metrics we should carry forward.
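
A sketch of how energy and cost could be reported per simulated day, so they line up with SDPD; the node power draw and electricity price are placeholder values:

```python
# Minimal sketch of the energy and cost metrics, expressed per simulated day.
# Node power draw and $/kWh are placeholders, not measured values.

def kwh_per_simulated_day(node_power_kw: float, n_nodes: int,
                          wall_time_s: float, simulated_days: float) -> float:
    """Energy (kWh) spent per simulated day."""
    total_kwh = node_power_kw * n_nodes * wall_time_s / 3600.0
    return total_kwh / simulated_days

# Hypothetical run: 8 nodes drawing 0.7 kW each, 2 h wall time, 5 simulated days.
kwh = kwh_per_simulated_day(0.7, 8, 7200.0, 5.0)
print(f"{kwh:.2f} kWh per simulated day, ${kwh * 0.12:.2f} at $0.12/kWh")
```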
Another angle is the scaling and operational usefulness of each hardware platform, so that the narrative to the scientists is clear.
This process should involve the GMAO but remain led by us, so as to make sure we can deliver.
Overall, pragmatism is key: we are not here to give roofline projections and peak FLOPS; we are here to deliver day-to-day usage.