laurasootes closed 6 months ago
The problem appears to be caused by Snellius maintenance and should be resolved once all nodes have been rebooted.
Just for completeness, here is the response from SURF:
*Dear Dr. Porth, After the latest maintenance, we experienced a performance degradation, which could be attributed to the energy efficiency monitoring system. The daemon has been disabled, and most nodes have been successfully rebooted. However, a few nodes are still in the 'drain' state and are awaiting reboot. Rebooted nodes should perform as before. Please check the status of the nodes with sinfo. If you provide an estimate of SBU losses (due to timeout or lower performance), we will compensate for that. We are sorry for the inconvenience this problem has caused. Kind regards, Stefan Wolfsheimer*
@oporth There seems to be a difference in the baseline performance obtained. Both runs below were done on Snellius, on 1 node, using the nvidia jobfile.
The first run (22 May):
Reading amrvac.par
Output type | tsavestart | dtsave | ditsave | itsave(1) | tsave(1)
log | 0.000E+00 | | 10 | 0 |
normal | 0.000E+00 | * | ** | ** | *
slice | 0.000E+00 | * | ** | ** | *
collapsed | 0.000E+00 | * | ** | ** | *
analysis | 0.000E+00 | * | ** | ** | *
Warning: coordinate system is not specified!
call set_coordinate_system in usr_init in mod_usr.t
Now use Cartesian coordinate
Domain size (cells): 4096 4096
Level one dx: 0.244E-03 0.244E-03
Refine estimation: Lohner's scheme
restart_from_file: undefined
converting: F
3D HD KH
--assuming y ranging from 0-1!
--density ratio: 10.00000000000000
--kx: 12.56637061435917
--vextra: 0.000000000000000
Startup phase took : 0.425 sec
Start integrating, print status every 3.00E+01 seconds
it time dt wc-time(s)
0 0.0000E+00 1.0105E-05 1.0483E-01
51 2.5608E-03 5.3181E-05 3.0196E+01
Total timeloop took : 58.288 sec
Time spent on AMR : 0.000 sec Percentage: 0.00 %
Time spent on IO in loop : 0.428 sec Percentage: 0.73 %
Time spent on ghost cells : 15.626 sec Percentage: 26.81 %
Time spent on computing : 42.234 sec Percentage: 72.46 %
Cells updated / proc / sec : 4.497E+05
Saving visual data. Coordinate directions and variable names are:
1 X
2 Y
3 rho
4 v1
5 v2
6 p
time = 5.1665764415896450E-003
Total time spent on IO : 31.524 sec
Total timeintegration took : 89.384 sec
100 5.167E-03 5.318E-05 5.829E+01
Finished AMRVAC in : 89.809 sec
JOB STATISTICS
Job ID: 6331449
Cluster: snellius
User/Group: lootes/lootes
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 192
CPU Utilized: 05:16:22
CPU Efficiency: 71.64% of 07:21:36 core-walltime
Job Wall-clock time: 00:02:18
Memory Utilized: 336.64 GB (estimated maximum)
Memory Efficiency: 100.19% of 336.00 GB (336.00 GB/node)
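As a sanity check on these job statistics: the reported CPU efficiency appears to be the CPU time used divided by the core-walltime, where core-walltime is the number of cores times the wall-clock time. A minimal Python sketch, with the values copied by hand from the output above; the helper name `hms_to_seconds` is made up for this illustration and is not part of any Slurm tool:

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values copied from the job statistics of the first run.
cores = 192
wall_clock = hms_to_seconds("00:02:18")
cpu_utilized = hms_to_seconds("05:16:22")

# Core-walltime: every allocated core counted for the full wall-clock time.
core_walltime = cores * wall_clock

# Efficiency: fraction of the allocated core time spent doing work.
efficiency = 100.0 * cpu_utilized / core_walltime

print(f"core-walltime : {core_walltime} s")
print(f"CPU efficiency: {efficiency:.2f} %")
```

The computed core-walltime matches the 07:21:36 shown in the statistics, so the efficiency figure is consistent with the reported wall-clock time and core count.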
The second run (28 May):
Reading amrvac.par
Output type | tsavestart | dtsave | ditsave | itsave(1) | tsave(1)
log | 0.000E+00 | | 10 | 0 |
normal | 0.000E+00 | * | ** | ** | *
slice | 0.000E+00 | * | ** | ** | *
collapsed | 0.000E+00 | * | ** | ** | *
analysis | 0.000E+00 | * | ** | ** | *
Warning: coordinate system is not specified!
call set_coordinate_system in usr_init in mod_usr.t
Now use Cartesian coordinate
Domain size (cells): 4096 4096
Level one dx: 0.244E-03 0.244E-03
Refine estimation: Lohner's scheme
restart_from_file: undefined
converting: F
3D HD KH
--assuming y ranging from 0-1!
--density ratio: 10.00000000000000
--kx: 12.56637061435917
--vextra: 0.000000000000000
Startup phase took : 0.354 sec
Start integrating, print status every 3.00E+01 seconds
it time dt wc-time(s)
0 0.0000E+00 1.0105E-05 4.9008E-03
Total timeloop took : 13.279 sec
Time spent on AMR : 0.000 sec Percentage: 0.00 %
Time spent on IO in loop : 0.260 sec Percentage: 1.96 %
Time spent on ghost cells : 0.821 sec Percentage: 6.19 %
Time spent on computing : 12.197 sec Percentage: 91.86 %
Cells updated / proc / sec : 1.974E+06
Saving visual data. Coordinate directions and variable names are:
1 X
2 Y
3 rho
4 v1
5 v2
6 p
time = 5.1665764415896450E-003
Total time spent on IO : 20.906 sec
Total timeintegration took : 33.925 sec
100 5.167E-03 5.318E-05 1.328E+01
Finished AMRVAC in : 34.279 sec
JOB STATISTICS
Job ID: 6411180
Cluster: snellius
User/Group: lootes/lootes
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 192
CPU Utilized: 02:17:00
CPU Efficiency: 54.89% of 04:09:36 core-walltime
Job Wall-clock time: 00:01:18
Memory Utilized: 294.44 GB (estimated maximum)
Memory Efficiency: 87.63% of 336.00 GB (336.00 GB/node)
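To put a number on the difference between the two runs, the timings can be compared directly. The Python sketch below is only a rough illustration: the values are copied by hand from the two logs above, and the dictionary keys are names chosen for this example.

```python
# Timings in seconds, copied from the logs of the two runs above.
first_run = {
    "timeloop": 58.288,
    "ghost_cells": 15.626,
    "computing": 42.234,
    "total": 89.809,
}
second_run = {
    "timeloop": 13.279,
    "ghost_cells": 0.821,
    "computing": 12.197,
    "total": 34.279,
}

# Ratio first/second: how many times slower the first run was per phase.
slowdown = {phase: first_run[phase] / second_run[phase] for phase in first_run}

for phase, factor in slowdown.items():
    print(f"{phase:12s}: first run {factor:5.2f}x slower")
```

With these numbers the time loop of the first run comes out roughly 4.4 times slower, and the ghost-cell phase is affected far more (about 19 times) than the compute phase (about 3.5 times). The reported cells updated / proc / sec (4.497E+05 versus 1.974E+06) differ by the same factor as the time loop.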