geerlingguy / top500-benchmark

Automated Top500 benchmark for clusters or single nodes.
MIT License

Benchmark M1 Max Mac Studio #4

Closed: geerlingguy closed this issue 1 year ago

geerlingguy commented 1 year ago

To do this:

  1. git clone https://github.com/geerlingguy/top500-benchmark.git && cd top500-benchmark
  2. cp example.hosts.ini hosts.ini && cp example.config.yml config.yml
  3. Edit config.yml and change Qs to 10 (for 10 vCPUs in Docker)
  4. Make sure Docker is running, and increase RAM to 32 GB and CPUs to 10 in Docker's Resources settings.
  5. Start container: docker run --name top500 -it -v $PWD:/code geerlingguy/docker-ubuntu2204-ansible:latest bash
  6. Go into code directory: cd /code
  7. Run the benchmark: ansible-playbook main.yml --tags "setup,benchmark"
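
As a rough sanity check on steps 3 and 4, the sketch below estimates how much memory the HPL matrix needs and confirms that the process grid matches the CPU count. The problem size N = 50004 and P = 1 are taken from the HPL output later in this thread. The function names are made up for illustration, the 8 bytes per element assumes HPL's double-precision matrix, and the 32 GB Docker limit is treated as decimal gigabytes for simplicity.

```python
def hpl_matrix_bytes(n: int) -> int:
    """Memory for the N x N double-precision coefficient matrix (8 bytes per entry)."""
    return n * n * 8


def grid_matches_cpus(p: int, q: int, cpus: int) -> bool:
    """HPL runs P * Q processes; here that should equal the CPUs given to Docker."""
    return p * q == cpus


n, p, q = 50004, 1, 10                # N and P from the HPL output; Q from step 3
docker_ram_gb, docker_cpus = 32, 10   # Docker Resources settings from step 4

matrix_gb = hpl_matrix_bytes(n) / 1e9
print(f"Matrix size: {matrix_gb:.1f} GB of {docker_ram_gb} GB")
print(f"Grid {p} x {q} uses all {docker_cpus} CPUs: {grid_matches_cpus(p, q, docker_cpus)}")
```

The matrix comes to roughly 20 GB, so it fits inside the 32 GB Docker allocation with room left for the OS and MPI buffers.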
geerlingguy commented 1 year ago

I got 262.45 Gflops on my first run, but I still need to do some power profiling and tweak Docker settings to see if I can do better. (I'm running inside Docker, so there will inevitably be a little overhead.)

geerlingguy commented 1 year ago
================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   50004
NB     :     256
PMAP   : Row-major process mapping
P      :       1
Q      :      10
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50004   256     1    10             315.36             2.6432e+02
HPL_pdgesv() start time Fri Nov 18 04:40:23 2022

HPL_pdgesv() end time   Fri Nov 18 04:45:39 2022

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   8.60823560e-04 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

So 264.32 Gflops at 66 W, or about 4 Gflops/W.
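
The reported figures can be cross-checked with a few lines of arithmetic. This sketch assumes the usual HPL operation count of 2/3 N^3 + 3/2 N^2 floating-point operations, and takes N, the solve time, and the 66 W power draw from the results above. The function names are illustrative only.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Rate implied by the assumed HPL operation count, 2/3*N^3 + 3/2*N^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency: sustained Gflops divided by average power draw."""
    return gflops / watts


rate = hpl_gflops(50004, 315.36)          # N and Time from the HPL result line
efficiency = gflops_per_watt(264.32, 66)  # reported rate and measured power

print(f"Computed rate: {rate:.2f} Gflops")
print(f"Efficiency: {efficiency:.2f} Gflops/W")
```

The computed rate should land within rounding of the 2.6432e+02 Gflops that HPL printed, since the Time column is itself rounded to two decimals.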