geerlingguy / top500-benchmark

Automated Top500 benchmark for clusters or single nodes.
MIT License

Single-node playbook execution fails during firewall configuration due to undefined 'host_ips' variable #21

Closed (haubenr closed this issue 11 months ago)

haubenr commented 11 months ago

When running the playbook on a single-node installation with the suggested command, ansible-playbook main.yml --tags "setup,benchmark", execution fails during firewall configuration because the variable host_ips is undefined:

...
TASK [include_tasks] **********************************************************************************************************************************************************************************************
included: /tmp/top500-benchmark/firewall/configure-firewall.yml for 127.0.0.1

TASK [Creating new custom firewall zone.] *************************************************************************************************************************************************************************
changed: [127.0.0.1]

TASK [Setting custom firewall zone to accept connections.] ********************************************************************************************************************************************************
changed: [127.0.0.1]

TASK [Adding nodes as trusted sources in the firewall.] ***********************************************************************************************************************************************************
fatal: [127.0.0.1]: FAILED! =>
  msg: '''host_ips'' is undefined. ''host_ips'' is undefined'

The host_ips fact is only set in the node SSH configuration playbook (tagged 'ssh'), so it is not defined when the tasks with that tag are not executed.
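For reference, Ansible has a special always tag: a task carrying it runs no matter which --tags are selected, unless it is explicitly excluded with --skip-tags always. Facts set with set_fact also persist for a host across later plays in the same run. A minimal sketch of tagging the fact-building task that way follows. It is hypothetical: the actual task in main.yml is not quoted in this thread, and the expression assumes every host in the play defines ansible_host in the inventory.

```yaml
# Hypothetical sketch: the real task that builds host_ips in main.yml is
# not quoted in this thread, so the expression below is an assumption.
- name: Generate list of host IP addresses.
  ansible.builtin.set_fact:
    # Assumes every host in the play defines ansible_host in the inventory.
    host_ips: "{{ ansible_play_hosts | map('extract', hostvars, 'ansible_host') | list }}"
  # 'always' is a special tag: this task runs even with --tags "setup,benchmark",
  # unless it is explicitly excluded with --skip-tags always.
  tags: ['always']
```

Compared with duplicating the task in the benchmark play, this keeps a single definition, at the cost of the task running on every invocation regardless of the selected tags.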

L0afin commented 11 months ago

Can confirm: I'm getting the same issue while trying to benchmark on Asahi Fedora 39.

PLAY [Install linpack benchmark.] *********************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [ansible.builtin.include_tasks] ******************************************************************************************************************************************************
included: /home/loafin/Desktop/top500-benchmark/dependencies/rhel-based.yml for 127.0.0.1

TASK [Update dnf cache.] ******************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Install dependencies.] **************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [ansible.builtin.include_tasks] ******************************************************************************************************************************************************
skipping: [127.0.0.1]

TASK [ansible.builtin.include_tasks] ******************************************************************************************************************************************************
skipping: [127.0.0.1]

TASK [Create required temporary directories.] *********************************************************************************************************************************************
ok: [127.0.0.1] => (item=/opt/top500/tmp)
ok: [127.0.0.1] => (item=/opt/top500/tmp/blis-build)

TASK [Download MPI (Message Passing Interface).] ******************************************************************************************************************************************
skipping: [127.0.0.1]

TASK [Build MPI (takes a while).] *********************************************************************************************************************************************************
ok: [127.0.0.1] => (item=./configure --with-device=ch3:sock FFLAGS=-fallow-argument-mismatch)
ok: [127.0.0.1] => (item=make -j12)

TASK [Install MPI.] ***********************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Create 'COMPILE_MPI_COMPLETE' file.] ************************************************************************************************************************************************
changed: [127.0.0.1]

TASK [Test if we can set CPU scaling parameters.] *****************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Ensure CPU scaling is set to 'performance'.] ****************************************************************************************************************************************
changed: [127.0.0.1]

TASK [ansible.builtin.include_tasks] ******************************************************************************************************************************************************
skipping: [127.0.0.1]

TASK [ansible.builtin.include_tasks] ******************************************************************************************************************************************************
included: /home/loafin/Desktop/top500-benchmark/tasks/algebra_blis.yml for 127.0.0.1

TASK [Download Blis linear algebra library.] **********************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Install Blis.] **********************************************************************************************************************************************************************
ok: [127.0.0.1] => (item=./configure --prefix=/opt/blis auto)
ok: [127.0.0.1] => (item=make -j12)
ok: [127.0.0.1] => (item=make install)

TASK [Create 'COMPILE_BLIS_COMPLETE' file.] ***********************************************************************************************************************************************
changed: [127.0.0.1]

TASK [ansible.builtin.include_tasks] ******************************************************************************************************************************************************
skipping: [127.0.0.1]

TASK [Download HPL (High Performance Linpack).] *******************************************************************************************************************************************
skipping: [127.0.0.1]

TASK [Set up HPL makefile.] ***************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Copy HPL makefile into place.] ******************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Install HPL.] ***********************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Create COMPILE_HPL_COMPLETE file.] **************************************************************************************************************************************************
changed: [127.0.0.1]

PLAY [Configure SSH connections between nodes.] *******************************************************************************************************************************************

PLAY [Run linpack benchmark.] *************************************************************************************************************************************************************

TASK [Gathering Facts] ********************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Create a file describing nodes for MPI execution.] **********************************************************************************************************************************
ok: [127.0.0.1]

TASK [Create HPL.dat file.] ***************************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [include_tasks] **********************************************************************************************************************************************************************
included: /home/loafin/Desktop/top500-benchmark/firewall/configure-firewall.yml for 127.0.0.1

TASK [Creating new custom firewall zone.] *************************************************************************************************************************************************
ok: [127.0.0.1]

TASK [Setting custom firewall zone to accept connections.] ********************************************************************************************************************************
ok: [127.0.0.1]

TASK [Adding nodes as trusted sources in the firewall.] ***********************************************************************************************************************************
fatal: [127.0.0.1]: FAILED! => 
  msg: '''host_ips'' is undefined. ''host_ips'' is undefined'

PLAY RECAP ********************************************************************************************************************************************************************************
127.0.0.1                  : ok=24   changed=4    unreachable=0    failed=1    skipped=6    rescued=0    ignored=0   

geerlingguy commented 11 months ago

Oopsie! This was introduced as part of https://github.com/geerlingguy/top500-benchmark/pull/12

I'll push up a fix. It adds a duplicate task, but that's the easy way around it.

L0afin commented 11 months ago

Not sure if I should open a new issue, but this seems more like a continuation of this one. I noticed the update, ran the single-node playbook again, and got the output below. It seems like it now gets past the firewall, but it doesn't seem to run any tests: it finishes almost instantly, whereas running with no tags takes ~90 seconds and gives a result just fine.

loafin@loafbook-pro-linux-edition:~/Desktop/top500-benchmark$ ansible-playbook main.yml --tags "setup,benchmark"

PLAY [Install linpack benchmark.] *****

TASK [Gathering Facts] ****
ok: [127.0.0.1]

TASK [ansible.builtin.include_tasks] **
included: /home/loafin/Desktop/top500-benchmark/dependencies/rhel-based.yml for 127.0.0.1

TASK [Update dnf cache.] **
ok: [127.0.0.1]

TASK [Install dependencies.] **
ok: [127.0.0.1]

TASK [ansible.builtin.include_tasks] **
skipping: [127.0.0.1]

TASK [ansible.builtin.include_tasks] **
skipping: [127.0.0.1]

TASK [Create required temporary directories.] *****
ok: [127.0.0.1] => (item=/opt/top500/tmp)
ok: [127.0.0.1] => (item=/opt/top500/tmp/blis-build)

TASK [Download MPI (Message Passing Interface).] **
skipping: [127.0.0.1]

TASK [Build MPI (takes a while).] *****
ok: [127.0.0.1] => (item=./configure --with-device=ch3:sock FFLAGS=-fallow-argument-mismatch)
ok: [127.0.0.1] => (item=make -j12)

TASK [Install MPI.] ***
ok: [127.0.0.1]

TASK [Create 'COMPILE_MPI_COMPLETE' file.] ****
changed: [127.0.0.1]

TASK [Test if we can set CPU scaling parameters.] *****
ok: [127.0.0.1]

TASK [Ensure CPU scaling is set to 'performance'.] ****
changed: [127.0.0.1]

TASK [ansible.builtin.include_tasks] **
skipping: [127.0.0.1]

TASK [ansible.builtin.include_tasks] **
included: /home/loafin/Desktop/top500-benchmark/tasks/algebra_blis.yml for 127.0.0.1

TASK [Download Blis linear algebra library.] **
ok: [127.0.0.1]

TASK [Install Blis.] **
ok: [127.0.0.1] => (item=./configure --prefix=/opt/blis auto)
ok: [127.0.0.1] => (item=make -j12)
ok: [127.0.0.1] => (item=make install)

TASK [Create 'COMPILE_BLIS_COMPLETE' file.] ***
changed: [127.0.0.1]

TASK [ansible.builtin.include_tasks] **
skipping: [127.0.0.1]

TASK [Download HPL (High Performance Linpack).] ***
skipping: [127.0.0.1]

TASK [Set up HPL makefile.] ***
ok: [127.0.0.1]

TASK [Copy HPL makefile into place.] **
ok: [127.0.0.1]

TASK [Install HPL.] ***
ok: [127.0.0.1]

TASK [Create COMPILE_HPL_COMPLETE file.] **
changed: [127.0.0.1]

PLAY [Configure SSH connections between nodes.] ***

PLAY [Run linpack benchmark.] *****

TASK [Gathering Facts] ****
ok: [127.0.0.1]

TASK [Create a file describing nodes for MPI execution.] **
ok: [127.0.0.1]

TASK [Create HPL.dat file.] ***
ok: [127.0.0.1]

TASK [Generate list of host IP addresses.] ****
fatal: [127.0.0.1]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error was: 'host_ips' is undefined. 'host_ips' is undefined

    The error appears to be in '/home/loafin/Desktop/top500-benchmark/main.yml': line 194, column 7, but may
    be elsewhere in the file depending on the exact syntax problem.

    The offending line appears to be:

        - name: Generate list of host IP addresses.
          ^ here

PLAY RECAP ****
127.0.0.1 : ok=21 changed=4 unreachable=0 failed=1 skipped=6 rescued=0 ignored=0

geerlingguy commented 11 months ago

@L0afin - d'oh! Just pushed another commit; this time it should fix it for real (maybe) :)
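The second failure is reported by the task that builds host_ips ("Generate list of host IP addresses."), not by a task that merely reads it. That is consistent with, though the thread doesn't confirm it, a task that appends to a host_ips list that was only initialized in the skipped SSH play. A hedged sketch of an accumulator that needs no prior initialization, using Jinja's default filter, is below; the loop source (ansible_play_hosts) and the fallback to the inventory hostname are assumptions, not the fix that was actually pushed.

```yaml
# Hypothetical sketch, not the actual fix pushed in this thread.
- name: Generate list of host IP addresses.
  ansible.builtin.set_fact:
    # default([]) supplies an empty list when host_ips has not been set yet,
    # so the first iteration no longer fails with "'host_ips' is undefined".
    # Falls back to the inventory hostname (e.g. 127.0.0.1) when a host
    # does not define ansible_host.
    host_ips: "{{ host_ips | default([]) + [hostvars[item]['ansible_host'] | default(item)] }}"
  loop: "{{ ansible_play_hosts }}"
```

Because set_fact updates the fact on each loop iteration, the list grows by one entry per host in the play.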

L0afin commented 11 months ago

Looks like it works! Thanks for being so quick with the fixes. Watching your videos where you run this on different devices got me interested in finally running it on my own devices for fun. In case you're curious, I've included my result below. I have the 14" M2 Pro with 12 CPU cores and 19 GPU cores, running the latest release of Asahi Fedora 39 Remix.

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       34581   256     1    12              92.85             2.9693e+02
HPL_pdgesv() start time Fri Nov  3 09:15:11 2023

HPL_pdgesv() end time   Fri Nov  3 09:16:43 2023

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.10304552e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
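
As a quick consistency check on the WR11C2R4 line: HPL derives its Gflops figure from the LU operation count of 2/3 N^3 + 3/2 N^2 floating-point operations divided by the wall time. Plugging in N = 34581 and t = 92.85 s from the output above reproduces the reported value, and the scaled residual is far below the 16.0 pass threshold:

```latex
\begin{aligned}
\text{Gflops} &= \frac{\tfrac{2}{3}N^{3} + \tfrac{3}{2}N^{2}}{t \cdot 10^{9}}
             = \frac{N^{2}\left(\tfrac{2}{3}N + \tfrac{3}{2}\right)}{t \cdot 10^{9}} \\
N^{2} &= 34581^{2} = 1{,}195{,}845{,}561, \qquad \tfrac{2}{3}N + \tfrac{3}{2} = 23054 + 1.5 = 23055.5 \\
\text{Gflops} &\approx \frac{1{,}195{,}845{,}561 \times 23055.5}{92.85 \cdot 10^{9}}
             \approx \frac{2.7571 \times 10^{13}}{9.285 \times 10^{10}} \approx 296.9 \\
\text{residual} &= 2.10304552 \times 10^{-3} < 16.0 \;\Rightarrow\; \text{PASSED}
\end{aligned}
```

The result agrees with the printed 2.9693e+02 to within the rounding of the Time column.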

geerlingguy commented 11 months ago

@L0afin - Nice! That seems in line with expectations for M2 Pro (faster than my M1 Max!). I'm really torn about getting an M3 Pro or M3 Max MacBook Pro to upgrade from my piddly M2 Air...