dask / dask-mpi

Deploy Dask using MPI4Py
BSD 3-Clause "New" or "Revised" License

Dashboard returns 404 error in Dask-MPI #126

Closed: alessandrocornacchia closed this issue 4 months ago

alessandrocornacchia commented 5 months ago

I am trying to access the Bokeh dashboard in an HPC environment managed by a Slurm scheduler.

I installed dask_mpi using conda. I ran the following in the Slurm submission script:

mpirun -np $SLURM_NTASKS dask-mpi --scheduler-file scheduler.json

The scheduler starts correctly, and I can also connect with a Client.

INFO: localdir at /scratch/974298.acornacchia
INFO: your job will run on local system.
2024-06-27 00:23:00,531 - distributed.scheduler - INFO - State start
2024-06-27 00:23:00,585 - distributed.scheduler - INFO -   Scheduler at:   tcp://192.168.7.50:8786
2024-06-27 00:23:00,585 - distributed.scheduler - INFO -   dashboard at:  http://192.168.7.50:8787/status
2024-06-27 00:23:00,641 - distributed.scheduler - INFO - Registering Worker plugin shuffle
2024-06-27 00:23:00,927 - distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.7.69:46705'
2024-06-27 00:23:00,928 - distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.7.73:41865'
2024-06-27 00:23:00,944 - distributed.nanny - INFO -         Start Nanny at: 'tcp://192.168.7.70:44559'
2024-06-27 00:23:45,142 - distributed.scheduler - INFO - Receive client connection: Client-bfa9d376-340a-11ef-944a-e4434b640dd8
2024-06-27 00:24:36,898 - distributed.core - INFO - Starting established connection to tcp://192.168.7.254:49934
2024-06-27 00:24:36,898 - distributed.core - INFO - Connection to tcp://192.168.7.254:49934 has been closed.
2024-06-27 00:24:36,898 - distributed.scheduler - INFO - Remove client Client-bfa9d376-340a-11ef-944a-e4434b640dd8
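
For readers following along: the file passed to --scheduler-file is JSON, and the scheduler address is stored under its "address" key. The sketch below reads that address and builds the dashboard URL one would expect from it. The helper names are hypothetical, and the port default of 8787 is an assumption taken from the log above rather than something read from the file.

```python
import json
from urllib.parse import urlsplit


def scheduler_address(scheduler_file):
    """Return the scheduler address recorded in a Dask scheduler file."""
    with open(scheduler_file) as f:
        return json.load(f)["address"]


def expected_dashboard_url(address, port=8787):
    """Build a dashboard URL from a scheduler address such as tcp://host:8786.

    The port is an assumption: 8787 is the value shown in the scheduler log.
    """
    host = urlsplit(address).hostname
    return f"http://{host}:{port}/status"
```

With the scheduler.json from this job, expected_dashboard_url(scheduler_address("scheduler.json")) should give the same URL the scheduler log prints, assuming the dashboard uses the default port.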

However, the dashboard returns an HTTP 404 error when I try to access its URL:

wget http://192.168.7.50:8787/status

--2024-06-27 00:26:31--  http://192.168.7.50:8787/status
Connecting to 192.168.7.50:8787... connected.
HTTP request sent, awaiting response... 404 Not Found
2024-06-27 00:26:31 ERROR 404: Not Found.
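
The same check can be scripted from Python, which may be handy on nodes where wget is not available. This is a minimal sketch using only the standard library; dashboard_status is a hypothetical helper, not part of Dask. It separates "the server answered with an error status" (the 404 case here) from "nothing answered at all".

```python
import urllib.error
import urllib.request


def dashboard_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # No HTTP response at all (connection refused, timeout, DNS failure).
        return None
```

A 404 from dashboard_status("http://192.168.7.50:8787/status") would match the wget output above: something is listening on the port, but it is not serving the /status route.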

Environment:

Additional notes: This does not happen with dask-jobqueue, where the dashboard runs correctly.

mrocklin commented 5 months ago

cc @kmpaul @jacobtomlinson

jacobtomlinson commented 5 months ago

I can confirm I am able to reproduce this on my machine. I'm going to transfer this issue over to dask-mpi as it seems to be related to how that library is starting up the scheduler.

jacobtomlinson commented 5 months ago

Ah, it looks like you need to explicitly specify the dashboard address.

The following works for me:

mpirun -np $SLURM_NTASKS dask-mpi --scheduler-file scheduler.json --dashboard-address :8787

This is a little unintuitive. I'll open a PR to enable the dashboard by default and add a flag to disable it, which is the same way dask scheduler works on the CLI.