dask / dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE
https://jobqueue.dask.org
BSD 3-Clause "New" or "Revised" License

Documentation bug: interface #603

Open davide-q opened 1 year ago

davide-q commented 1 year ago

The documentation at https://jobqueue.dask.org/en/latest/generated/dask_jobqueue.SLURMCluster.html (and the pages for the other schedulers) says:

> interface (str): Network interface like ‘eth0’ or ‘ib0’. This will be used both for the Dask scheduler and the Dask workers interface. If you need a different interface for the Dask scheduler you can pass it through the scheduler_options argument: interface=your_worker_interface, scheduler_options={'interface': your_scheduler_interface}.

It's unclear what happens if one doesn't specify it. Looking at the code, it appears that a default is used, which is taken from the config. The default config.yaml file has a null value for interface, so even looking there one ends up going in circles.

I propose:

  1. adding the sentence "If no interface is specified, the default from the jobqueue.yaml file is used" to the documentation
  2. adding another sentence saying what happens if no specific jobqueue.yaml file is present (so the one from the install directory must be used).

What happens in the second scenario is still unclear to me. On the machine I use it clearly works, so some interface is utilized, but which one? I have both eth0 and ib0. Is it guaranteed to work always if at least one interface is present? How is it chosen if more than one is present?

guillaumeeb commented 1 year ago

Hi @davide-q, thanks for raising this issue.

If one doesn't specify a network interface, the default one is used. I think in the end this comes from https://github.com/dask/distributed/blob/0063de53fed5e4e2e409940213c6265867e6635d/distributed/utils.py#L157. Usually it will be the default first ethernet interface.
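As a rough illustration of how such a default can be derived: a common approach, and as far as I can tell the one behind the linked helper, is to open a UDP socket towards a public address and read back which local IP the OS selected. The sketch below uses only the standard library. `default_ip` is a hypothetical name, and the 8.8.8.8:80 target and the loopback fallback are assumptions, not a copy of the distributed code.

```python
import socket


def default_ip(host="8.8.8.8", port=80):
    """Return the local IP the OS would use to reach `host` (hypothetical helper)."""
    # "Connecting" a UDP socket sends no packets; it only makes the OS
    # choose a route, and therefore a local interface and address.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((host, port))
        return sock.getsockname()[0]
    except OSError:
        # Assumed fallback when no route exists (e.g. an isolated node).
        return "127.0.0.1"
    finally:
        sock.close()


print(default_ip())
```

On a machine with both eth0 and ib0, the address printed is the one on whichever interface carries the default route, which is why the result is usually the first Ethernet interface rather than Infiniband.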

There are two ways to specify arguments like that: either through code or through a YAML configuration file. This is true for all kwargs, see https://jobqueue.dask.org/en/latest/configuration-setup.html#configure-dask-jobqueue and https://jobqueue.dask.org/en/latest/configuration.html. So I think we don't want to add the same sentence to the docstring for every kwarg.

But I'm totally open to adding a sentence explaining the default behavior (if no interface argument is given through code or through a configuration file).
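For the in-code route, a minimal sketch of passing the interface explicitly could look like the following. The `interface` and `scheduler_options` keywords come from the docstring quoted in this issue. The helper names `cluster_kwargs` and `make_cluster`, the interface names, and the `cores`/`memory` values are illustrative assumptions.

```python
def cluster_kwargs(worker_interface, scheduler_interface=None):
    """Build the interface-related keyword arguments (hypothetical helper)."""
    kwargs = {"interface": worker_interface}
    if scheduler_interface is not None:
        # Only override the scheduler side when it differs from the workers.
        kwargs["scheduler_options"] = {"interface": scheduler_interface}
    return kwargs


def make_cluster(worker_interface="ib0", scheduler_interface="eth0"):
    """Create a SLURMCluster with explicit interfaces (placeholder resources)."""
    # Imported lazily so cluster_kwargs stays usable without dask-jobqueue.
    from dask_jobqueue import SLURMCluster

    # cores and memory are placeholder values; adapt them to your site.
    return SLURMCluster(
        cores=4,
        memory="8GB",
        **cluster_kwargs(worker_interface, scheduler_interface),
    )
```

Calling `make_cluster()` requires dask-jobqueue to be installed and the named interfaces to exist on the node where the scheduler starts.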

> Is it guaranteed to work always if at least one interface is present?

It is guaranteed to use an interface, but the default interface on the scheduler side (a login node, for example) might not be the same as on the worker side (compute nodes), or the nodes might not even have the same interfaces. More often, you simply won't use the most performant interface, defaulting to eth0 instead of ib0 (Infiniband-based).
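To avoid silently falling back to the slower interface, one option is to check which interfaces exist and prefer the fast one explicitly. This is a small standard-library sketch. `pick_interface` and its preference order are hypothetical, and it only inspects the node it runs on, so the compute nodes may still differ.

```python
import socket


def pick_interface(preferred=("ib0", "eth0"), available=None):
    """Return the first preferred interface that exists, else None (hypothetical helper)."""
    if available is None:
        # socket.if_nameindex() returns (index, name) pairs for local interfaces.
        available = [name for _, name in socket.if_nameindex()]
    for name in preferred:
        if name in available:
            return name
    return None


# Example with an explicit list instead of querying the local machine.
print(pick_interface(available=["lo", "eth0", "ib0"]))
```

The returned name can then be passed as the `interface` argument, and `None` signals that neither preferred interface is present on this node.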