Closed: matt-long closed this issue 5 years ago.
I think the SLURM config is reasonable; however, I would like to support good defaults for Hobart.
Hobart has 48 cores per node: http://www.cgd.ucar.edu/systems/documentation-toc/02.11.03_-_HPC_Cluster.html
`resource_spec` needs to be specified in `PBSCluster`. I was able to get it working with the following:
```python
import dask_jobqueue

cluster = dask_jobqueue.PBSCluster(
    cores=48,
    processes=48,
    walltime='08:00:00',
    memory='96GB',
    queue='medium',
    resource_spec='nodes=1:ppn=48',
    job_extra=['-r n'],  # job_extra takes a list of strings, not a set
)
```
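Folding those working values into a per-machine defaults file could look like the sketch below; the filename and exact key names are assumptions, not a finalized layout:

```yaml
# Hypothetical jobqueue-hobart.yaml, mirroring the PBSCluster call above
jobqueue:
  pbs:
    name: dask-worker
    cores: 48
    processes: 48
    memory: '96GB'
    walltime: '08:00:00'
    queue: medium
    resource-spec: 'nodes=1:ppn=48'
    job-extra: ['-r n']
```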
We need to have:

- `./config/jobqueue-cheyenne.yaml`
- `./config/jobqueue-hobart.yaml`

and accept a `machine` argument in `copy_config`.
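A minimal sketch of what a machine-aware `copy_config` could look like; the signature, paths, and error handling here are assumptions for illustration, not the actual ncar-jobqueue API:

```python
import os
import shutil


def copy_config(machine, config_dir="./config", dest=None):
    """Copy the per-machine jobqueue config into the user's dask config dir.

    Hypothetical helper: looks for config/jobqueue-<machine>.yaml and
    copies it to ~/.config/dask/jobqueue.yaml (or an explicit dest).
    """
    src = os.path.join(config_dir, f"jobqueue-{machine}.yaml")
    if not os.path.exists(src):
        raise ValueError(f"No config found for machine: {machine!r}")
    if dest is None:
        dest = os.path.join(
            os.path.expanduser("~"), ".config", "dask", "jobqueue.yaml"
        )
    # Ensure the destination directory exists before copying
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copyfile(src, dest)
    return dest
```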
Is it safe to close this issue since it was fixed in https://github.com/NCAR/ncar-jobqueue/pull/12?
Just popping in to encourage you all to suggest edits to `/glade/u/apps/config/dask/dask.yaml` via the CISL help desk. This file currently looks like:
```yaml
distributed:
  scheduler:
    bandwidth: 1000000000    # 1 GB/s estimated worker-worker bandwidth
  worker:
    memory:
      target: 0.90     # Avoid spilling to disk
      spill: False     # Avoid spilling to disk
      pause: 0.80      # fraction at which we pause worker threads
      terminate: 0.95  # fraction at which we terminate the worker
  comm:
    compression: null

jobqueue:
  pbs:
    name: dask-worker
    # Dask worker options
    cores: 1          # Total number of cores per job
    memory: '3 GB'    # Total amount of memory per job
    processes: 1      # Number of Python processes per job
    interface: ib0    # Network interface to use, like eth0 or ib0
    # PBS resource manager options
    queue: share
    walltime: '00:30:00'
    resource-spec: select=1
  slurm:
    name: dask-worker
    # Dask worker options
    cores: 1          # Total number of cores per job
    memory: '25 GB'   # Total amount of memory per job
    processes: 1      # Number of Python processes per job
    interface: ib0    # Network interface to use, like eth0 or ib0
    # SLURM resource manager options
    walltime: '00:30:00'
    job-extra: ['-C skylake']
```
but if you feel there are more reasonable default values, we can suggest edits.
`dashboard` has changed names.

Is the SLURM configuration actually a good one?