Sometimes I start a cluster, but then the time limit of the SLURM queue runs out and the workers are killed. If I then try to start a new cluster, e.g.:
acme.esi_cluster_setup(partition="8GBXS", n_workers=10, n_workers_startup=2, timeout=10, interactive_wait=1)
I get a message like this:
Syncopy <ACME: esi_cluster_setup> Found existing parallel computing client <Client: 'tcp://10.100.32.17:40905' processes=0 threads=0, memory=0 B>. Not starting new cluster.
However, when I then try to use pmap with this client, it crashes:
RuntimeError: <ACMEdaemon> no active workers found in distributed computing cluster <Client: 'tcp://10.100.32.17:40905' processes=0 threads=0, memory=0 B> Consider running
import dask.distributed as dd; dd.get_client().restart()
If this fails to make workers come online, please use
import acme; acme.cluster_cleanup()
to shut down any defunct distributed computing clients
Could the 0 active workers be detected and the empty client automatically cleaned up and replaced with the new one or similar? It's not a big deal though, I can easily work around it.
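To make the request concrete, here is a minimal sketch of the check I have in mind. The helper names `count_workers` and `fresh_client` are hypothetical, not part of ACME. The three callables are passed in so the logic can be tried without a running cluster. I am assuming that `dd.get_client()` raises `ValueError` when no client exists, that `client.scheduler_info()` returns a dict with a "workers" entry, and that `esi_cluster_setup` returns the new client.

```python
def count_workers(client):
    """Return the number of workers the scheduler currently reports."""
    # Assumption: scheduler_info() returns a dict with a "workers" mapping.
    return len(client.scheduler_info().get("workers", {}))


def fresh_client(get_client, cleanup, setup):
    """Return a client with live workers, replacing a defunct one if needed.

    get_client: callable returning the existing client (e.g. dd.get_client)
    cleanup:    callable shutting down defunct clients (e.g. acme.cluster_cleanup)
    setup:      callable starting a new cluster and returning its client
    """
    try:
        client = get_client()
    except ValueError:
        # No client exists at all: just start a new cluster.
        return setup()
    if count_workers(client) > 0:
        # The existing client still has workers: reuse it.
        return client
    # The client exists but has zero workers: clean up, then start fresh.
    cleanup()
    return setup()


# Intended usage (hypothetical, not executed here):
#   import acme
#   import dask.distributed as dd
#   client = fresh_client(
#       dd.get_client,
#       acme.cluster_cleanup,
#       lambda: acme.esi_cluster_setup(partition="8GBXS", n_workers=10,
#                                      n_workers_startup=2, timeout=10,
#                                      interactive_wait=1),
#   )
```

Something like the zero-worker branch could presumably live inside esi_cluster_setup itself, at the point where it currently prints "Not starting new cluster".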