Open mens-artis opened 1 year ago
It seems that the program just ends with an error, as if one end of the TCP connection keeps going after the job has finished. This does cause a problem, notably that my GPUinfo cannot be run, but that seems to be a side effect of the disconnection, or possibly a different problem altogether. So I guess I will delete it here. I don't have time to study it further and am taking a different approach.
Hi @mens-artis,
I'm not sure about the context here: you are talking about SMAC3 or GPUinfo, which I don't know about.
Anyway, yes, there are sometimes error messages when shutting down a Dask cluster, but as you've noticed, this does not cause a problem for your computation.
I had inserted the following code at the top of submit_trial() to avoid a timeout from the scheduler. This may be quite central, because apparently SMAC3 expects the scheduler to launch the compute nodes instantly:
and
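For illustration only (this is not the code referred to above): a minimal sketch of the general idea of blocking until compute nodes have come up before submitting work, so that a slow batch scheduler does not trigger a timeout. The names `wait_for_workers` and `count_workers`, and the default values, are hypothetical and chosen just for this example.

```python
import time
from typing import Callable


def wait_for_workers(
    count_workers: Callable[[], int],  # hypothetical hook: returns the current number of live workers
    min_workers: int = 1,
    timeout: float = 600.0,
    poll_interval: float = 1.0,
) -> int:
    """Poll until at least `min_workers` workers are available.

    Returns the observed worker count, or raises TimeoutError if the
    workers have not appeared within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        n = count_workers()
        if n >= min_workers:
            return n
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {n} of {min_workers} workers available after {timeout} s"
            )
        # Sleep briefly so the scheduler is not polled in a tight loop.
        time.sleep(poll_interval)
```

Something like this could be called at the top of a submit function before any trial is dispatched. With Dask specifically, `Client.wait_for_workers(n_workers, timeout)` provides similar blocking behaviour out of the box, so a hand-written loop may not be needed there.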