bnaul opened 2 years ago
cc @zsailer @Carreau in case they have thoughts
For reference, when running locally we often create a Scheduler in a thread. Would it be possible to not register signal handlers if we're not running from the main thread?
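The guard being asked for here could look roughly like the minimal sketch below. It is not code from distributed or jupyter_server. The helper `install_signal_handlers_if_main_thread` and the demo handler are hypothetical names. The helper registers handlers only from the main thread and reports whether it did so.

```python
import signal
import threading


# Hypothetical helper; not part of distributed or jupyter_server.
def install_signal_handlers_if_main_thread(handler):
    """Install `handler` for SIGINT and SIGTERM, but only from the main thread.

    Returns True if the handlers were installed and False if registration was
    skipped because the caller is running in some other thread.
    """
    if threading.current_thread() is not threading.main_thread():
        # signal.signal() raises ValueError outside the main thread, so skip it
        return False
    for signum in (signal.SIGINT, signal.SIGTERM):
        signal.signal(signum, handler)
    return True


def _noop_handler(signum, frame):
    """Placeholder handler used only for this demonstration."""


# Remember the current handlers so the demo can put them back afterwards
saved = {s: signal.getsignal(s) for s in (signal.SIGINT, signal.SIGTERM)}

results = {"main": install_signal_handlers_if_main_thread(_noop_handler)}


def _run_in_worker():
    results["worker"] = install_signal_handlers_if_main_thread(_noop_handler)


worker = threading.Thread(target=_run_in_worker)
worker.start()
worker.join()

for signum, previous in saved.items():
    if previous is not None:  # None means the handler was not set from Python
        signal.signal(signum, previous)

print(results)  # {'main': True, 'worker': False}
```

Returning a flag rather than raising lets the caller decide whether a missing handler matters, which fits the "just don't register them off the main thread" behavior suggested above.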
@bnaul I'm able to reproduce this error. It's not that it's calling Scheduler from an async block; it's that it's calling it in a separate thread. The interactive Cluster and Client objects create a new thread where they run Dask things. This keeps the main thread open for user interactions.
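The thread-versus-async distinction can be checked with a minimal sketch that uses only the standard library. `register_handler` is a hypothetical stand-in for start-up code that installs a signal handler, not the actual Scheduler or ServerApp code. The sketch runs the same coroutine under `asyncio.run` twice, once in the main thread and once in a worker thread. Only the worker-thread run fails, because CPython's `signal.signal()` raises `ValueError` when called outside the main thread.

```python
import asyncio
import signal
import threading


async def register_handler():
    # Hypothetical stand-in for start-up code that installs a signal handler.
    # It re-installs the current SIGTERM handler so the process state is unchanged.
    current = signal.getsignal(signal.SIGTERM)
    signal.signal(signal.SIGTERM, current if current is not None else signal.SIG_DFL)


def run_registration(outcomes, label):
    """Run register_handler() in a fresh event loop and record what happened."""
    try:
        asyncio.run(register_handler())
        outcomes[label] = "registered"
    except ValueError as exc:
        # CPython rejects signal.signal() calls made outside the main thread
        outcomes[label] = f"ValueError: {exc}"


outcomes = {}
run_registration(outcomes, "main thread")

worker = threading.Thread(target=run_registration, args=(outcomes, "worker thread"))
worker.start()
worker.join()

for label, result in outcomes.items():
    print(f"{label}: {result}")
```

Both runs are "async", so the event loop itself is not what breaks signal registration. The thread the loop runs in is.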
This case didn't come up in testing because it's a little strange to use the interactive Cluster objects in a situation where you would want a Jupyter notebook: in that case you clearly already have access to the machine where things are running.
Assuming that you're running KubeCluster, I suspect that there is an option that runs the Scheduler in a remote pod. In that case I suspect you wouldn't run into this issue.
Thanks @mrocklin, that makes sense. I am doing something unusual (maybe ill-advised?) here: I'm using Helm to manage this scheduler but still want adaptivity, so I'm calling KubeCluster(deploy_mode="local") in an entrypoint script, which, as you point out, I guess is causing the scheduler to be created in a separate thread.
It does sound like this is still a real issue, but I'll defer to you on whether it's worth investigating or just shouldn't be supported.
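I think you should be able to do:
import threading
from jupyter_server.serverapp import ServerApp
if threading.current_thread() is not threading.main_thread():
    # Signal handlers can only be registered from the main thread
    ServerApp.init_signal = lambda self: None
    ServerApp._restore_sigint_handler = lambda self: None
Then none of the signal handlers will be set up when you call ServerApp.initialize.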
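A possible variation on that monkeypatch, assuming you would rather scope it than leave the class modified for the life of the process, is to apply it with `unittest.mock.patch.object` inside a context manager. This is a minimal sketch. `skip_signal_setup_off_main_thread` and `FakeServerApp` are hypothetical names. `FakeServerApp` only mimics the two method names from the snippet above so the sketch runs without jupyter_server installed. Patching a class is still visible to every thread while the `with` block is active.

```python
import signal
import threading
from contextlib import ExitStack, contextmanager
from unittest import mock


@contextmanager
def skip_signal_setup_off_main_thread(app_cls):
    """Stub out app_cls's signal setup for the duration of the block,
    but only when the block is entered from a non-main thread."""
    with ExitStack() as stack:
        if threading.current_thread() is not threading.main_thread():
            for name in ("init_signal", "_restore_sigint_handler"):
                stack.enter_context(mock.patch.object(app_cls, name, lambda self: None))
        yield  # patches (if any) are undone when the block exits


class FakeServerApp:
    """Hypothetical stand-in with the same two method names as ServerApp."""

    def init_signal(self):
        signal.signal(signal.SIGINT, signal.default_int_handler)

    def _restore_sigint_handler(self):
        signal.signal(signal.SIGINT, signal.default_int_handler)

    def initialize(self):
        self.init_signal()


original_init_signal = FakeServerApp.__dict__["init_signal"]
errors = []


def start_app_in_thread():
    with skip_signal_setup_off_main_thread(FakeServerApp):
        try:
            FakeServerApp().initialize()
        except ValueError as exc:
            errors.append(exc)


worker = threading.Thread(target=start_app_in_thread)
worker.start()
worker.join()

print(errors)  # []
print(FakeServerApp.__dict__["init_signal"] is original_init_signal)  # True
```

Without the context manager, `initialize()` in the worker thread would raise `ValueError`. With it, the call succeeds and the original method is restored afterwards.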
What happened: Testing out the new jupyter flag on distributed 2022.8, I ran into the following error:

Minimal Complete Verifiable Example:

Output:
Anything else we need to know?: I am guessing this is because SpecCluster is calling Scheduler() from an async block?

Environment: