esi-neuroscience / acme

Asynchronous Computing Made ESI
https://esi-acme.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Slurm jobs still running on KeyboardInterrupt #23

Closed: KatharineShapcott closed this issue 3 years ago

KatharineShapcott commented 3 years ago

Is this the usual dask behaviour? I can see that my jobs are still running, but when I look at my client it says they are stopped. My code looks like this:

```python
from acme import ParallelMap, esi_cluster_setup

client = esi_cluster_setup(partition="8GBS", n_jobs=n_jobs)
with ParallelMap(comparison_classifier, clf, data_train, label_train, data_test,
                 train_size=train_sizes, n_inputs=n_trys,
                 write_worker_results=write) as pmap:
    results = pmap.compute()
```

It seems like the workers are gone when I look at client and pmap.client:

[screenshot: client widget reporting no workers]

I get the same if I look at the output of dd.get_client().

But according to squeue all my jobs are still running.
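
For reference, here is roughly how I compare the two views (a minimal sketch; scheduler_info() is standard dask.distributed API, and the squeue call assumes the jobs run under my own user):

```python
import getpass
import subprocess

import dask.distributed as dd

# What dask believes: workers currently connected to the scheduler
workers = dd.get_client().scheduler_info()["workers"]
print(f"dask sees {len(workers)} worker(s)")

# What SLURM believes: my jobs still known to the batch scheduler
squeue = subprocess.run(
    ["squeue", "-u", getpass.getuser(), "--noheader"],
    capture_output=True, text=True, check=True,
)
print(f"squeue lists {len(squeue.stdout.splitlines())} job(s)")
```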

pantaray commented 3 years ago

Hi Katharine! Thanks for reporting this! Just to clarify: did you hit CTRL + C while pmap.compute() was running? I've had similar issues with dask before, where the scheduler broke down and pulled the client down with it, resulting in a client with zero workers and perfectly alive but now-detached SLURM jobs.
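
In the meantime, one way to make sure the jobs go down with the computation is to catch the interrupt and shut the client down explicitly. A sketch, assuming the object returned by esi_cluster_setup behaves like a regular dask.distributed.Client (client.shutdown() asks the scheduler and workers to terminate, which should release the SLURM allocations):

```python
from acme import ParallelMap, esi_cluster_setup

client = esi_cluster_setup(partition="8GBS", n_jobs=n_jobs)
try:
    with ParallelMap(comparison_classifier, clf, data_train, label_train,
                     data_test, train_size=train_sizes, n_inputs=n_trys,
                     write_worker_results=write) as pmap:
        results = pmap.compute()
except KeyboardInterrupt:
    # On interrupt, explicitly tear down scheduler and workers so the
    # SLURM jobs do not keep running after the client is gone
    client.shutdown()
    raise
```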

KatharineShapcott commented 3 years ago

Hey Stefan, actually I hit "stop" in a Jupyter kernel; that's why I still had access to client and could see that it "thought" the workers were ended. In the past I assumed it was expected behaviour that CTRL + C doesn't actually end the jobs, because it always happened that way for me. I would usually kill them all manually with scancel, roughly as in the sketch below.
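
A minimal sketch of that manual cleanup, done from Python for consistency with the snippets above; note that the job name "dask-worker" is dask-jobqueue's default and is only an assumption here, since esi_cluster_setup may name the jobs differently:

```python
import getpass
import subprocess

# Cancel all of my SLURM jobs named "dask-worker" ("dask-worker" is the
# dask-jobqueue default; adjust if the cluster assigns a different name)
subprocess.run(
    ["scancel", "--user", getpass.getuser(), "--name", "dask-worker"],
    check=True,
)
```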