dask / dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE
https://jobqueue.dask.org
BSD 3-Clause "New" or "Revised" License
235 stars 142 forks source link

Asynchronous job submission and removal #610

Closed jrueb closed 1 year ago

jrueb commented 1 year ago

This solution is using asyncio.create_subprocess_exec and similar to what was done in PR #473. However, this other PR lacked an update job removal, which is included here.

The test_jobqueue_job_call required an overhaul to make it compatible with the async code. Its form is basically the same as the other async tests now.

In issue #470 another solution using run_in_executor was proposed. I consider the solution with create_subprocess_exec more rigorous though, because with create_subprocess_exec the code is fully using the async API, instead of wrapping non-async functionality in a run_in_executor call.

jrueb commented 1 year ago

I have removed the weakref.finalize call here, which was one way to remove jobs at program exit. This is because finalize interferes with the fact that _close_job is a coroutine now. At program exit, Distributed already takes care of job removal here https://github.com/dask/distributed/blob/8e0c79bb82b5d9e3f4f9538935520da960588df6/distributed/deploy/spec.py#L692C1-L692C1 In fact, before this PR, jobs were removed twice, which made the program exit unnecessary lengthy.

There is one issue left however: As you can see in the linked Distributed code, there is a timeout of 10 seconds. This is not enough to remove much more than 350 jobs (at least in my tests with HTCondor). One may want to ask to increase this timeout.

guillaumeeb commented 1 year ago

@jacobtomlinson could you take a quick look at this? I'm not familiar enough with asyncio to give a good advice.

guillaumeeb commented 1 year ago

Hi @jrueb, sorry again for the delay... It looks there is formatting problem in the source code, which sounds a little weird considering you didn't touch some of these files...

Could you check flake8 and black output?

jrueb commented 1 year ago

I adjusted the 8 files black was complaining about (removal of blank lines). The PR did not touch the files before though, meaning these blank lines where there before.

jrueb commented 1 year ago

It is unclear what the CI's issue is. The black of pre-commit run --all-files is passing for me and besides core.py this PR does not touch any of these files.

guillaumeeb commented 1 year ago

The PR did not touch the files before though, meaning these blank lines where there before.

this PR does not touch any of these files.

Yeah, I know, you are totally right. I won't block here because CI fixing is unrelated to your changes. I shouldn't have asked you to fix it in the first place. Sorry it messes up a bit your changes.

Would you prefer to revert your last commit? Or are you OK that I merge as is?

jrueb commented 1 year ago

Would you prefer to revert your last commit? Or are you OK that I merge as is?

Either is fine for me. If it's the easiest to merge as is, I'm okay with that.

jrueb commented 1 year ago

Is anything else needed for this PR?

guillaumeeb commented 1 year ago

Sorry for the lack of follow-up, I'll merge this one as is, as we opened an issue to fix the CI.