Closed jrueb closed 1 year ago
I have removed the weakref.finalize
call here, which was one way to remove jobs at program exit. This is because finalize
interferes with the fact that _close_job
is a coroutine now. At program exit, Distributed already takes care of job removal here
https://github.com/dask/distributed/blob/8e0c79bb82b5d9e3f4f9538935520da960588df6/distributed/deploy/spec.py#L692C1-L692C1
In fact, before this PR, jobs were removed twice, which made the program exit unnecessary lengthy.
There is one issue left however: As you can see in the linked Distributed code, there is a timeout of 10 seconds. This is not enough to remove much more than 350 jobs (at least in my tests with HTCondor). One may want to ask to increase this timeout.
@jacobtomlinson could you take a quick look at this? I'm not familiar enough with asyncio to give a good advice.
Hi @jrueb, sorry again for the delay... It looks there is formatting problem in the source code, which sounds a little weird considering you didn't touch some of these files...
Could you check flake8 and black output?
I adjusted the 8 files black was complaining about (removal of blank lines). The PR did not touch the files before though, meaning these blank lines where there before.
It is unclear what the CI's issue is. The black of pre-commit run --all-files
is passing for me and besides core.py
this PR does not touch any of these files.
The PR did not touch the files before though, meaning these blank lines where there before.
this PR does not touch any of these files.
Yeah, I know, you are totally right. I won't block here because CI fixing is unrelated to your changes. I shouldn't have asked you to fix it in the first place. Sorry it messes up a bit your changes.
Would you prefer to revert your last commit? Or are you OK that I merge as is?
Would you prefer to revert your last commit? Or are you OK that I merge as is?
Either is fine for me. If it's the easiest to merge as is, I'm okay with that.
Is anything else needed for this PR?
Sorry for the lack of follow-up, I'll merge this one as is, as we opened an issue to fix the CI.
This solution is using asyncio.create_subprocess_exec and similar to what was done in PR #473. However, this other PR lacked an update job removal, which is included here.
The
test_jobqueue_job_call
required an overhaul to make it compatible with the async code. Its form is basically the same as the other async tests now.In issue #470 another solution using
run_in_executor
was proposed. I consider the solution withcreate_subprocess_exec
more rigorous though, because withcreate_subprocess_exec
the code is fully using the async API, instead of wrapping non-async functionality in arun_in_executor
call.