Closed michmx closed 3 months ago
I think the current indentation is correct; e.g. SUBMITTING is not allowed to go to KILLED:
SUBMITTING: State(0, [RECEIVED, CHECKING, DELETED], defState=SUBMITTING), # initial state
WAITING is allowed to go to KILLED since #7276, so it should be moved from the list of job statuses that cannot go to KILLED to the list of those that can.
Or, could the possible state transitions (https://github.com/DIRACGrid/DIRAC/blob/rel-v8r0/src/DIRAC/WorkloadManagementSystem/Client/JobStatus.py) be checked instead of these hard-coded lists?
I think @iueda is correct. For checking the state transitions, https://github.com/DIRACGrid/DIRAC/blob/5f268bd719d0b24459ab052464bcbb134bae4efa/src/DIRAC/WorkloadManagementSystem/Client/JobStatus.py#L132 (`filterJobStateTransition`) can be used.
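To make the idea of checking transitions rather than hard-coding status lists concrete, here is a minimal sketch. It does not use the DIRAC API: `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical names, and only the SUBMITTING row and the WAITING to KILLED transition are taken from this thread; every other entry in the table is an assumption made for illustration.

```python
# Illustrative transition table: NOT the real DIRAC state machine.
# Only the "Submitting" row and "Waiting" -> "Killed" come from this thread;
# every other entry is an assumption made for the example.
ALLOWED_TRANSITIONS = {
    "Submitting": {"Received", "Checking", "Deleted"},
    "Waiting": {"Matched", "Killed", "Deleted"},
    "Running": {"Done", "Failed", "Killed"},
}


def can_transition(current_status: str, candidate_status: str) -> bool:
    """Return True if candidate_status is listed as reachable from current_status."""
    # Unknown statuses have no allowed transitions in this sketch.
    return candidate_status in ALLOWED_TRANSITIONS.get(current_status, set())


if __name__ == "__main__":
    for status in ("Submitting", "Waiting", "Running"):
        print(status, "-> Killed allowed:", can_transition(status, "Killed"))
```

With a table like this, adding a new allowed transition (as #7276 did for WAITING) changes the kill behaviour in one place, without touching any hard-coded list in the caller.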
@fstagni we have a fundamental question before moving on with this PR or a more elaborate implementation. When a job is in Waiting status, is it meaningful to use `sendKillCommand=True` in `killJob()`?
I ask because with the current indentation, only statuses that are not hard-coded are appended to `markKilledJobList`, which uses `sendKillCommand=False` (this was the behavior in DIRAC v7r2 for waiting jobs).
Moving the Waiting status to line 520 would add those jobs to `killJobList`, using `sendKillCommand=True`.
`sendKillCommand=True` is effectively only meaningful when the job is already running. For any other state it does not make sense.
According to WorkloadManagementSystem/Client/JobStatus.py, the states that can go to Killed are
The current code does not kill jobs in 'WAITING' state
killJobList.append(jobID)
deleteJobList.append(jobID)
markKilledJobList.append(jobID)
Do we want
a) to treat them all the same, e.g. `killJobList.append(jobID)`, since `sendKillCommand=True` or `False` doesn't matter for jobs that are not running,
or
b) to do as follows?
killJobList.append(jobID)
markKilledJobList.append(jobID)
If the latter (b), then we would need to keep the hard-coding.
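The difference between the two options can be sketched as follows. This is not the DIRAC implementation: the function names are hypothetical, the input is assumed to contain only jobs that are already known to be killable, and the rule used for option (b), running jobs to the kill list and everything else to the mark-killed list, is an assumption based on the remark that `sendKillCommand=True` only matters for running jobs.

```python
RUNNING = "Running"  # status name used for this illustration


def split_option_a(job_statuses):
    """Option (a): every killable job goes to the kill list."""
    kill_job_list = list(job_statuses)
    mark_killed_job_list = []
    return kill_job_list, mark_killed_job_list


def split_option_b(job_statuses):
    """Option (b): keep a hard-coded split based on the job status."""
    kill_job_list, mark_killed_job_list = [], []
    for job_id, status in job_statuses.items():
        if status == RUNNING:
            # A kill command is only meaningful for a running job.
            kill_job_list.append(job_id)
        else:
            # Assumed: other jobs are only marked as killed.
            mark_killed_job_list.append(job_id)
    return kill_job_list, mark_killed_job_list


if __name__ == "__main__":
    jobs = {101: "Running", 102: "Waiting"}
    print("option a:", split_option_a(jobs))
    print("option b:", split_option_b(jobs))
```

Option (a) removes the status check from the caller entirely, which is why it does not need the hard-coded list.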
I would go for option "a".
Sorry for the delay @fstagni @iueda. I am back on this issue. The new proposal uses `filterJobStateTransition()` to check whether the job can go to `killed`, plus `deleted` if requested.
Also, all jobs to be killed (after the filtering) go to `killJobList`, which means `sendKillCommand=False` is no longer used.
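As a rough illustration of this proposal, the sketch below filters a batch of jobs by allowed transition and sends every killable job to a single list. It is self-contained and does not call DIRAC: `filter_by_transition` and `select_jobs` are hypothetical helpers, not the signature of `filterJobStateTransition()`, and the transition table is illustrative apart from the SUBMITTING row and WAITING to KILLED, which come from this thread.

```python
# Illustrative transition table: NOT the real DIRAC state machine.
ALLOWED_TRANSITIONS = {
    "Submitting": {"Received", "Checking", "Deleted"},
    "Waiting": {"Matched", "Killed", "Deleted"},
    "Running": {"Done", "Failed", "Killed"},
}


def filter_by_transition(job_statuses, candidate_status):
    """Return the job IDs whose current status may move to candidate_status."""
    return [
        job_id
        for job_id, status in job_statuses.items()
        if candidate_status in ALLOWED_TRANSITIONS.get(status, set())
    ]


def select_jobs(job_statuses, delete_requested=False):
    """Build the kill list, plus the delete list when deletion is requested."""
    # Every job that may go to Killed ends up in the same list,
    # so no separate "mark as killed" list is needed.
    kill_job_list = filter_by_transition(job_statuses, "Killed")
    delete_job_list = (
        filter_by_transition(job_statuses, "Deleted") if delete_requested else []
    )
    return kill_job_list, delete_job_list


if __name__ == "__main__":
    jobs = {1: "Waiting", 2: "Submitting", 3: "Running"}
    print(select_jobs(jobs))
    print(select_jobs(jobs, delete_requested=True))
```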
I took the liberty of pushing a commit that adds a unit test.
Thanks a lot!
Sweep summary
Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/10367347752
integration: cherry-pick of 8715e1e0e into integration failed; check the merge conflicts on a local copy of this repository:
git fetch upstream
git checkout upstream/integration -b cherry-pick-2-8715e1e0e-integration
git cherry-pick -x -m 1 8715e1e0e
# Fix the conflicts
git cherry-pick --continue
git commit --amend -m 'sweep: #7690 Proper handling of waiting jobs when set to be killed' --author=''
git push -u origin cherry-pick-2-8715e1e0e-integration
# If you have the GitHub CLI installed the PR can be made with
gh pr create \
--label 'sweep:from rel-v8r0' \
--base integration \
--repo DIRACGrid/DIRAC \
--title '[sweep:integration] Proper handling of waiting jobs when set to be killed' \
--body 'Sweep #7690 `Proper handling of waiting jobs when set to be killed` to `integration`.
Adding original author @michmx as watcher.
BEGINRELEASENOTES
*WMS
FIX: Proper killing of jobs when not matched, running or stalled
ENDRELEASENOTES
Closes #7749'
When jobs are in status SUBMITTING, WAITING, etc., and they are set to be killed, they are not added to the list `markKilledJobList`. This pull request fixes the indentation for the case where `kill` is used instead of `delete`.
BEGINRELEASENOTES
*WMS
FIX: Proper killing of jobs when not matched, running or stalled
ENDRELEASENOTES