OCA / queue

Asynchronous Job Queue
GNU Affero General Public License v3.0
179 stars 456 forks source link

The job runner does not understand cancelled jobs #531

Open sbidoul opened 1 year ago

sbidoul commented 1 year ago

Module

queue_job

Describe the bug

Since https://github.com/OCA/queue/pull/347, jobs have a cancelled state. However the job runner does not understand this state.

https://github.com/OCA/queue/blob/0bf3a1a48100fce68f3ece7aca1eee39001e59bb/queue_job/jobrunner/channels.py#L1048-L1061

This may lead to cancelled job filling up channels forever.

sbidoul commented 1 year ago

@hbrunn @hparfr as you seem to use this feature, have you not noticed problems with it?

I still need to investigate in more depth but I'm under the impression that cancelled jobs fill up channels until the job runner is restarted.

hparfr commented 1 year ago

I didn't pay attention to this one. I have canceled jobs with an error message in one of my db :roll_eyes:

sbidoul commented 1 year ago

Looking at places where DONE is used I notice the job dependencies mechanism. If I read correctly cancelled jobs cause their dependencies to stay WAIT_FOR_DEPENDENCIES until they are themselves cancelled or marked done. I guess that makes sense?

sbidoul commented 1 year ago

I propose a fix for the jobrunner in #533

sbidoul commented 1 year ago

@guewen if you have a moment, I'll welcome your advice on the effect of the cancelled states on the job dependencies mechanism. Maybe it's fine but I'm not entirely sure it was taken into account when porting the cancelled state feature from 12 to 14.

guewen commented 1 year ago

Hmm yeah I'm not sure the current situation is alright. Cancelling a parent job leaves a situation where dependent jobs will never be executed I think. We have to cancel them or set them to done, which is ultimately the same. I wonder if cancelling a job in a graph shouldn't cancel all the dependent jobs or maybe even all the graph's jobs (this can be explained in the cancel wizard)?

github-actions[bot] commented 11 months ago

There hasn't been any activity on this issue in the past 6 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days. If you want this issue to never become stale, please ask a PSC member to apply the "no stale" label.