OCA / queue

Asynchronous Job Queue
GNU Affero General Public License v3.0

[FIX] queue_job: max retry #622

Open oerp-odoo opened 8 months ago

oerp-odoo commented 8 months ago

When a job fails because of a concurrent update error, it does not respect the max retries set on the job. The problem is that the logic in the `perform` method that handles retries is never reached, because `runjob`, in the controller that triggers jobs, catches the expected exception and silences it. That is done deliberately, so the logs are not polluted.
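A minimal sketch of the failure mode, not the actual `queue_job` code. `ToyJob`, `RetryableError`, `ConcurrentUpdateError`, `FailedJobError` and `run_job` are hypothetical stand-ins, and the sketch assumes the max-retries check only runs for one exception type while the runner swallows the concurrency error separately:

```python
class RetryableError(Exception):
    """Hypothetical: the only error type the job's own retry logic inspects."""


class ConcurrentUpdateError(Exception):
    """Hypothetical stand-in for a database concurrency error."""


class FailedJobError(Exception):
    """Hypothetical: raised once max retries has been reached."""


class ToyJob:
    def __init__(self, func, max_retries):
        self.func = func
        self.max_retries = max_retries  # 0 means "retry forever" in this sketch
        self.retry = 0
        self.state = "pending"

    def perform(self):
        self.retry += 1
        try:
            return self.func()
        except RetryableError:
            # The max-retries check lives here, so any other exception
            # type passes straight through without being counted against it.
            if self.max_retries and self.retry >= self.max_retries:
                raise FailedJobError("max retries reached")
            raise


def run_job(job):
    """Toy runner: requeues silently on the concurrency error."""
    try:
        job.perform()
        job.state = "done"
    except ConcurrentUpdateError:
        job.state = "pending"  # silenced and requeued, no limit applied
    except RetryableError:
        job.state = "pending"
    except FailedJobError:
        job.state = "failed"


def always_conflicts():
    raise ConcurrentUpdateError("could not serialize access")


job = ToyJob(always_conflicts, max_retries=3)
for _ in range(10):
    run_job(job)
```

In this toy model the job is still pending after ten attempts even though `max_retries` is 3, which is the behaviour described above.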

So for now, this PR adds an extra check before the job is run, to make sure max retries is enforced once the limit has been reached.
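The idea of the extra check can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `ToyJob`, `ConcurrentUpdateError`, `retries_exhausted` and `run_job_with_check` are hypothetical names, and a `max_retries` of 0 is assumed to mean "no limit".

```python
from dataclasses import dataclass
from typing import Callable


class ConcurrentUpdateError(Exception):
    """Hypothetical stand-in for a database concurrency error."""


@dataclass
class ToyJob:
    func: Callable[[], None]
    max_retries: int  # 0 means "retry forever" in this sketch
    retry: int = 0
    state: str = "pending"


def retries_exhausted(job):
    """True when a limit is set and the job has already used it up."""
    return bool(job.max_retries) and job.retry >= job.max_retries


def run_job_with_check(job):
    # Extra check *before* running: this path does not depend on which
    # exception the previous attempt failed with.
    if retries_exhausted(job):
        job.state = "failed"
        return
    job.retry += 1
    try:
        job.func()
        job.state = "done"
    except ConcurrentUpdateError:
        job.state = "pending"  # still requeued quietly, but now bounded


def always_conflicts():
    raise ConcurrentUpdateError("could not serialize access")


job = ToyJob(func=always_conflicts, max_retries=3)
attempts = 0
while job.state == "pending":
    run_job_with_check(job)
    attempts += 1
```

Here the job runs three times and is marked as failed on the fourth call, without the function being executed again. The trade-off is that the limit is enforced in two places, which is why I consider it a stopgap.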

Some context:

It looks like the code that is supposed to handle max retries is never called. But I am not sure what the right way to propagate the exception would be, as there is some logic here https://github.com/OCA/queue/blob/e2c6bab9ebed9bf9f8d2110c79205a440da67327/queue_job/controllers/main.py#L125 that explicitly avoids raising that exception.

Not enforcing max retries can be very problematic if your jobs are prone to concurrent updates. I had an issue where the same job record (yes, the job record itself, not some other record the job would update) was somehow being updated by two job runners at the same time, so it would always fail and retry. It accumulated over 400 retries, and the only way to stop it was to restart Odoo.

For example, without this fix we can end up in a situation like this:

(screenshot: Selection_1050)

OCA-git-bot commented 8 months ago

Hi @guewen, some modules you are maintaining are being modified, check this out!

github-actions[bot] commented 4 months ago

There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days. If you want this PR to never become stale, please ask a PSC member to apply the "no stale" label.

oerp-odoo commented 4 months ago

@guewen can you check this?