OCA / queue

Asynchronous Job Queue
GNU Affero General Public License v3.0
179 stars 452 forks source link

[FIX] queue_job: Commit was removed, #507

Closed etobella closed 1 year ago

etobella commented 1 year ago

That could be a problem if you generated too much records

removed on #315

Needs to be ported to 15.0 and 16.0 too. CI was fixed on 13.0

OCA-git-bot commented 1 year ago

Hi @guewen, some modules you are maintaining are being modified, check this out!

etobella commented 1 year ago

@florian-dacosta

pedrobaeza commented 1 year ago

Put a limit on the search

florian-dacosta commented 1 year ago

This is not correct. You should never commit in any regular transaction. Why is it a problem?

It was like this before and I was told it was ok to commit this way because a new cursor is already opened for a cron and the cron only delete stuff, so it seems fine to commit after the unlink : https://github.com/OCA/queue/pull/313#issuecomment-776698832

On instance that creates a lot of job records each day, you may ran into timeout issues with this cron and without commit, it will just be stuck forever. With a commit of the deleted jobs, there are good chance the cron run smootly when it launches again. https://github.com/OCA/queue/pull/313

It seems to me that the only point to make batch of job to delete is to be able to commit from time to time. Deleting job in batch of 1000 without commit seems pointless. For me this patch is important

pedrobaeza commented 1 year ago

It's true that a new cursor is opened for a cron, but:

  1. You are depending on the internal implementation details. Not sure in a threaded server (low Odoo.sh instances - although we know other issues for this module, but as an example -), it happens the same.
  2. Also don't know if other crons have been executed before.

Thus, a commitment solution is to put a limit (even configurable through a system parameter if you prefer), to restrict the number of jobs to clean.

florian-dacosta commented 1 year ago
2. Also don't know if other crons have been executed before.

But each cron has its own new cursor, so I don't think it is an issue if other crons have been executed before, not sure I understand your point.

Thus, a commitment solution is to put a limit (even configurable through a system parameter if you prefer), to restrict the number of jobs to clean.

Actually, the cron runs once a day, and its goal is to delete every job which are 1 month old. If we put a limit to the cron, it will only delete a part of the jobs and the number to delete will be increasing each day... Unless you run the job more often and try to configure a limit that match the number of job created each day divided by the time the cron run each day, but it seems a lot of work for nothing !

If opening one new cursor for the seach/unlink part makes it ok, I guess it can be done this way...But I really believe that committing is important for some project, and I really don't see where it would be an issue for this cron.

florian-dacosta commented 1 year ago

And, it was added in a PR, respecting all the possible rules, approved by 2 PSC and then kind of silently removed in a PR which had no link with this feature (migration of another module), I think it is not the way to go...

etobella commented 1 year ago

there is already a limit of 1000 jobs, the problem is that if you have to process 100.000 jobs, it will get stuck forever if, by some reason, it fails, as no records are deleted and it will restart from the begining. We might try to make 1000 just for each channel, but the same error might happen if you have a lot of channels.

If that's a problem, we might try to make it using a specific transaction like it is done on https://github.com/OCA/server-tools/blob/14.0/autovacuum_message_attachment/models/autovacuum_mixin.py#L17-L35 :thinking:

pedrobaeza commented 1 year ago

The problem is the loop itself. Each pass of the cron should just unlink a limit of jobs. Remove the while True part, and put a more frequent cron execution. Or as said, put a configurable limit by system parameter. There are a lot of options that are not the commit one.

etobella commented 1 year ago

IMO there was nothing wrong on the commit. Odoo applies the same logic on several crons

https://github.com/odoo/odoo/blob/f5ffcf7feec5526a483f8ddd240648c084351008/addons/crm/models/crm_lead.py#L2278 https://github.com/odoo/odoo/blob/fa58938b3e2477f0db22cc31d4f5e6b5024f478b/addons/crm_iap_enrich/models/crm_lead.py#L117 https://github.com/odoo/odoo/blob/439bef710b78afe0e5ba1d54ce04a5e5c3906cdc/addons/project/models/project.py#L980-L996 https://github.com/odoo/odoo/blob/5c12edb989a044d35e8be93b40fdc0e3a66379ed/odoo/addons/base/models/ir_attachment.py#L156-L181 https://github.com/odoo/odoo/blob/dca1d6813fa5fd7cd322c4f6e489eb8a10e6691a/addons/stock/models/stock_rule.py#L536

Also, it was already merged but deleted by someone on an unrelated PR.

pedrobaeza commented 1 year ago

Note all the comments and tricks like invalidate_model that all your references have, so IMO you should still stick to the standard to avoid side problems.

etobella commented 1 year ago

Just the first one has that trick.

pedrobaeza commented 1 year ago

And other needs to avoid the test mode, other makes a previous LOCK SQL execute, etc. As said, this is full of possible side effects.

etobella commented 1 year ago

the test can be avoided using the same logic applied on 15 and 16. It was not done because this code was not tested on GH Actions, actually it should be a good practice to add the test verification :smile:

pedrobaeza commented 1 year ago

Don't insist: it's a bad practice, and it can be circumvented on the ways I have already commented. Any other thing not done on this way will lead in a future to a weird behavior. It's very easy to you to fix your problem: remove the while True sentence in this PR and increase the cron frequency.

hegenator commented 1 year ago

I also have this same problem, and I'd like to get it resolved as soon as possible since it just gets worse over time.

I see the discussion stopped two weeks ago. @etobella are you still working on this and planning to incorporate the types of changes Pedro suggested? If you have an alternative solution already in use or simply don't have the time to work on this, I will test the other suggested solutions instead and create a new pull request with those changes.

guewen commented 1 year ago

Note all the comments and tricks like invalidate_model that all your references have, so IMO you should still stick to the standard to avoid side problems.

They do this because they execute direct SQL, it is unrelated to the commit. A commit in a cron is safe.

guewen commented 1 year ago

/ocabot merge patch

OCA-git-bot commented 1 year ago

This PR looks fantastic, let's merge it! Prepared branch 14.0-ocabot-merge-pr-507-by-guewen-bump-patch, awaiting test results.

OCA-git-bot commented 1 year ago

Congratulations, your PR was merged at 1ab8642d8ee09f20480664e9607ccafd561e5112. Thanks a lot for contributing to OCA. ❤️