OCA / queue

Asynchronous Job Queue
GNU Affero General Public License v3.0
179 stars 452 forks source link

[15.0][IMP] queue_job: Vacuum less jobs more often to avoid timeouts #525

Closed hegenator closed 1 year ago

hegenator commented 1 year ago

In databases where lots of jobs are created it is possible for enough jobs to be generated in short enough time that eventually the autovacuum starts to time out. Since there is no cursor commit between unlinks, the timeout will also lead to the rollback of all the unlinks done so far, leading to no jobs being deleted. After the cron has timed out once, it is very likely that it will keep timing out as more jobs accumulate in the queue.

By running the cron more often but removing less jobs at once, the timeouts can be avoided without adding cursor commits.

For backwards compatibility, the limit per channel defaults to None so in databases where the cron already exists with the old execution interval the functionality remains the same.

OCA-git-bot commented 1 year ago

Hi @guewen, some modules you are maintaining are being modified, check this out!

hegenator commented 1 year ago

This change is motivated by the discussion on https://github.com/OCA/queue/pull/507 and as an alternative to the changes on it.

Since most of the change requests were by you @pedrobaeza , could you please check this one out and give your opinions on if this is closer to what you had in mind?

The things I'm most unsure about and would appreciate feedback and discussion on:

  1. The backwards compatibility. Now the default limit per cron run is unlimited, so the functionality should remain consistent with the previous versions. I also considered a migration script to update the cron, but figured that might be too risky since it is noupdate="1" on purpose, and there might be changes to it that we can't know about and don't want to override.
  2. The default execution interval and limits.
guewen commented 1 year ago

@pedrobaeza But then the cleaning would be executed all day long, whereas it could be done nightly when the job activity is sparser, with less impact on jobs/users activity, no?

pedrobaeza commented 1 year ago

The idea is to have a sane default, and the Odoo paradigm about this is clear: all operations should be short in time. And as it is, installing the module will program the cron once a day, but the start time is set on the installation time, not on the night, so you are not getting the desired effect. And there are a lot of installations that will stop the system for a backup on night, or to have 24/7 of activity.

Having multiple threads/workers, execute in parallel a light deletion of 1000 records has little to none impact. And if your system is special (on number of records, usage, etc), you can always tweak crons for complying with your needs (the same as before). A section can be put though on CONFIGURE.rst to reflect about this.

guewen commented 1 year ago

But then, if the goal is to have sane defaults, 24'000 jobs / day (1000 jobs * 24 hours) (maybe more depending on the channels though) is a low number of jobs. It doesn't seem a sane default to me, as soon as you overshoot this point, jobs will accumulate, breaking the principle of least surprise (if you don't know, you will not necessarily discover the problem).

BTW, I think we have no index that optimizes this query, I reckon it should be added, especially if we execute the search often.

Also, thank you @hegenator to tackle this point.

guewen commented 1 year ago

stop the system for a backup on night

Who does that 😳

pedrobaeza commented 1 year ago

Who does that

You would be surprised with the techniques we have found before getting into our hands... Some of them stop the VM for doing a snapshot for example.

About the debate itself, 24k is a good reasonable high number, and this can be stated in the README in the known issues section. You can't fit all installation sizes, and a medium-big size installation will require more than this tweak for sure.

guewen commented 1 year ago

You can't fit all installation sizes

With this kind of crons, you can't. With the kind of cron that runs and commit every X items until all the work is done, you can.

I read the whole thread on https://github.com/OCA/queue/pull/507 and I don't get the problem you are trying to fix by avoiding commit in a cron. I have never had any issue with a commit in a cron, the cron mechanism opens a new transaction. And as others said odoo themselves use it.

It looks like to me that you are creating more potential problems with your request not to loop in the cron job (I mean if we have to add a "known issue", we are just creating an issue).

pedrobaeza commented 1 year ago

OK, I don't resist more. I still think it's not a good idea, but if you have clear that there are no side effects, then go with the other option.

hegenator commented 1 year ago

Thank you both for taking the time to assist me on this and continuing the discussion. As it seems now that a consensus has been reached, I will close this pull request as obsolete, and look forward to #507 (and

508 and #509) moving forwards.

guewen commented 1 year ago

Thank you both :green_heart: