OCA / queue

Asynchronous Job Queue
GNU Affero General Public License v3.0

[IMP] Multi-node high-availability jobrunner #607

Closed PCatinean closed 2 months ago

PCatinean commented 8 months ago

This PR introduces support for multi-node deployments in high availability. As a bonus this also works on odoo.sh because of the similar deployment architecture (separate and independent odoo processes connecting to the same database).

The HA logic only kicks in when the high_availability option is enabled, both to prevent regressions and to spare redundant queries and computations in "standard" mode.

The approach taken here leverages the pg_stat_activity table to implement a leader-election algorithm on top of the one component shared by all Odoo processes: the PostgreSQL database. The beauty of this design is that it delegates leader election to the database and requires no heartbeat checks or custom communication between the jobrunner processes. Whenever an active jobrunner connection is broken (for whatever reason), pg_stat_activity is updated immediately and the shared logic ensures all jobrunners elect the same leader. Another advantage of this approach is that it is also compatible with odoo.sh, which does not support advisory locks because they break its database backup/replication operations.
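For a concrete picture, here is a minimal, hedged sketch of this election idea. It is not the code of this PR (the application_name prefix, helper names and ordering rule are assumptions); it only shows how tagging connections lets every runner derive the same leader from pg_stat_activity:

```python
# Hedged sketch only; names and the ordering rule are illustrative, not the PR's.
import uuid

import psycopg2

APP_NAME_PREFIX = "queue_job_runner"  # assumed tag prefix
RUNNER_UUID = uuid.uuid4().hex        # unique per jobrunner process


def tagged_connection(db_name):
    # The application_name tag makes this runner visible to all other
    # runners through pg_stat_activity.
    return psycopg2.connect(
        dbname=db_name,
        application_name="%s:%s" % (APP_NAME_PREFIX, RUNNER_UUID),
    )


def is_leader(conn, db_names):
    # Every runner executes the same deterministic query against the shared
    # database, so they all agree on the leader (here: the oldest tagged
    # backend). When the leader's connection drops, its row disappears from
    # pg_stat_activity and the next runner takes over automatically.
    with conn.cursor() as cr:
        cr.execute(
            """
            SELECT application_name
              FROM pg_stat_activity
             WHERE application_name LIKE %s
               AND datname IN %s
             ORDER BY backend_start, pid
             LIMIT 1
            """,
            (APP_NAME_PREFIX + ":%", tuple(db_names)),
        )
        row = cr.fetchone()
    return bool(row) and row[0].endswith(RUNNER_UUID)
```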

Multiple databases are also supported by this PR, with one important note. Since the leader election considers one leader for the entire database set (say jobrunner A for DB1 and DB2), if someone starts another jobrunner process with an overlapping set (jobrunner B for DB2 and DB3), only one jobrunner will be elected leader: if jobrunner A wins, DB3 is ignored, and if jobrunner B wins, DB1 is ignored.

This PR has been tested in the following environments:

  1. Locally with one jobrunner and one database (with high_availability and without)
  2. Locally with one jobrunner and several databases (with high_availability and without)
  3. Locally with several jobrunners (odoo replicas) and one database (with high_availability)
  4. Locally with several jobrunners (odoo replicas) and several databases (with high_availability)
  5. On odoo.sh with one odoo.sh worker + cron worker
  6. On odoo.sh with two odoo.sh workers + cron worker

Testing in a Kubernetes cluster remains to be done, but I don't expect the results to differ from the local ones.

In all tests we introduced process disruptions (docker kill container / kill -9 odoo-process-pid); the failover mechanism kicked in quickly every time, and there was always exactly one active leader processing the queue.

Regarding odoo.sh, ideally the jobrunner should not run on the cron worker (which spawns and restarts very frequently), but even if that happens (as it did in our tests) it only takes over leadership briefly and processes jobs until it is removed, after which a regular HTTP worker takes over leadership. I am not sure whether we should introduce odoo.sh-specific logic here or in a separate module; your input is welcome.

As always a massive thank you to @bizzappdev for helping develop, test and debug this entire PR :pray:

I am eagerly waiting for the community feedback and testing of this feature.

OCA-git-bot commented 8 months ago

Hi @guewen, some modules you are maintaining are being modified, check this out!

guewen commented 8 months ago

Hey @PCatinean thanks! Crazy week here, I'll try to take some time next week

guewen commented 8 months ago

I can already say that overall it looks good, and I'm fine with a first version being global to the DB cluster, since it needs to be explicitly enabled. An improvement for another PR could then be to use a custom key, which could simply be a static key configured in the env/config file; that is more straightforward and flexible than trying to compute a hash from the databases.

luke-stdev001 commented 8 months ago

@PCatinean ,

This looks like a very interesting PR. I would be happy to spin up a replica of our production GKE cluster to put this through its paces and give you access to test it at scale. We'll have a use case for it once the second phase of our project kicks off, so we'd love to be your guinea pig.

PCatinean commented 8 months ago

Hi @guewen

Thanks a lot for the review, I'm really happy to see this moving forward.

On the topic of the DB cluster, I am still not sure whether I am missing something. Here's a rundown of how I see things and how it works in my mind:

Say we have 3 databases in the cluster all with queue_job installed: DB1, DB2, DB3.

  1. When a jobrunner spawns alongside an odoo process it gets a unique UUID every time.
  2. If the jobrunner does not have any databases explicitly configured it will be global for the db cluster (DB1, DB2, DB3), which is the standard queue_job behavior.
  3. If high_availability is set, then we have as many jobrunners (with unique IDs) connected to the db cluster as there are nodes or odoo processes.
  4. All these separate jobrunners compete for leadership over the whole database set and only one wins. If the leader disconnects, the next one takes over.
  5. If instead we explicitly configure the odoo/jobrunner to run on a specific set of databases (DB1, DB2), the same happens as in points 3 and 4, but only for those specific databases and no others.
  6. One can configure another jobrunner to run on DB3 (and more besides) and it will not clash with the first jobrunner.
  7. The only issue arises when someone configures two separate jobrunners with intersecting/common databases (jobrunner A with DB1 & DB2 + jobrunner B with DB2 & DB3). In that case the winning jobrunner runs its set while the other runs nothing. This would also be a problem in standard mode, where DB2 would end up being operated by two parallel jobrunners.

Long story short: if we have multiple jobrunners in HA designated for a specific set of databases (explicit or implicit), they should not cause problems outside that set, as long as they all share the same config file.
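As a hedged illustration of points 5 to 7, reusing the tagged_connection()/is_leader() helpers from the sketch under the PR description above (the database names and the printed message are placeholders, not code from this PR):

```python
# Illustration only: a runner explicitly configured for DB1 and DB2 keeps one
# tagged connection per database (as it would for LISTEN) and checks whether
# it won the election over that set.
my_databases = ["DB1", "DB2"]
conns = {db: tagged_connection(db) for db in my_databases}
if is_leader(conns[my_databases[0]], my_databases):
    print("this runner processes jobs for %s" % ", ".join(my_databases))
# A second runner configured only for DB3 filters on datname IN ('DB3'), never
# sees these connections, and elects its own leader (point 6). With an
# overlapping set (e.g. DB2 and DB3) the elections intersect on DB2, so only
# one runner ends up serving its set and the other's non-shared database goes
# unprocessed, which is the misconfiguration described in point 7.
```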

Hope that brings a bit more context and my logic is not too convoluted :smile:

guewen commented 8 months ago

@PCatinean Oh, right! I missed this bit datname IN %s which acts as a key for the set of databases.

As for the intersecting databases, well, trying to achieve this setup would be shooting oneself in the foot (right?) so a warning is good enough.

Thanks for the clarification

PCatinean commented 8 months ago

That's what we also said: if someone does this, it's a clear misconfiguration. We also thought about a warning message in the leadership check; we can implement that as well.

The advisory lock approach could be a bit more stable/robust than this one, and I would have no problem going forward with it if we can get it to work in all cases, especially on odoo.sh. I only mentioned the db replication issues because the odoo.sh developers mentioned them in the PR; I did not test it myself :man_shrugging:

pfranck commented 5 months ago

@guewen - I'm testing the MR and it seems OK to me - can we see how to merge this into 16.0?

guewen commented 5 months ago

I was hoping to have other reviews on this important one, cc @sbidoul @StefanRijnhart @simahawk @lmignon

sbidoul commented 5 months ago

Hi @guewen I will review soon. This is cool @PCatinean .

From a distance, my first thought is about race conditions in the leader election, though. I am under the impression a session level advisory lock would be more robust.

PCatinean commented 5 months ago

Hi @sbidoul, thanks for the feedback. I too would lean towards a session-level lock and would have no issue dropping this PR in favor of that approach, as it is indeed more robust, as you said. My only concern is that for odoo.sh one engineer said it tampered with their db replication. From what I read, neither type of advisory lock (session or transaction) should affect replication, so I am not sure how compatible it would be with odoo.sh.

This approach however has been in production with us for many months and seems to have no issues. Not sure if we can ask an odoo.sh engineer for feedback?

sbidoul commented 5 months ago

> This approach however has been in production with us for many months and seems to have no issues. Not sure if we can ask an odoo.sh engineer for feedback?

If you have a contact in the odoo.sh team, it would be great to get their advice on session level advisory locks, indeed.

PCatinean commented 4 months ago

@sbidoul I tried asking the odoo.sh team for feedback on the initial PR which raised their concern, but there has been no response.

In the meantime I made the requested changes; the decision on which approach to take is up to the community. I personally wouldn't risk the advisory lock strategy on odoo.sh without their approval, though.

Another idea would be to support different leader-election algorithms in the config file ("advisory_lock" vs "query" or something), but I am not sure whether that adds too much complexity.

LoisRForgeFlow commented 3 months ago

Hi @PCatinean

What is the status of this proposal? Have you finally switched to a different approach, or is this one good enough?

Thanks!

PCatinean commented 3 months ago

Hi @LoisRForgeFlow

There was no reply from the odoo.sh team, so unfortunately I still do not know whether they will support the more robust advisory lock approach.

As such we don't know how to proceed, but for now this seems to be the only tested version (in production with me since September last year with no issues).

If someone is willing to risk testing the advisory lock approach in production and wait to see whether the odoo.sh team contacts them about potential problems, then we might find out, but I would not risk this on my production instances.

This approach has the advantage of not causing any issues: it is just a query every X seconds, so it cannot be rejected or cause problems.

The only middle ground I can think of is to introduce a "leader election algorithm" option and ship this approach as one of the choices. We deploy it like this, and if odoo.sh turns out to be fine with advisory locks we can either phase this one out in the next version or keep it as an option and make advisory lock the default...

LoisRForgeFlow commented 3 months ago

@PCatinean

Thanks for the update. From my side, we have been using this PR for a while on an odoo.sh instance and it works. However, since then we have had some performance issues, and the CPU load is quite high even when no jobs are to be run. I'm not saying it is the fault of this approach, as I haven't done a serious investigation yet, but I wanted to mention it in case you have experienced something similar.

PCatinean commented 3 months ago

@LoisRForgeFlow we have not; after deploying this and adding one more worker, performance has been great. This should not consume anything, as it is just a simple process running a query every 10 seconds.

You can compare the platform monitoring with the module added and then removed, but I doubt there will be a difference.

LoisRForgeFlow commented 3 months ago

@PCatinean Thanks for your insights. I just want to confirm that you were correct: the problem came from other issues, not from the runner itself.

PCatinean commented 2 months ago

So, after some feedback from the odoo.sh engineers, it seems that manipulating the PostgreSQL application_name should not be done and will most likely be restricted in the future. Even though it is the jobrunner connection that changes the application_name, not the main Odoo worker, the future constraints might break this solution and/or cause the module to suddenly violate the terms of service.

As such I think this is too risky to deploy in production, and I will close this PR. We can proceed with the advisory lock approach for multi-node workers in a separate PR, @sbidoul, but I would list odoo.sh as not compatible, for the reason given by the odoo.sh engineers in the original PR for v13.

sbidoul commented 2 months ago

Thanks for the feedback (and your efforts) @PCatinean.

For reference, the original comment from odoo.sh team is https://github.com/OCA/queue/pull/256#issuecomment-895111832

Now, as @guewen mentioned above, he was using a transaction-level advisory lock, and that was what caused the issues mentioned.

If we use session-level advisory locks, we may not hit the same limitations. And since we already need a long-lived connection (not a transaction) to each db for LISTEN, we could maybe acquire a session-level advisory lock on each db just before doing the LISTEN, provided each runner attempts to connect/lock/LISTEN to all the databases in the same order. That could even be a small change to the code, not sure.
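A minimal sketch of what that could look like, assuming psycopg2 and an arbitrary agreed-upon lock key; this is not the implementation in #668, just an illustration of the connect/lock/LISTEN sequence:

```python
# Hedged sketch only: the lock key, channel name and helper are illustrative.
import psycopg2

QUEUE_JOB_LOCK_KEY = 974531  # arbitrary constant shared by all runners

def listen_with_lock(db_name):
    conn = psycopg2.connect(dbname=db_name)
    conn.autocommit = True
    with conn.cursor() as cr:
        # Session-level advisory lock: held for the lifetime of this
        # connection and released automatically by PostgreSQL if the
        # runner process dies.
        cr.execute("SELECT pg_try_advisory_lock(%s)", (QUEUE_JOB_LOCK_KEY,))
        if not cr.fetchone()[0]:
            conn.close()
            return None  # another runner already owns this database
        cr.execute("LISTEN queue_job")
    return conn

# Each runner attempts connect/lock/LISTEN on all its databases in the same
# (sorted) order; for any given database at most one runner holds the lock
# and therefore processes its jobs.
connections = {db: listen_with_lock(db) for db in sorted(["DB1", "DB2"])}
```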

sbidoul commented 2 months ago

Something like #668, maybe