MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License
143 stars 47 forks source link

Improve task queue claim query #810

Closed bennybp closed 3 months ago

bennybp commented 3 months ago

Description

We have noticed some very slow times for managers claiming tasks when there is a large number of managers. This is (partly) due to an inefficient query, where there is a LEFT OUTER JOIN from the task_queue table (millions of rows) with a CTE.

This changes that query by moving the LEFT OUTER JOINs into the CTE, then doing an INNER JOIN with the CTE. We can also remove the duplicate check for the waiting status.

This also adds a sort by task_queue.id to make things more deterministic.

Tests of the query on the production database show a speedup of about 21x (from 23 seconds to 1.1 seconds)

Changelog description

Improve task claim performance with a more efficient query

Status

bennybp commented 3 months ago

Sneaking this change into production does show a significant improvement

           timestamp           |        full_uri         |  request_duration  
-------------------------------+-------------------------+--------------------
 2024-04-04 13:59:56.903021+00 | /compute/v1/tasks/claim | 0.7139768600463867
 2024-04-04 13:59:56.180234+00 | /compute/v1/tasks/claim | 0.7162907123565674
 2024-04-04 13:59:54.313889+00 | /compute/v1/tasks/claim | 0.7804467678070068
 2024-04-04 13:59:53.681079+00 | /compute/v1/tasks/claim |  0.721970796585083
 2024-04-04 13:59:51.46731+00  | /compute/v1/tasks/claim | 0.7714815139770508
 2024-04-04 13:59:50.994263+00 | /compute/v1/tasks/claim |  0.799065351486206
 2024-04-04 13:59:25.614467+00 | /compute/v1/tasks/claim | 20.008413791656494
 2024-04-04 13:59:18.918665+00 | /compute/v1/tasks/claim | 0.7871594429016113
 2024-04-04 13:59:11.720316+00 | /compute/v1/tasks/claim | 0.8088154792785645
 2024-04-04 13:59:05.498238+00 | /compute/v1/tasks/claim |  0.896820068359375
 2024-04-04 13:59:01.658589+00 | /compute/v1/tasks/claim | 20.022178888320923
 2024-04-04 13:58:41.234799+00 | /compute/v1/tasks/claim | 20.130000352859497
 2024-04-04 13:58:39.281633+00 | /compute/v1/tasks/claim | 20.830238103866577
 2024-04-04 13:58:24.597782+00 | /compute/v1/tasks/claim |  20.07417392730713
 2024-04-04 13:58:23.663994+00 | /compute/v1/tasks/claim |  20.27929925918579

700ms is still a bit slow in my opinion. Still thinking if there's a better solution

bennybp commented 3 months ago

Superseded by #811