MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License
148 stars 48 forks source link

queue_get_next() item retrieval performance upgrade #592

Closed codtiger closed 4 years ago

codtiger commented 4 years ago

Description

Previously , queue_get_next() used ordering by CASE expressions for tag ordering, which reduced performance and also hindered the use of any index on column orderings. The new change queries the database for each tag individually, in the order of the tags input variable provided, reducing the number of returned items from the user input limit, before doing the same for the next tag. A new file under benchmarks/ was also created for record creation and queue_get_next invocation.

Changelog description

Changing queue_get_next() for better index usage and faster item retreival

Results on Retrieval

Below are some benchmark resulted extracted straightly from EXPLAIN ANLAYZE command of postgresql. The database has the following setup: 14000 records, of which 13000 have an error status, and 1000 records with a single program p1 with nearly even split of two tags. The tests are done for two tags [tag1, tag2] and program p1 with limit=1000

Before

SELECT task_queue.id, task_queue.spec, task_queue.tag, task_queue.parser, task_queue.program, task_queue.procedure, task_queue.status, task_queue.priority, task_queue.manager, task_queue.error, task_queue.created_on, task_queue.modified_on, task_queue.base_result_id 
FROM task_queue 
WHERE task_queue.status = 'waiting' AND task_queue.program = 'p1' AND
task_queue.tag IN ('tag1', 'tag2') AND (1 != 1 OR task_queue.procedure IS NULL)
ORDER BY CASE WHEN (task_queue.tag = 'tag1') THEN 0 WHEN (task_queue.tag = 'tag2') THEN 1 END,
task_queue.priority DESC, task_queue.created_on 
LIMIT 1000

runtime = 6.699 ms

New change

Two queries for two tags:

SELECT task_queue.id, task_queue.spec, task_queue.tag, task_queue.parser, task_queue.program, task_queue.procedure, task_queue.status, task_queue.priority, task_queue.manager, task_queue.error, task_queue.created_on, task_queue.modified_on, task_queue.base_result_id
FROM task_queue 
WHERE task_queue.status = 'waiting' AND task_queue.program = 'p1' AND task_queue.tag = 'tag1'
AND (1 != 1 OR task_queue.procedure IS NULL) 
ORDER BY task_queue.priority DESC, task_queue.created_on 
LIMIT 1000;

runtime = 0.668 ms

SELECT task_queue.id, task_queue.spec, task_queue.tag, task_queue.parser, task_queue.program, task_queue.procedure, task_queue.status, task_queue.priority, task_queue.manager, task_queue.error, task_queue.created_on, task_queue.modified_on, task_queue.base_result_id 
FROM task_queue 
WHERE task_queue.status = 'waiting' AND task_queue.program = 'p1' 
AND task_queue.tag = 'tag2' AND (1 != 1 OR task_queue.procedure IS NULL) 
ORDER BY task_queue.priority DESC, task_queue.created_on 
LIMIT 507;

runtime = 1.919 ms which is additively equal to ~2.6 ms.

Results on function

Below are the results of a number of experiments brought in the table with different database population, with different waiting record ratio across some experiments.

Total Items No. of Waiting Tags Master Branch This Branch Query Number
14000 1000 ~500 t1, ~500 t2 0.3025 0.3134 1000
95000 1000 ~500 t1, ~500 t2 0.3240 0.3319 1000
95000 3000 ~1500 t1, ~1500 t2 0.6561 0.3460 1000
95000 3000 ~1500 t1, ~1500 t2 0.9451 0.9532 3000
2M 100k ~12k t1, ~12k t2, rest other tags 0.4017 0.0083 1000

The branch numbers are in seconds.

As it can be seen, the method outperforms the master branch in cases where the number of returned tags is a subset of input tags(or in other words, in master branch, the number of queried and selected tags). In cases where all tags are returned, our method performs worse by a margin of ~8-10 milliseconds. The last line is an extreme production level simulation where the difference in performance is pretty significant.


Important: EXPLAIN ANALYZE did not show index usage for sorting. On a much bigger database, with nearly 2 million records, with 100k status=waiting items, the index is used in sorting, with parallel workers. More investigation is required for identifying a threshold on index usage in terms of database records.

Important: This change requires a new composite index on priority DESC and created_on. The change was done manually in the sql engine, alembic change will be added in the future.

Status

codecov[bot] commented 4 years ago

Codecov Report

Merging #592 into master will increase coverage by 0.01%. The diff coverage is 100.00%.