queue_get_next() item retrieval performance upgrade

Description

Previously , queue_get_next() used ordering by CASE expressions for tag ordering, which reduced performance and also hindered the use of any index on column orderings. The new change queries the database for each tag individually, in the order of the tags input variable provided, reducing the number of returned items from the user input limit, before doing the same for the next tag. A new file under benchmarks/ was also created for record creation and queue_get_next invocation.

Changelog description

Changing queue_get_next() for better index usage and faster item retreival

Results on Retrieval

Below are some benchmark resulted extracted straightly from EXPLAIN ANLAYZE command of postgresql. The database has the following setup: 14000 records, of which 13000 have an error status, and 1000 records with a single program p1 with nearly even split of two tags. The tests are done for two tags [tag1, tag2] and program p1 with limit=1000

Before

SELECT task_queue.id, task_queue.spec, task_queue.tag, task_queue.parser, task_queue.program, task_queue.procedure, task_queue.status, task_queue.priority, task_queue.manager, task_queue.error, task_queue.created_on, task_queue.modified_on, task_queue.base_result_id 
FROM task_queue 
WHERE task_queue.status = 'waiting' AND task_queue.program = 'p1' AND
task_queue.tag IN ('tag1', 'tag2') AND (1 != 1 OR task_queue.procedure IS NULL)
ORDER BY CASE WHEN (task_queue.tag = 'tag1') THEN 0 WHEN (task_queue.tag = 'tag2') THEN 1 END,
task_queue.priority DESC, task_queue.created_on 
LIMIT 1000

runtime = 6.699 ms

New change

Two queries for two tags:

SELECT task_queue.id, task_queue.spec, task_queue.tag, task_queue.parser, task_queue.program, task_queue.procedure, task_queue.status, task_queue.priority, task_queue.manager, task_queue.error, task_queue.created_on, task_queue.modified_on, task_queue.base_result_id
FROM task_queue 
WHERE task_queue.status = 'waiting' AND task_queue.program = 'p1' AND task_queue.tag = 'tag1'
AND (1 != 1 OR task_queue.procedure IS NULL) 
ORDER BY task_queue.priority DESC, task_queue.created_on 
LIMIT 1000;

runtime = 0.668 ms

SELECT task_queue.id, task_queue.spec, task_queue.tag, task_queue.parser, task_queue.program, task_queue.procedure, task_queue.status, task_queue.priority, task_queue.manager, task_queue.error, task_queue.created_on, task_queue.modified_on, task_queue.base_result_id 
FROM task_queue 
WHERE task_queue.status = 'waiting' AND task_queue.program = 'p1' 
AND task_queue.tag = 'tag2' AND (1 != 1 OR task_queue.procedure IS NULL) 
ORDER BY task_queue.priority DESC, task_queue.created_on 
LIMIT 507;

runtime = 1.919 ms which is additively equal to ~2.6 ms.

Results on function

Below are the results of a number of experiments brought in the table with different database population, with different waiting record ratio across some experiments.

Total Items	No. of Waiting	Tags	Master Branch	This Branch	Query Number
14000	1000	~500 `t1`, ~500 `t2`	0.3025	0.3134	1000
95000	1000	~500 `t1`, ~500 `t2`	0.3240	0.3319	1000
95000	3000	~1500 `t1`, ~1500 `t2`	0.6561	0.3460	1000
95000	3000	~1500 `t1`, ~1500 `t2`	0.9451	0.9532	3000
2M	100k	~12k `t1`, ~12k `t2`, rest other tags	0.4017	0.0083	1000

The branch numbers are in seconds.

As it can be seen, the method outperforms the master branch in cases where the number of returned tags is a subset of input tags(or in other words, in master branch, the number of queried and selected tags). In cases where all tags are returned, our method performs worse by a margin of ~8-10 milliseconds. The last line is an extreme production level simulation where the difference in performance is pretty significant.

Important: EXPLAIN ANALYZE did not show index usage for sorting. On a much bigger database, with nearly 2 million records, with 100k status=waiting items, the index is used in sorting, with parallel workers. More investigation is required for identifying a threshold on index usage in terms of database records.

Important: This change requires a new composite index on priority DESC and created_on. The change was done manually in the sql engine, alembic change will be added in the future.

Status

[X] Code base linted
[X] Add function total time comparison, not only limited to item lookup.
[x] Add alembic migration file for the new index.
[ ] Ready to go

MolSSI / QCFractal