HERA-Team / librarian

The HERA Librarian.
BSD 2-Clause "Simplified" License
6 stars 11 forks source link

Standing-order queries very hard on database server #20

Closed pkgw closed 7 years ago

pkgw commented 8 years ago

At the moment, the DB queries that we run to check out the "standing orders" seem to be very hard on the database server, pegging its CPU at 100% for several minutes. Here's the kind of query that seems to be the culprit:

hera_lib=> SELECT pid, datname, usename, query FROM pg_stat_activity;
 pid  | datname  | usename |                                                                                              query                                                                                              
-------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
27921 | hera_lib | hera    | SELECT file.name AS file_name, file.type AS file_type, file.create_time AS file_create_time, file.obsid AS file_obsid, file.size AS file_size, file.md5 AS file_md5, file.source AS file_source +
      |          |         | FROM file                                                                                                                                                                                       +
      |          |         | WHERE file.name LIKE '%HH.uvc' AND file.name NOT IN (SELECT file.name AS file_name                                                                                                              +
      |          |         | FROM file_event                                                                                                                                                                                 +
      |          |         | WHERE file_event.name = file.name AND file_event.type = 'standing_order_succeeded:test-small-subarrays-to-shredder')
20801 | hera_lib | hera    | SELECT pid, datname, usename, query FROM pg_stat_activity;
(2 rows)

This should be optimized.

pkgw commented 8 years ago

FWIW: If I add clauses to the search terms to limit the selections to only relatively recent files, the queries are much easier on the server. But it'd be nice for arbitrary user queries to not run the risk of hammering the server like this ... especially since they currently cause the Librarian to lock up and be unable to address any other operations while it's running the query.

pkgw commented 7 years ago

Fixed in 5ed1ba5b — just needed to add an index on FileEvent.name and everything is much, much faster. And alembic made the upgrade of the live system super straightforward.