distributed-system-analysis / pbench

A benchmarking and performance analysis framework
http://distributed-system-analysis.github.io/pbench/
GNU General Public License v3.0
186 stars, 108 forks

Remove IndexMap document list #3606

Closed: dbutenhof closed this 8 months ago

dbutenhof commented 8 months ago

PBENCH-1315

The production server, with "only" 108,728 indexed datasets (many more still haven't been migrated from the passthrough server), currently uses 84.1 GB of PostgreSQL storage for the IndexMap table alone. Most of that is a per-dataset list of every OpenSearch document ID, kept so that we could manage the index with bulk update and delete operations. This is straining the capacity of our RDU2 PostgreSQL server.
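To put the IndexMap figure above in per-dataset terms, here is a back-of-the-envelope calculation. It assumes the stated size means decimal gigabytes (10**9 bytes) and that the storage is spread evenly across the indexed datasets; both are simplifying assumptions, not measurements.

```python
# Average IndexMap storage per indexed dataset.
# Assumption: the reported size is in decimal gigabytes (10**9 bytes).
TOTAL_BYTES = 84.1 * 10**9
DATASETS = 108_728

per_dataset = TOTAL_BYTES / DATASETS
print(f"{per_dataset:,.0f} bytes (~{per_dataset / 1000:.0f} KB) per dataset")
```

That works out to roughly 0.77 MB of PostgreSQL storage per dataset, before the unmigrated datasets are added.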

As an alternative, this PR removes the document list. Instead of bulk update and delete operations, it uses _update_by_query and _delete_by_query, selecting documents by their parent dataset's resource ID within the appropriate indices (which we still record in the IndexMap).
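The by-query approach can be sketched as follows. This is a minimal illustration, not the PR's actual code: it only builds the HTTP method, path, and JSON body for each request. The field name "run.id" (assumed to hold the parent dataset resource ID as a keyword), the index names, the resource ID, and the "authorization.access" field in the example script are all hypothetical placeholders.

```python
"""Sketch: build by-query update/delete requests scoped to one dataset.

Hypothetical assumptions: each document stores its parent dataset's
resource ID in a keyword field named "run.id", and the index list comes
from the IndexMap rows recorded for that dataset.
"""
import json
from typing import Any


def dataset_query(resource_id: str, field: str = "run.id") -> dict[str, Any]:
    # A term query matches the exact resource ID; this assumes the field
    # is mapped as a keyword rather than analyzed text.
    return {"query": {"term": {field: resource_id}}}


def build_delete_request(
    indices: list[str], resource_id: str, field: str = "run.id"
) -> tuple[str, str, dict[str, Any]]:
    # One request removes every document for the dataset across all of
    # its indices, so no stored per-document ID list is needed.
    path = "/" + ",".join(indices) + "/_delete_by_query"
    return "POST", path, dataset_query(resource_id, field)


def build_update_request(
    indices: list[str],
    resource_id: str,
    source: str,
    params: dict[str, Any],
    field: str = "run.id",
) -> tuple[str, str, dict[str, Any]]:
    # The Painless script runs against each matching document's _source.
    body = dataset_query(resource_id, field)
    body["script"] = {"source": source, "lang": "painless", "params": params}
    path = "/" + ",".join(indices) + "/_update_by_query"
    return "POST", path, body


if __name__ == "__main__":
    indices = ["example-run-data", "example-run-toc"]  # hypothetical index names
    method, path, body = build_update_request(
        indices,
        "0123456789abcdef0123456789abcdef",  # hypothetical resource ID
        "ctx._source.authorization.access = params.access",  # hypothetical field
        {"access": "public"},
    )
    print(method, path)
    print(json.dumps(body, indent=2))
```

With the opensearch-py client, the same index list and body could instead be passed to its delete_by_query and update_by_query methods. The trade-off is that each operation now runs a search over the dataset's indices rather than addressing documents by stored ID, which is what lets the IndexMap shrink to just the index names.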

Along the way, I noticed that (oops) we were missing the "authorization" subdocument in some of our Elasticsearch documents, which would affect the behavior of the authenticated search APIs. I also acted on a deprecation warning for a camelCase template keyword by replacing it with its snake_case alternative.

NOTE: To deploy a fix for the SQL bloat in RDU2 quickly, this PR lacks unit tests for update and delete; both are covered (in indexed and non-indexed cases) by functional tests.