CDLUC3 / ezid

CDLUC3 ezid
MIT License
11 stars 4 forks source link

Identify EZID long running SQL queries #665

Closed jsjiang closed 1 month ago

jsjiang commented 3 months ago

Review EZID queries and make adjustments to improve performance.

A few long running queries:

Refactor the 2nd query to improve performace.

            qs = (
                ezidapp.models.identifier.Identifier.objects.filter(
                    Q(identifier__startswith=django.conf.settings.SHOULDERS_ARK_TEST)
                    | Q(identifier__startswith=django.conf.settings.SHOULDERS_DOI_TEST)
                    | Q(identifier__startswith=django.conf.settings.SHOULDERS_CROSSREF_TEST)
                )
                    .filter(createTime__lte=max_age_ts)

Top 10 longest running queries:

mysql> SELECT      SCHEMA_NAME,     
DIGEST_TEXT,     
COUNT_STAR AS exec_count,     
SUM_TIMER_WAIT / 1000000000000 AS total_exec_time_sec,     
AVG_TIMER_WAIT / 1000000000000 AS avg_exec_time_sec,     
MIN_TIMER_WAIT / 1000000000000 AS min_exec_time_sec,     
MAX_TIMER_WAIT / 1000000000000 AS max_exec_time_sec 
FROM      performance_schema.events_statements_summary_by_digest 
ORDER BY      SUM_TIMER_WAIT DESC LIMIT 10\G;

*************************** 1. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: INSERT INTO `kombu_message` ( VISIBLE , TIMESTAMP , `payload` , `version` , `queue_id` ) VALUES (...)
         exec_count: 122894
total_exec_time_sec: 314934.7635
  avg_exec_time_sec: 2.5627
  min_exec_time_sec: 0.0001
  max_exec_time_sec: 106.8853
*************************** 2. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: SELECT `ezidapp_identifier` . `id` , `ezidapp_identifier` . `identifier` FROM `ezidapp_identifier` WHERE ( ( `ezidapp_identifier` . `identifier` LIKE BINARY ? OR `ezidapp_identifier` . `identifier` LIKE BINARY ? OR `ezidapp_identifier` . `identifier` LIKE BINARY ? ) AND `ezidapp_identifier` . `createTime` <= ? ) LIMIT ?
         exec_count: 327
total_exec_time_sec: 278087.0967
  avg_exec_time_sec: 850.4193
  min_exec_time_sec: 765.2373
  max_exec_time_sec: 1145.5108
*************************** 3. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: SELECT `ezidapp_refidentifier` . `id` , `ezidapp_refidentifier` . `datacenter_id` , `ezidapp_refidentifier` . `profile_id` , `ezidapp_refidentifier` . `owner_id` , `ezidapp_refidentifier` . `ownergroup_id` , `ezidapp_refidentifier` . `createTime` , `ezidapp_refidentifier` . `updateTime` , `ezidapp_refidentifier` . `status` , `ezidapp_refidentifier` . `unavailableReason` , `ezidapp_refidentifier` . `exported` , `ezidapp_refidentifier` . `crossrefStatus` , `ezidapp_refidentifier` . `crossrefMessage` , `ezidapp_refidentifier` . `target` , `ezidapp_refidentifier` . `cm` , `ezidapp_refidentifier` . `metadata` , `ezidapp_refidentifier` . `agentRole` , `ezidapp_refidentifier` . `isTest` , `ezidapp_refidentifier` . `identifier` FROM `ezidapp_refidentifier` WHERE ( `ezidapp_refidentifier` . `updateTime` >= ? AND `ezidapp_refidentifier` . `updateTime` <= ? ) ORDER BY `ezidapp_refidentifier` . `id` DESC LIMIT ?
         exec_count: 512591
total_exec_time_sec: 228474.8468
  avg_exec_time_sec: 0.4457
  min_exec_time_sec: 0.0004
  max_exec_time_sec: 5.0664
*************************** 4. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: SELECT `ezidapp_searchidentifier` . `id` , `ezidapp_searchidentifier` . `identifier` , `ezidapp_searchidentifier` . `owner_id` , `ezidapp_searchidentifier` . `createTime` , `ezidapp_searchidentifier` . `isTest` , `ezidapp_searchidentifier` . `hasMetadata` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier` . `identifier` > ? ORDER BY `ezidapp_searchidentifier` . `identifier` ASC LIMIT ?
         exec_count: 37867
total_exec_time_sec: 129293.8705
  avg_exec_time_sec: 3.4144
  min_exec_time_sec: 0.0002
  max_exec_time_sec: 14.8088
*************************** 5. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: SELECT `ezidapp_searchidentifier` . `id` , `ezidapp_searchidentifier` . `identifier` , `ezidapp_searchidentifier` . `linkIsBroken` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier` . `identifier` > ? ORDER BY `ezidapp_searchidentifier` . `identifier` ASC LIMIT ?
         exec_count: 3806
total_exec_time_sec: 124934.7843
  avg_exec_time_sec: 32.8257
  min_exec_time_sec: 0.0002
  max_exec_time_sec: 83.9031
*************************** 6. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: SELECT `ezidapp_searchidentifier` . `id` , `ezidapp_searchidentifier` . `identifier` , `ezidapp_searchidentifier` . `owner_id` , `ezidapp_searchidentifier` . `status` , `ezidapp_searchidentifier` . `target` , `ezidapp_searchidentifier` . `isTest` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier` . `identifier` > ? ORDER BY `ezidapp_searchidentifier` . `identifier` ASC LIMIT ?
         exec_count: 124198
total_exec_time_sec: 45530.4058
  avg_exec_time_sec: 0.3666
  min_exec_time_sec: 0.0001
  max_exec_time_sec: 2.5149
*************************** 7. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: UPDATE `ezidapp_linkchecker` SET `identifier` = ? , `owner_id` = ? , `target` = ? , `lastCheckTime` = ? , `numFailures` = ? , `isBad` = ? , `returnCode` = ? , `error` = ? , `mimeType` = ? , `size` = ? , `hash` = ? WHERE `ezidapp_linkchecker` . `id` = ?
         exec_count: 2198302
total_exec_time_sec: 18928.8716
  avg_exec_time_sec: 0.0086
  min_exec_time_sec: 0.0038
  max_exec_time_sec: 1.0698
*************************** 8. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: SELECT `kombu_message` . `id` AS `kombu_message_id` , `kombu_message` . `visible` AS `kombu_message_visible` , `kombu_message` . `timestamp` AS `kombu_message_timestamp` , `kombu_message` . `payload` AS `kombu_message_payload` , `kombu_message` . `version` AS `kombu_message_version` , `kombu_message` . `queue_id` AS `kombu_message_queue_id` FROM `kombu_message` WHERE `kombu_message` . `queue_id` = ? AND `kombu_message` . `visible` != FALSE ORDER BY `kombu_message` . `timestamp` , `kombu_message` . `id` LIMIT ? FOR UPDATE
         exec_count: 613322
total_exec_time_sec: 18559.6089
  avg_exec_time_sec: 0.0303
  min_exec_time_sec: 0.0001
  max_exec_time_sec: 85.2425
*************************** 9. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: COMMIT
         exec_count: 1246235
total_exec_time_sec: 10535.3777
  avg_exec_time_sec: 0.0085
  min_exec_time_sec: 0.0000
  max_exec_time_sec: 49.7269
*************************** 10. row ***************************
        SCHEMA_NAME: ezid
        DIGEST_TEXT: SELECT `ezidapp_binderqueue` . `seq` , `ezidapp_binderqueue` . `refIdentifier_id` , `ezidapp_binderqueue` . `enqueueTime` , `ezidapp_binderqueue` . `submitTime` , `ezidapp_binderqueue` . `operation` , `ezidapp_binderqueue` . `status` , `ezidapp_binderqueue` . `message` , `ezidapp_binderqueue` . `batchId` , `ezidapp_binderqueue` . `error` , `ezidapp_binderqueue` . `errorIsPermanent` FROM `ezidapp_binderqueue` WHERE `ezidapp_binderqueue` . `refIdentifier_id` = ?
         exec_count: 51259034
total_exec_time_sec: 6208.5945
  avg_exec_time_sec: 0.0001
  min_exec_time_sec: 0.0001
  max_exec_time_sec: 0.2982
jsjiang commented 2 months ago

Performance Insights review on July 10 identified queries with long waits between 9-11am. The top one is a slect query on the searchIdentifier table. Top queries and screenshots are documented on Google drive in EZID/Identifiers > Dev-Sync > 20240710_RDS_Performance_Insights

The top query with long waits:

SELECT `ezidapp_searchidentifier` . `id` , `ezidapp_searchidentifier` . `identifier` , `ezidapp_searchidentifier` . `owner_id` , `ezidapp_searchidentifier` . `status` , `ezidapp_searchidentifier` . `target` , `ezidapp_searchidentifier` . `isTest` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier` . `identifier` > ? ORDER BY `ezidapp_searchidentifier` . `identifier` ASC LIMIT ?

This is actually the 6th long running queries in the "Top 10 longest running queries" list. Action:

jsjiang commented 2 months ago

The link checker job by proc-link-chekcer.py might have contributed to this long waits query:

SELECT `ezidapp_searchidentifier` . `id` , `ezidapp_searchidentifier` . `identifier` , `ezidapp_searchidentifier` . `owner_id` , `ezidapp_searchidentifier` . `status` , `ezidapp_searchidentifier` . `target` , `ezidapp_searchidentifier` . `isTest` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier` . `identifier` > ? ORDER BY `ezidapp_searchidentifier` . `identifier` ASC LIMIT ?

proc-link-chekcer.py:

        search_identifier_model = django.apps.apps.get_model('ezidapp', 'SearchIdentifier')
        siGenerator = self.harvest(
            search_identifier_model,
            ["identifier", "owner", "status", "target", "isTest"],
            lambda si: si.isPublic
            and not si.isTest
            and si.target != si.defaultTarget
            and si.owner_id not in self._permanentExcludes,
        )
jsjiang commented 2 months ago

Reviewed the Performance Insights again on July 25, targeting queries run between 1:30-3:30 am (8:30-10:30 UTC) and identified the following queries with long waits and latency:

Load by waits (AAS) SQL query Calls/sec Avg Latency (ms)/call Rows examined/call
0.91 query 1. 0.27 3309.32. 9,996.30
0.87 query 2. 0.03 31816.72 99,587.50
0.30 query 3 0.46. 661.18 524,505.14
0.03 query 4 0.02. 1504.37 99,297.36
query 1:
SELECT `ezidapp_searchidentifier` . `id` , `ezidapp_searchidentifier` . `identifier` , `ezidapp_searchidentifier` . `owner_id` , `ezidapp_searchidentifier` . `createTime` , `ezidapp_searchidentifier` . `isTest` , `ezidapp_searchidentifier` . `hasMetadata` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier` . `identifier` > ? ORDER BY `ezidapp_searchidentifier` . `identifier` ASC LIMIT ?

query 2:
SELECT `ezidapp_searchidentifier` . `id` , `ezidapp_searchidentifier` . `identifier` , `ezidapp_searchidentifier` . `linkIsBroken` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier` . `identifier` > ? ORDER BY `ezidapp_searchidentifier` . `identifier` ASC LIMIT ?

query 3:
SELECT `ezidapp_refidentifier` . `id` , `ezidapp_refidentifier` . `datacenter_id` , `ezidapp_refidentifier` . `profile_id` , `ezidapp_refidentifier` . `owner_id` , `ezidapp_refidentifier` . `ownergroup_id` , `ezidapp_refidentifier` . `createTime` , `ezidapp_refidentifier` . `updateTime` , `ezidapp_refidentifier` . `status` , `ezidapp_refidentifier` . `unavailableReason` , `ezidapp_refidentifier` . `exported` , `ezidapp_refidentifier` . `crossrefStatus` , `ezidapp_refidentifier` . `crossrefMessag

query 4:
SELECT ezidapp_linkchecker.id, ezidapp_linkerchecker_identifier, .....
jsjiang commented 2 months ago

Also looked into a few long run queries performed around July 25 21:15 UTC (2:15pm PDT). These are most likely search related queries.

Load by waits (AAS) SQL query Calls/sec Avg Latency (ms)/call
0.30. query 1. 0.46. 688.55
0.16. query 2 0.01. 42,206.09
Query 1:
SELECT `ezidapp_refidentifier` . `id` , `ezidapp_refidentifier` . `datacenter_id` , `ezidapp_refidentifier` . `profile_id` , `ezidapp_refidentifier` . `owner_id` , `ezidapp_refidentifier` . `ownergroup_id` , `ezidapp_refidentifier` . `createTime` , `ezidapp_refidentifier` . `updateTime` , `ezidapp_refidentifier` . `status` , `ezidapp_refidentifier` . `unavailableReason` , `ezidapp_refidentifier` . `exported` , `ezidapp_refidentifier` . `crossrefStatus` , `ezidapp_refidentifier` . `crossrefMessag

query 2:
SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND MATCH ( `ezidapp_searchidentifier` . `keywords` ) AGAINST ( ? IN BOOLEAN MODE ) )
jsjiang commented 2 months ago

A sample long run query that is sorted by the identifier:

SELECT 'ezidapp_searchidentifier 'id', 'ezidapp_searchidentifier" ezidapp_searchidentifier'
'isTest'
'ezidapp_searchidentifier'
'identifier'
'ezidapp_searchidentifier'. 'owner_id'
'ezidapp_searchidentifier'. 'createTime'
' hasMetadata'
ORDER BY
ezidapp searchidentifier'" 'identifier'
FROM 'ezidapp_searchidentifier' WHERE 'ezidapp_searchidentifier' 'identifier' > 'ark:/88122/qzff0156'
ASC LIMIT 10000

EZID_long_run_query

jsjiang commented 2 months ago

Opened a ticket with IAS to investigate low memory issue. IAS opened a ticket with AWS. AWS noticed high IOPS on the EC2 instance and recommended to upgrade instance type.

Here is the IAS ticket: Low memory on EZID PRD RDS

jsjiang commented 1 month ago

Queries used by the proc-link-checker-update job (once a day):

SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`createTime`, `ezidapp_searchidentifier`.`isTest`, `ezidapp_searchidentifier`.`hasMetadata` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > 'ark:/88122/pqww0100' ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT 10000
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`linkIsBroken` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > 'ark:/88122/zxwg0138' ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT 100000

Query used by the proc-expunge job (once a day)

SELECT `ezidapp_identifier`.`id`, `ezidapp_identifier`.`identifier` FROM `ezidapp_identifier` WHERE ((`ezidapp_identifier`.`identifier` LIKE BINARY 'ark:/99999/fk4%' OR `ezidapp_identifier`.`identifier` LIKE BINARY 'doi:10.5072/FK2%' OR `ezidapp_identifier`.`identifier` LIKE BINARY 'doi:10.15697/%') AND `ezidapp_identifier`.`createTime` <= 1722806703) LIMIT 100

Query used by the proc-link-checker job:

SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`status`, `ezidapp_searchidentifier`.`target`, `ezidapp_searchidentifier`.`isTest` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > 'ark:/88122/gnch0276' ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT 1000

Image

jsjiang commented 1 month ago

Created tickets for refactoring related background jobs:

jsjiang commented 1 month ago

Created tickets for future actions. Closing this ticket.