Closed by jsjiang 1 month ago
A Performance Insights review on July 10 identified queries with long waits between 9-11am. The top one is a SELECT query on the `ezidapp_searchidentifier` table. Top queries and screenshots are documented on Google Drive in EZID/Identifiers > Dev-Sync > 20240710_RDS_Performance_Insights
The top query with long waits:
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`status`, `ezidapp_searchidentifier`.`target`, `ezidapp_searchidentifier`.`isTest` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > ? ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT ?
This is actually the 6th query in the "Top 10 longest running queries" list. Action:
The link checker job (proc-link-checker.py) might have contributed to this long-waits query:
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`status`, `ezidapp_searchidentifier`.`target`, `ezidapp_searchidentifier`.`isTest` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > ? ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT ?
```python
search_identifier_model = django.apps.apps.get_model('ezidapp', 'SearchIdentifier')
siGenerator = self.harvest(
    search_identifier_model,
    ["identifier", "owner", "status", "target", "isTest"],
    lambda si: si.isPublic
    and not si.isTest
    and si.target != si.defaultTarget
    and si.owner_id not in self._permanentExcludes,
)
```
Reviewed the Performance Insights again on July 25, targeting queries run between 1:30-3:30 am (8:30-10:30 UTC) and identified the following queries with long waits and latency:
Load by waits (AAS) | SQL query | Calls/sec | Avg Latency (ms)/call | Rows examined/call
---|---|---|---|---
0.91 | query 1 | 0.27 | 3309.32 | 9,996.30
0.87 | query 2 | 0.03 | 31816.72 | 99,587.50
0.30 | query 3 | 0.46 | 661.18 | 524,505.14
0.03 | query 4 | 0.02 | 1504.37 | 99,297.36
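As a sanity check on the table above, load by waits (AAS) should roughly equal calls/sec multiplied by average latency in seconds, and the reported figures are consistent with that:

```python
# Figures copied from the table above: (calls/sec, avg latency ms/call, reported AAS)
queries = {
    "query 1": (0.27, 3309.32, 0.91),
    "query 2": (0.03, 31816.72, 0.87),
    "query 3": (0.46, 661.18, 0.30),
    "query 4": (0.02, 1504.37, 0.03),
}
for name, (calls_per_sec, latency_ms, reported_aas) in queries.items():
    estimated = calls_per_sec * latency_ms / 1000.0  # AAS ~ calls/sec * latency (s)
    print(f"{name}: estimated AAS {estimated:.2f} vs reported {reported_aas}")
```

In other words, query 1 hurts because it runs often, while query 2 hurts because each call takes ~32 seconds.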
query 1:
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`createTime`, `ezidapp_searchidentifier`.`isTest`, `ezidapp_searchidentifier`.`hasMetadata` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > ? ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT ?
query 2:
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`linkIsBroken` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > ? ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT ?
query 3:
SELECT `ezidapp_refidentifier`.`id`, `ezidapp_refidentifier`.`datacenter_id`, `ezidapp_refidentifier`.`profile_id`, `ezidapp_refidentifier`.`owner_id`, `ezidapp_refidentifier`.`ownergroup_id`, `ezidapp_refidentifier`.`createTime`, `ezidapp_refidentifier`.`updateTime`, `ezidapp_refidentifier`.`status`, `ezidapp_refidentifier`.`unavailableReason`, `ezidapp_refidentifier`.`exported`, `ezidapp_refidentifier`.`crossrefStatus`, `ezidapp_refidentifier`.`crossrefMessag
query 4:
SELECT `ezidapp_linkchecker`.`id`, `ezidapp_linkchecker`.`identifier`, .....
Also looked into a few long-running queries performed around July 25 21:15 UTC (2:15pm PDT). These are most likely search-related queries.
Load by waits (AAS) | SQL query | Calls/sec | Avg Latency (ms)/call
---|---|---|---
0.30 | query 1 | 0.46 | 688.55
0.16 | query 2 | 0.01 | 42,206.09
query 1:
SELECT `ezidapp_refidentifier`.`id`, `ezidapp_refidentifier`.`datacenter_id`, `ezidapp_refidentifier`.`profile_id`, `ezidapp_refidentifier`.`owner_id`, `ezidapp_refidentifier`.`ownergroup_id`, `ezidapp_refidentifier`.`createTime`, `ezidapp_refidentifier`.`updateTime`, `ezidapp_refidentifier`.`status`, `ezidapp_refidentifier`.`unavailableReason`, `ezidapp_refidentifier`.`exported`, `ezidapp_refidentifier`.`crossrefStatus`, `ezidapp_refidentifier`.`crossrefMessag
query 2:
SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE (`ezidapp_searchidentifier`.`publicSearchVisible` = ? AND MATCH (`ezidapp_searchidentifier`.`keywords`) AGAINST (? IN BOOLEAN MODE))
A sample long-running query sorted by the identifier:
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`createTime`, `ezidapp_searchidentifier`.`isTest`, `ezidapp_searchidentifier`.`hasMetadata` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > 'ark:/88122/qzff0156' ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT 10000
Opened a ticket with IAS to investigate the low memory issue. IAS opened a ticket with AWS. AWS noticed high IOPS on the EC2 instance and recommended upgrading the instance type.
Here is the IAS ticket: Low memory on EZID PRD RDS
Queries used by the proc-link-checker-update job (once a day):
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`createTime`, `ezidapp_searchidentifier`.`isTest`, `ezidapp_searchidentifier`.`hasMetadata` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > 'ark:/88122/pqww0100' ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT 10000
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`linkIsBroken` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > 'ark:/88122/zxwg0138' ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT 100000
Query used by the proc-expunge job (once a day):
SELECT `ezidapp_identifier`.`id`, `ezidapp_identifier`.`identifier` FROM `ezidapp_identifier` WHERE ((`ezidapp_identifier`.`identifier` LIKE BINARY 'ark:/99999/fk4%' OR `ezidapp_identifier`.`identifier` LIKE BINARY 'doi:10.5072/FK2%' OR `ezidapp_identifier`.`identifier` LIKE BINARY 'doi:10.15697/%') AND `ezidapp_identifier`.`createTime` <= 1722806703) LIMIT 100
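The expunge query above selects candidates 100 at a time, so the job presumably loops until a fetch comes back empty. A hedged sketch of such a batch loop (function names and the toy in-memory store are illustrative, not EZID's actual code):

```python
def expunge_batches(fetch_batch, delete_ids, batch_size=100):
    """Repeatedly fetch up to batch_size expired test identifiers and delete them.

    fetch_batch(limit) stands in for the SELECT ... LIMIT 100 above;
    delete_ids(ids) stands in for the actual expunge. Both are assumptions.
    Returns the total number of identifiers removed.
    """
    total = 0
    while True:
        batch = fetch_batch(batch_size)  # list of (id, identifier) rows
        if not batch:
            return total
        delete_ids([row_id for row_id, _identifier in batch])
        total += len(batch)

# Toy stand-ins: 250 fake expired test ARKs, removed in batches of 100
store = [(i, f"ark:/99999/fk4{i:04d}") for i in range(250)]

def fetch_batch(limit):
    return store[:limit]

def delete_ids(ids):
    del store[:len(ids)]

deleted = expunge_batches(fetch_batch, delete_ids)
# deleted == 250 and the store is empty afterwards
```

Each fetch re-runs the three `LIKE BINARY` prefix conditions, which is likely why this query's average run time is long even though its call count is small.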
Query used by the proc-link-checker job:
SELECT `ezidapp_searchidentifier`.`id`, `ezidapp_searchidentifier`.`identifier`, `ezidapp_searchidentifier`.`owner_id`, `ezidapp_searchidentifier`.`status`, `ezidapp_searchidentifier`.`target`, `ezidapp_searchidentifier`.`isTest` FROM `ezidapp_searchidentifier` WHERE `ezidapp_searchidentifier`.`identifier` > 'ark:/88122/gnch0276' ORDER BY `ezidapp_searchidentifier`.`identifier` ASC LIMIT 1000
Created tickets for refactoring the related background jobs and for other future actions. Closing this ticket.
Review EZID queries and make adjustments to improve performance.
A few long-running queries:
- `kombu_message`: the total counts and total run time are high; we have decided to disable the Matomo API tracking, which issues the `INSERT INTO kombu_message` transactions.
- `ezidapp_identifier` statement with 3 `identifier LIKE BINARY ?` conditions in the WHERE clause: the total count is small but the average run time is very long. The original query in EZID is most likely the one in the `proc-expunge.py` script. Refactor the 2nd query to improve performance.
Top 10 longest running queries: