Open jsjiang opened 1 month ago
Related tickets: #394 https://github.com/CDLUC3/ezid/issues/371
Queue data cleanup
Queues:
Associated datasets:
Notes:
updatetime
RefIdentifier table may trigger an error if the identifier was updated outside of the selected date range - this is expected as this identifier is still associated with one or more later transactions.
Procedures:
proc-cleanup-async-queues.py script or develop a new one that can select identifiers by a specified date range (see related ticket #727)
Exception occured while processing identifier 'ark:/99999/fk4km0qb78' for 'refId' table
ERROR ezidapp.management.commands.proc-cleanup-async-queues_v2 7961173056: ("Cannot delete some instances of model 'RefIdentifier' because they are referenced through protected foreign keys: 'BinderQueue.refIdentifier'.", {<BinderQueue: BinderQueue(id=1, refId=ark:/99999/fk4km0qb78, op=CREATE, status="Awaiting submission", owner="admin (admin)", group="f51340b091 ()")>})
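The error above comes from deleting a parent row while a queue row still points at it. A minimal sketch of that behavior, using SQLite's foreign-key enforcement as a stand-in for Django's protected foreign key; the table and column names here are simplified assumptions, not the real EZID schema:

```python
import sqlite3

# Hypothetical, simplified stand-ins for the RefIdentifier and BinderQueue tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE ref_identifier (id INTEGER PRIMARY KEY, identifier TEXT)")
conn.execute(
    "CREATE TABLE binder_queue ("
    " seq INTEGER PRIMARY KEY,"
    " ref_identifier_id INTEGER NOT NULL"
    " REFERENCES ref_identifier(id) ON DELETE RESTRICT)"
)
conn.execute("INSERT INTO ref_identifier VALUES (1, 'ark:/99999/fk4km0qb78')")
conn.execute("INSERT INTO binder_queue VALUES (1, 1)")

# Deleting the parent row first fails while a queue row still references it.
try:
    conn.execute("DELETE FROM ref_identifier WHERE id = 1")
    parent_delete_blocked = False
except sqlite3.IntegrityError:
    parent_delete_blocked = True

# Deleting the referencing queue row first lets the parent delete succeed.
conn.execute("DELETE FROM binder_queue WHERE ref_identifier_id = 1")
conn.execute("DELETE FROM ref_identifier WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM ref_identifier").fetchone()[0]
```

In this sketch the order matters: queue rows go first, then the identifier row they reference.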
Implementation procedure:
python manage.py diag-queue-stats: command to get queue stats
ezid-ops-scripts/scripts/dump-queue-tables_ops.sh: script to back up queue tables (use the eziddba account)
ezid-ops-scripts/scripts/723_delete_binder_queue.sql: query to delete all records from the binder queue (requires the ezidrw account and password)
ezid-ops-scripts/scripts/723_update_queue_status.sql: query to update the datacite and searchindexer record status from "S" to "O" for records that were created before UNIX_TIMESTAMP('2023-06-29 05:10:00')=1688040600
python manage.py diag-queue-stats: command to get queue stats
Test on ezid-stg (script v0.0.8rc1):
--set-gtid-purged option
/ezid/tmp/ezid_sql_dump_files/stg-20241021
Batch update:
mysql -h rds-uc3-ezid1-stg.cmcguhglinoa.us-west-2.rds.amazonaws.com -u ezidrw ezid -p < 723_update_queue_status.sql
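The cutoff used by 723_update_queue_status.sql can be sanity-checked outside the database. A small sketch, assuming the database session time zone is US Pacific (PDT, UTC-7, on that date); the time zone is an assumption, not something stated in this ticket:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the database session uses US Pacific time, which was PDT (UTC-7) on this date.
PDT = timezone(timedelta(hours=-7))

cutoff_local = datetime(2023, 6, 29, 5, 10, 0, tzinfo=PDT)
cutoff_epoch = int(cutoff_local.timestamp())

print(cutoff_epoch)
print(datetime.fromtimestamp(cutoff_epoch, tz=timezone.utc).isoformat())
```

Under that assumption the result matches the 1688040600 value quoted in the implementation procedure.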
Batch delete:
ezid@uc3-ezidui-stg01:15:52:54:~/ezid-ops-scripts/scripts/sql_scripts$ mysql -h rds-uc3-ezid1-stg.cmcguhglinoa.us-west-2.rds.amazonaws.com -u ezidrw ezid -p < 723_delete_binder_queue.sql
diag-queue-stats before batch update:
{
"download": {},
"binder": {
"F": 1,
"O": 44111,
"S": 137359,
"U": 530
},
"datacite": {
"F": 19769,
"O": 47132,
"S": 128571
},
"crossref": {
"F": 568,
"I": 181521,
"O": 2439,
"W": 6
},
"searchindexer": {
"F": 103,
"O": 47401,
"S": 752336
}
}
diag-queue-stats after batch update:
{
"download": {},
"binder": {
"F": 1,
"O": 44111,
"S": 137359,
"U": 530
},
"datacite": {
"F": 19769,
"O": 175703
},
"crossref": {
"F": 568,
"I": 181521,
"O": 2439,
"W": 6
},
"searchindexer": {
"F": 103,
"O": 799737
}
}
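The before and after numbers can be cross-checked by folding every "S" count into "O" for the two updated queues. A minimal sketch; the helper name fold_status is hypothetical, and it assumes every "S" record falls before the cutoff, which appears to hold for the staging figures:

```python
import copy

# Staging counts copied from the "before batch update" output above.
before = {
    "datacite": {"F": 19769, "O": 47132, "S": 128571},
    "searchindexer": {"F": 103, "O": 47401, "S": 752336},
}


def fold_status(stats, queues, src="S", dst="O"):
    """Return a copy of stats with every src count added to dst for the given queues."""
    result = copy.deepcopy(stats)
    for queue in queues:
        moved = result[queue].pop(src, 0)
        result[queue][dst] = result[queue].get(dst, 0) + moved
    return result


expected = fold_status(before, ["datacite", "searchindexer"])
print(expected)
```

The computed "O" counts can then be compared with the diag-queue-stats output taken after the batch update.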
diag-queue-stats after batch delete binder records:
{
"download": {},
"binder": {},
"datacite": {
"F": 19769,
"O": 175722
},
"crossref": {
"F": 572,
"I": 181536,
"O": 2441,
"W": 6
},
"searchindexer": {
"F": 103,
"O": 799756
}
}
Dump files:
-rw-r--r--. 1 ezid ezid 10626904 Oct 21 14:18 ezidapp_binderqueue_table_dump_20241021_141823.sql
-rw-r--r--. 1 ezid ezid 9851380 Oct 21 14:18 ezidapp_crossrefqueue_table_dump_20241021_141823.sql
-rw-r--r--. 1 ezid ezid 11784727 Oct 21 14:18 ezidapp_datacitequeue_table_dump_20241021_141823.sql
-rw-r--r--. 1 ezid ezid 47093377 Oct 21 14:18 ezidapp_searchindexerqueue_table_dump_20241021_141823.sql
Production implementation:
Binder queue
Crossref queue:
DataCite queue:
SearchIndexer queue:
RefIdentifier table:
Queue stats before:
ezid@uc3-ezidui-prd01:05:08:11:~/ezid$ python manage.py diag-queue-stats
{
"download": {},
"binder": {
"F": 6,
"O": 4015090,
"S": 7993776,
"U": 669913
},
"datacite": {
"F": 1565,
"O": 5367787,
"S": 7992232,
"U": 28
},
"crossref": {
"F": 3878,
"I": 13330253,
"O": 16969,
"U": 28,
"W": 99
},
"searchindexer": {
"F": 54,
"O": 5466911,
"S": 9337175
}
}
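For sizing the dump and delete steps, the per-queue totals can be summed from the stats above. A short sketch using the production counts as reported:

```python
# Production counts copied from the "Queue stats before" output above.
stats_before = {
    "binder": {"F": 6, "O": 4015090, "S": 7993776, "U": 669913},
    "datacite": {"F": 1565, "O": 5367787, "S": 7992232, "U": 28},
    "crossref": {"F": 3878, "I": 13330253, "O": 16969, "U": 28, "W": 99},
    "searchindexer": {"F": 54, "O": 5466911, "S": 9337175},
}

# Total number of rows per queue, regardless of status.
totals = {queue: sum(counts.values()) for queue, counts in stats_before.items()}
for queue, total in totals.items():
    print(f"{queue}: {total:,}")
```

The binder total is consistent with the "over 12M" figure mentioned for the binder queue.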
Dump files:
ezid@uc3-ezidui-prd01:05:17:53:~/tmp/ezid_sql_dump_files$ ls -l
total 3107532
-rw-r--r--. 1 ezid ezid 759329912 Oct 24 05:15 ezidapp_binderqueue_table_dump_20241024_051443.sql
-rw-r--r--. 1 ezid ezid 725560660 Oct 24 05:15 ezidapp_crossrefqueue_table_dump_20241024_051443.sql
-rw-r--r--. 1 ezid ezid 805112285 Oct 24 05:16 ezidapp_datacitequeue_table_dump_20241024_051443.sql
-rw-r--r--. 1 ezid ezid 892086020 Oct 24 05:16 ezidapp_searchindexerqueue_table_dump_20241024_051443.sql
Delete binder queue records:
mysql -h rds-uc3-ezid5-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com -u ezidrw ezid -p < 723_delete_binder_queue.sql
Update queue status:
mysql -h rds-uc3-ezid5-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com -u ezidrw ezid -p < 723_update_queue_status.sql
Queue stats after:
{
"download": {},
"binder": {},
"datacite": {
"F": 1565,
"O": 13350075
},
"crossref": {
"F": 3878,
"I": 13320253,
"O": 16903,
"W": 99
},
"searchindexer": {
"F": 54,
"O": 14793602
}
}
Binder queue table stats after batch delete:
Binder queue table stats after running the TRUNCATE table ezidapp_binderqueue; command:
Note: It would probably have been better to use "TRUNCATE table" instead of "delete from table" initially. The delete took about 50 minutes, while the truncate took only a few seconds. Truncate also released disk space immediately.
Lessons learned:
The sudo cdlsysctl stop service command might have triggered a restart of the services by the daemon agent. Check with Ashley about the relationships among the Nagios settings, the deployment configuration and the cdlsysctl command.
Questions:
Deployed proc-cleanup-async-queues_v2.py on ezid-prd on Oct 30 with release v3.2.27
We run the proc-cleanup-async-queues.py script to delete successfully processed or not applicable identifiers from the binder, crossref, datacite and search queues and from the RefIdentifier table. The script filters records by queue status (SUCCESS or IGNORED) and update timestamp (updated in the past week). We started running the proc-cleanup-async-queues.py script as a background job some time in 2023 (most likely June 29, 2023). There are some data issues with these queues:
There might be a status definition change with the June 29, 2023 deployment:
The proc-cleanup-async-queues job might have missed records with SUCCESS and IGNORED status in its processing window (now - 2 weeks)
proc-cleanup-async-queues job. These records should have been deleted when they had gone into the "past week" window.
We stopped the proc-binder queue on Aug 12, 2024. However, we have not stopped putting records into the ezidapp_binderqueue table. There are over 560K records with UNSUBMITTED = "U" status in the binder queue since then (seq > 14941881). All binder queue records (over 12M) can be deleted.
The changing point for the datacite queue is seq=7993833:
Records that had passed the "past week" window but had not been deleted by proc-cleanup-async-queues:
QuerySet with time range and batch size filters that contributed to neglected records
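One way a time-range filter combined with a batch-size limit can leave records behind is sketched below. This is a hypothetical model, not the actual query in proc-cleanup-async-queues: the window bounds, the run days, the batch size and the run_cleanup helper are all assumptions chosen for illustration.

```python
WEEK = 7  # days


def run_cleanup(records, now, batch_size):
    """Delete up to batch_size finished records updated between now-2w and now-1w."""
    window_start, window_end = now - 2 * WEEK, now - WEEK
    eligible = [
        r for r in records
        if r["status"] in ("S", "I") and window_start <= r["updatetime"] <= window_end
    ]
    to_delete = {r["seq"] for r in eligible[:batch_size]}
    return [r for r in records if r["seq"] not in to_delete]


# Ten finished records, all updated on day 0 (hypothetical data).
records = [{"seq": i, "status": "S", "updatetime": 0} for i in range(10)]

# The job runs on days 8, 12 and 16; by day 16 the window no longer covers day 0.
for day in (8, 12, 16):
    records = run_cleanup(records, now=day, batch_size=3)

neglected = records
print(len(neglected), "records were never deleted")
```

In this model, any records still waiting when the window slides past their update time are never selected again, which is one plausible reading of how the neglected records accumulated.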
EZID-PRD queue status on 2024-10-14 reported by running the diag-queue-stats command: