WebCuratorTool / webcurator

The root of the webcurator tool project, containing all modules needed to run a fully functional webcurator tool.
Apache License 2.0
2 stars 1 forks source link

Fix for slow deletion of harvest resources in MySQL/MariaDB #20

Closed hannakoppelaar closed 3 years ago

hannakoppelaar commented 3 years ago

In MySQL the Hibernate-generated statements for deleting records from harvest_resource and arc_harvest_resource lead to long running queries which might time-out in the case of large harvests. The fix is to use a simple ANSI-SQL native query and to add a cascaded delete to the schema. A fix without any alteration to the schema seems to be impossible, if the code is to run on all supported database platforms.

obrienben commented 3 years ago

@hannakoppelaar are the code and db changes coupled together? Can I deploy the code without the db schema update or will that cause problems?

hannakoppelaar commented 3 years ago

@hannakoppelaar are the code and db changes coupled together? Can I deploy the code without the db schema update or will that cause problems?

@obrienben I don't think it will work because ARC_HARVEST_RESOURCE will no longer be updated when HARVEST_RESOURCE is updated, which will cause the database to complain about referential integrity.

leefrank9527 commented 3 years ago

Tested with Mysql8. It cost 115 seconds to delete 1M Harvest Resources rows. It's fine from my opnion. I'll test it with PostgreSQL and Oracle.

leefrank9527 commented 3 years ago

Tested with PostgreSQL13. It cost 113 seconds to delete 1M Harvest Resources rows. It's fine from my opnion.