swirtSJW opened 2 months ago
Is there any issue with just incrementing the timestamp by 1 second (hoping that would make it unique)? Or is that timestamp used as a unique ID in other places in the system?
It turns out that bumping the timestamp would likely disconnect the harvest run from everything else that references it. This also means the problem cannot be addressed with a try/catch.
The new hope is that some variation of this might work: the HarvestRunRepository::loadEntity() function treats the id AND the harvest_plan_id as the combined key to look up the harvest run entity.
So if the id were not the key for the table, there would be no issue with the id needing to be unique. The only risk would be two harvest runs from the same plan starting within the same second.
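For illustration, here is what such a combined-key lookup could look like with Drupal's entity query API. This is a sketch, not DKAN's actual HarvestRunRepository::loadEntity() code; the entity type ID ('harvest_run') and the field names are assumptions.

```php
<?php

// Sketch of a combined-key lookup: treat id + harvest_plan_id together
// as the key, so id alone never has to be unique across plans.
// Entity type and field names here are assumed, not taken from DKAN.
$storage = \Drupal::entityTypeManager()->getStorage('harvest_run');
$ids = $storage->getQuery()
  ->accessCheck(FALSE)
  ->condition('id', $run_id)
  ->condition('harvest_plan_id', $plan_id)
  ->range(0, 1)
  ->execute();
$entity = $ids ? $storage->load(reset($ids)) : NULL;
```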
I think the easiest solution to implement would be to add another column that holds the actual ID (maybe a uuid) and use that as the unique key. Leave everything else in place, and provide an update path to the new entity schema for both old-style harvest_ID_runs tables and the newer entity tables.
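A minimal sketch of that update path, assuming a plain schema column rather than a full entity field definition; the hook name, table name, and column spec are all hypothetical:

```php
<?php

/**
 * Hypothetical hook_update_N() sketch: add a uuid column to an old-style
 * run table and backfill it, so the timestamp id no longer has to be the
 * unique key. Table and column names are assumptions.
 */
function mymodule_update_10001() {
  $database = \Drupal::database();
  $schema = $database->schema();
  $table = 'harvest_example_plan_runs'; // Assumed old-style table name.

  if (!$schema->fieldExists($table, 'uuid')) {
    $schema->addField($table, 'uuid', [
      'type' => 'varchar',
      'length' => 128,
      'not null' => FALSE,
    ]);
    // Backfill a uuid for every existing run row.
    $uuid = \Drupal::service('uuid');
    $ids = $database->select($table, 't')->fields('t', ['id'])->execute()->fetchCol();
    foreach ($ids as $id) {
      $database->update($table)
        ->fields(['uuid' => $uuid->generate()])
        ->condition('id', $id)
        ->execute();
    }
    $schema->addUniqueKey($table, 'uuid', ['uuid']);
  }
}
```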
Current Behavior
When updating to DKAN 2.19 from a lower version, if two different harvest_ID_runs tables happen to contain runs with the same timestamp, an SQL error is thrown because the timestamp is treated as the unique identifier. This is unlikely, since the collision window is only one second, but it is possible to encounter in the wild.
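To check whether a site is exposed before updating, something like the following (run via drush php:script) would list colliding timestamps. The harvest_%_runs table pattern and the id column are assumptions about the legacy schema:

```php
<?php

// Scan all old-style run tables and report any timestamp id that
// appears in more than one of them. Table pattern and column name
// are assumed, not confirmed against DKAN's schema.
$database = \Drupal::database();
$seen = [];
foreach ($database->schema()->findTables('harvest_%_runs') as $table) {
  foreach ($database->select($table, 't')->fields('t', ['id'])->execute()->fetchCol() as $id) {
    $seen[$id][] = $table;
  }
}
foreach ($seen as $id => $tables) {
  if (count($tables) > 1) {
    print "$id collides across: " . implode(', ', $tables) . "\n";
  }
}
```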
Expected Behavior
The migration of data from one table to another should happen without error.
Steps To Reproduce
drush updb
or drush dkan:harvest:update
Relevant log output (optional)
No response
Anything else?
This may be too unlikely a scenario to justify adding a try/catch block to HarvestUtility::convertRunTable(), but I will at least provide a drush sqlc command to undo any duplicated IDs.
Discussion in CA Slack
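Until that command exists, here is one hypothetical shape of the cleanup, written against the database API instead of raw SQL (runnable via drush php:script). It re-keys later duplicates by bumping them to the next free second which, per the discussion above, may disconnect those runs from records that reference the old id, so treat it as illustrative only; table pattern and column name are assumptions:

```php
<?php

// Hypothetical dedup sketch: for every timestamp id seen in more than
// one old-style run table, leave the first occurrence alone and move
// each later one to the next globally unused id. Illustrative only;
// re-keying a run can orphan records that reference the old id.
$database = \Drupal::database();
$tables = $database->schema()->findTables('harvest_%_runs');

// First pass: collect every id and the tables it appears in.
$locations = [];
foreach ($tables as $table) {
  foreach ($database->select($table, 't')->fields('t', ['id'])->execute()->fetchCol() as $id) {
    $locations[(int) $id][] = $table;
  }
}

// Second pass: re-key each later duplicate to an id not used anywhere.
$used = array_fill_keys(array_keys($locations), TRUE);
foreach ($locations as $id => $found_in) {
  foreach (array_slice($found_in, 1) as $table) {
    $new_id = $id;
    while (isset($used[$new_id])) {
      $new_id++;
    }
    $used[$new_id] = TRUE;
    $database->update($table)
      ->fields(['id' => $new_id])
      ->condition('id', $id)
      ->execute();
  }
}
```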