Once a harvest has been through a couple of runs, the data accumulated in its harvest_items rows grows to an unsustainable size.
For example:
- Multiple EMU invocations are kept (we have seen up to 15 recorded on many thousands of items). Every invocation is kept because successive invocations can change the file, but many of the repeats are identical (EMU reports no changes), so keeping that history is pointless.
- The error field appears to grow, especially in timeout-like scenarios.
- The file_info field is also redundant once an item has been harvested.
We should address this:
- clear error once a harvest item is successful
- do not store redundant EMU output
- clear file_info once the harvest item is complete
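The three changes could be applied whenever an item's state is updated. Below is a minimal Python sketch, not the harvester's actual code: it models a harvest item as a plain dict, and the `"completed"` status value, the `emu_runs` key and each run's `changed` flag are hypothetical names standing in for however the real harvest_items row records its status and EMU output.

```python
from copy import deepcopy
from typing import Any

# Hypothetical status value for a successfully harvested item.
COMPLETED = "completed"


def compact_emu_runs(runs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first EMU invocation plus any later one that changed the file."""
    return [run for i, run in enumerate(runs) if i == 0 or run.get("changed", False)]


def compact_item(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a harvest item with redundant history stripped."""
    out = deepcopy(item)
    if "emu_runs" in out:
        out["emu_runs"] = compact_emu_runs(out["emu_runs"])
    if out.get("status") == COMPLETED:
        # A successful, complete item no longer needs its error log or file_info.
        out.pop("error", None)
        out.pop("file_info", None)
    return out
```

For example, a completed item with two EMU runs where only the first changed the file comes back with a single run and no `error` or `file_info` keys, while a failed item keeps its error for diagnosis.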
We should also create a job that cleans up the existing history, since the production databases are growing unsustainably.
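For history already in the database, a cleanup job could walk harvest_items in primary-key order and compact each row in small batches, so no single transaction holds locks for long. This is a hedged sketch: it uses an in-memory SQLite database only so it runs standalone (a real job would target the production database), and the `info` JSON column, the `emu_runs` key, the `changed` flag, the `"completed"` status and the batch size are all hypothetical.

```python
import json
import sqlite3
from typing import Any

BATCH_SIZE = 500  # hypothetical; keep each transaction short


def compact_info(status: str, info: dict[str, Any]) -> dict[str, Any]:
    """Strip redundant history from one row's (hypothetical) info payload."""
    if "emu_runs" in info:
        runs = info["emu_runs"]
        # Keep the first EMU run and any later run that changed the file.
        info["emu_runs"] = [r for i, r in enumerate(runs) if i == 0 or r.get("changed")]
    if status == "completed":
        info.pop("error", None)
        info.pop("file_info", None)
    return info


def cleanup_history(conn: sqlite3.Connection, batch_size: int = BATCH_SIZE) -> int:
    """Compact every harvest_items row in id order; return how many rows changed."""
    last_id, updated = 0, 0
    while True:
        rows = conn.execute(
            "SELECT id, status, info FROM harvest_items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return updated
        for row_id, status, raw in rows:
            original = json.loads(raw)
            compacted = compact_info(status, json.loads(raw))
            if compacted != original:  # skip rows that are already compact
                conn.execute(
                    "UPDATE harvest_items SET info = ? WHERE id = ?",
                    (json.dumps(compacted), row_id),
                )
                updated += 1
        conn.commit()  # one transaction per batch
        last_id = rows[-1][0]
```

Keyset pagination (`WHERE id > ?`) avoids the growing cost of `OFFSET` on a large table, and skipping unchanged rows makes the job cheap to re-run on a schedule.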