Currently, if the process{resourcetype}startid plugin config value for tool_metadata is not found, we reset the processing start id to 0 (zero). For the file resource, this means we restart metadata processing from the beginning of the {files} table.
This can add a lot of processing time, because files that have already been processed are reprocessed before we finally reach the end of the {files} table, where the latest files are.
I recommend we add a check of the {tool_metadata_extractions} table to the get_extractions_to_process method in classes/task/process_extractions_base_task.php, looking for the highest resourceid for the type we are processing. If one is found, we use that as the default start id, which prevents reprocessing a large number of resources.
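A minimal sketch of the proposed fallback, to make the order of precedence concrete. The function name tool_metadata_default_start_id and its two parameters are hypothetical and not part of the plugin; the sketch assumes the config value arrives as false when unset (as get_config does) and that the highest resourceid arrives as null or false when no extraction exists for the type. Whether the stored id or the id after it is the right starting point depends on how get_extractions_to_process compares against the start id, which this issue does not state.

```php
<?php
/**
 * Hypothetical helper (not part of tool_metadata): choose the start id
 * for processing a resource type.
 *
 * @param mixed $configuredstartid value of process{resourcetype}startid, false/null if not set
 * @param mixed $highestextractedid highest resourceid found in {tool_metadata_extractions}
 *                                  for the type, false/null if there is none
 * @return int the id to start processing from
 */
function tool_metadata_default_start_id($configuredstartid, $highestextractedid): int {
    // 1. A stored config value always wins.
    if ($configuredstartid !== false && $configuredstartid !== null) {
        return (int) $configuredstartid;
    }
    // 2. Otherwise fall back to the highest resource id already extracted for this type.
    if ($highestextractedid !== false && $highestextractedid !== null) {
        return (int) $highestextractedid;
    }
    // 3. Nothing known: start from the beginning, as happens today.
    return 0;
}

// Possible use inside get_extractions_to_process (assumes Moodle's global $DB
// and that $type holds the resource type being processed):
//
// $startid = tool_metadata_default_start_id(
//     get_config('tool_metadata', 'process' . $type . 'startid'),
//     $DB->get_field_sql(
//         'SELECT MAX(resourceid) FROM {tool_metadata_extractions} WHERE type = :type',
//         ['type' => $type]
//     )
// );
```

Keeping the decision in a small pure function means the precedence can be unit tested without a database; the query itself stays in the task class.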