NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

Pids occasionally being re-harvested #304

Status: closed by gothub 2 years ago

gothub commented 2 years ago

In some situations, pids are being re-harvested.

From the metadig-scheduler pod log:

20220113-18:58:45: [DEBUG]: quality-dataone-fair: submitting pid: doi:10.48502/hssh-5194 [edu.ucsb.nceas.mdqengine.scheduler.RequestReportJob:335]

20220113-18:59:40: [DEBUG]: quality-dataone-fair: submitting pid: doi:10.48502/hssh-5194 [edu.ucsb.nceas.mdqengine.scheduler.RequestReportJob:335]

This pid should only have been submitted once.

gothub commented 2 years ago

The last_harvest_date that metadig-engine maintains for each node harvest used to be set to the system metadata modified time of the most recently harvested pid. On the next harvest, that last pid was then re-harvested unnecessarily, since its modified time matched the stored date.

This problem was fixed by setting the last_harvest_date (in the node_harvest table) to one millisecond after the sysmeta modified time, so only newer pids will be picked up on the next harvest.
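
For illustration only, here is a minimal Python sketch of the behaviour described above. It is not the metadig-engine code. The `harvest` and `next_harvest_date` helpers, the pids, and the timestamps are all hypothetical, and the sketch assumes the harvest query treats the stored date as an inclusive lower bound.

```python
from datetime import datetime, timedelta, timezone


def harvest(records, from_date):
    """Return (pid, modified) pairs whose modified time is >= from_date.

    The inclusive comparison is an assumption made for this sketch.
    """
    return [(pid, modified) for pid, modified in records if modified >= from_date]


def next_harvest_date(harvested, bump=timedelta(milliseconds=1)):
    """Return the newest modified time in the batch, moved forward by `bump`."""
    latest = max(modified for _, modified in harvested)
    return latest + bump


# Hypothetical pids and modified times, made up for this example.
t0 = datetime(2022, 1, 13, 18, 0, 0, tzinfo=timezone.utc)
records = [
    ("pid-a", t0),
    ("pid-b", t0 + timedelta(minutes=5)),
]

# First harvest picks up everything.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
first = harvest(records, epoch)

# Old behaviour: marker equals the newest modified time, so pid-b comes back.
old_marker = next_harvest_date(first, bump=timedelta(0))
print([pid for pid, _ in harvest(records, old_marker)])

# Fixed behaviour: marker is one millisecond later, so nothing is re-harvested.
new_marker = next_harvest_date(first)
print([pid for pid, _ in harvest(records, new_marker)])
```

The one-millisecond offset only helps if modified times are stored at millisecond precision or coarser, which this sketch takes for granted.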