NCEAS / metadig-engine

MetaDig Engine: multi-dialect metadata assessment engine

document sysmeta cannot be found, run status fails to update #361

Closed jeanetteclark closed 1 year ago

jeanetteclark commented 1 year ago

Recently a document on DataONE was submitted and given a status of "processing". It was processed successfully and the index was updated, but the entry's status was never set to "success" in the database. The error I see is:

20230708-01:49:21: [ERROR]: Missing sysmeta model for run with id: https://pasta.lternet.edu/package/metadata/eml/knb-lter-jrn/210548085/34 [edu.ucsb.nceas.mdqengine.model.Run:115]

It then gets picked up by the monitor job and fails with an error that the node is not supported in the configuration file.

It seems like there are two problems here.

  1. figure out why there isn't any sysmeta for this object and where that original error is coming from
  2. fall back on the CN for getMetadata in MonitorJob if the origin member node of an object isn't in the config file

jeanetteclark commented 1 year ago

Okay, finally got this sorted. Some jobs were getting stuck in a "processing" status with nowhere else to go.

I'm not sure exactly how it happens, but these pids make it into the scheduler for a job on which their node is not configured (think an ESS-DIVE production pid on test-arctic). The worker seems to fall on its face, leaving them stuck in limbo with a status of "processing" in the runs table. The monitor job then picks them up but (presumably) runs into the same issue as the worker, so my solution is twofold:

  1. If an object can't be found on the MN, try to find it on the CN. This should work for replica objects (like LTER datasets on the ADC)
  2. If the object still can't be found on either the MN or the CN, stop trying to find it and update the runs table with a failure status and the error message.
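
For anyone following along, the two steps above can be sketched roughly as below. This is only an illustration of the control flow, not the actual MonitorJob code: `NodeClient`, `RunsTable`, `fetchWithFallback`, and the `"failure"` status string are all hypothetical stand-ins I made up for the sketch, and the real DataONE client and runs-table access look different.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MetadataFallbackSketch {

    /** Hypothetical stand-in for a DataONE node client (not the real API). */
    interface NodeClient {
        Optional<String> getMetadata(String pid);
    }

    /** Hypothetical in-memory stand-in for the runs table. */
    static class RunsTable {
        final Map<String, String> status = new HashMap<>();
        final Map<String, String> error = new HashMap<>();

        void update(String pid, String newStatus, String message) {
            status.put(pid, newStatus);
            error.put(pid, message);
        }
    }

    /**
     * Try the member node first, then the coordinating node. If neither has
     * the object, record a failure so the run is not retried forever.
     * A null mn models "origin node is not in the config file".
     */
    static Optional<String> fetchWithFallback(String pid, NodeClient mn,
                                              NodeClient cn, RunsTable runs) {
        Optional<String> doc = (mn == null) ? Optional.<String>empty() : mn.getMetadata(pid);
        if (!doc.isPresent()) {
            // Step 1: fall back on the CN (covers replica objects).
            doc = cn.getMetadata(pid);
        }
        if (!doc.isPresent()) {
            // Step 2: give up and leave a failure status plus the error message.
            runs.update(pid, "failure", "Object not found on MN or CN: " + pid);
        }
        return doc;
    }

    public static void main(String[] args) {
        RunsTable runs = new RunsTable();
        runs.update("pid-1", "processing", null);

        NodeClient emptyNode = pid -> Optional.empty();
        Optional<String> doc = fetchWithFallback("pid-1", null, emptyNode, runs);

        System.out.println("found: " + doc.isPresent());
        System.out.println("status: " + runs.status.get("pid-1"));
    }
}
```

The point of the second step is that a run which can never be resolved ends up in a terminal status, so the monitor job stops picking it up on every pass.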