CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

ZK: job under 2 batch states after requeue #2075

Open terrywbrady opened 1 month ago

terrywbrady commented 1 month ago
Node State as of 2024-10-17 14:26:14 -0700:
/batches/bid0000000162: 
/batches/bid0000000162/states: 
/batches/bid0000000162/states/batch-completed: 
/batches/bid0000000162/states/batch-completed/jid0000004174: 
/batches/bid0000000162/states/batch-completed/jid0000004175: 
/batches/bid0000000162/states/batch-completed/jid0000004176: 
/batches/bid0000000162/states/batch-completed/jid0000004177: 
/batches/bid0000000162/states/batch-completed/jid0000004178: 
/batches/bid0000000162/states/batch-completed/jid0000004179: 
/batches/bid0000000162/states/batch-completed/jid0000004180: 
/batches/bid0000000162/states/batch-completed/jid0000004181: 
/batches/bid0000000162/states/batch-completed/jid0000004182: 
/batches/bid0000000162/states/batch-completed/jid0000004183: 
/batches/bid0000000162/states/batch-completed/jid0000004184: 
/batches/bid0000000162/states/batch-completed/jid0000004185: 
/batches/bid0000000162/states/batch-completed/jid0000004186: 
/batches/bid0000000162/states/batch-failed: 
/batches/bid0000000162/states/batch-failed/jid0000004176: 
/batches/bid0000000162/states/batch-processing: 
/batches/bid0000000162/status:
terrywbrady commented 2 weeks ago

The issue recurred on 10/30. In the listing below, jid0000046151 appears under both batch-completed and batch-failed:

/batches/bid0000002466/states/batch-completed/jid0000046151: 
/batches/bid0000002466/states/batch-completed/jid0000046152: 
/batches/bid0000002466/states/batch-failed: 
/batches/bid0000002466/states/batch-failed/jid0000046151: 
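
A listing like this can be checked mechanically for jobs that sit under more than one batch state. The following is only a rough sketch in Python: it assumes the node dump is available as plain text in the path format shown in this issue, and the function name `jobs_in_multiple_states` and the regular expression are made up for illustration, not part of any Merritt tool.

```python
import re
from collections import defaultdict

# Matches job nodes such as
# "/batches/bid0000002466/states/batch-failed/jid0000046151:"
# State-only nodes (no jid segment) do not match and are skipped.
JOB_PATH = re.compile(r"^/batches/(bid\d+)/states/([^/:\s]+)/(jid\d+):?\s*$")


def jobs_in_multiple_states(listing):
    """Return {(batch_id, job_id): sorted states} for jobs listed under 2+ states."""
    seen = defaultdict(set)
    for line in listing.splitlines():
        match = JOB_PATH.match(line.strip())
        if match:
            bid, state, jid = match.groups()
            seen[(bid, jid)].add(state)
    return {key: sorted(states) for key, states in seen.items() if len(states) > 1}


listing = """\
/batches/bid0000002466/states/batch-completed/jid0000046151:
/batches/bid0000002466/states/batch-completed/jid0000046152:
/batches/bid0000002466/states/batch-failed:
/batches/bid0000002466/states/batch-failed/jid0000046151:
"""

print(jobs_in_multiple_states(listing))
# {('bid0000002466', 'jid0000046151'): ['batch-completed', 'batch-failed']}
```

Run against the 10/30 listing, the check reports only jid0000046151, which matches what is visible by eye.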
terrywbrady commented 6 days ago

I took a look at the ZK library, and I think it does the right thing when requeuing, so I do not think it caused the issue above. I am starting to wonder whether the admin tool needs to hold a lock while requeuing. I am deploying that change now. I am also setting a retry count, which was supposed to be in place already. https://github.com/CDLUC3/mrt-admin-lambda/commit/6b071a0aaac01c0b4b87258f3610b92e687e6edf
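
To illustrate the idea of locking while requeuing, here is a minimal in-memory sketch in Python. It is not the code in the linked commit and does not use the Merritt ZK library. The `Batch` class, its method names, and the `max_retries` limit are all hypothetical, a `threading.Lock` stands in for a ZooKeeper lock, and treating the retry count as a per-job requeue counter is an assumption. The point is only that a state change made under a lock removes the job from every other state before adding it to the new one, so a job cannot end up under two states.

```python
import threading

STATES = ("batch-processing", "batch-failed", "batch-completed")


class Batch:
    """In-memory stand-in for a /batches/<bid>/states subtree (hypothetical)."""

    def __init__(self):
        self.states = {state: set() for state in STATES}
        self.retry_count = {}          # job id -> number of requeues so far (assumed meaning)
        self._lock = threading.Lock()  # stands in for a ZooKeeper lock

    def _place(self, jid, new_state):
        # Caller must hold the lock: drop jid from every state, then add it once.
        for members in self.states.values():
            members.discard(jid)
        self.states[new_state].add(jid)

    def move(self, jid, new_state):
        """Record a job under exactly one state."""
        with self._lock:
            self._place(jid, new_state)

    def requeue(self, jid, max_retries=3):
        """Move a failed job back to batch-processing; return False if refused."""
        with self._lock:
            if jid not in self.states["batch-failed"]:
                return False
            if self.retry_count.get(jid, 0) >= max_retries:
                return False
            self.retry_count[jid] = self.retry_count.get(jid, 0) + 1
            self._place(jid, "batch-processing")
            return True

    def states_of(self, jid):
        return sorted(s for s, members in self.states.items() if jid in members)


batch = Batch()
batch.move("jid0000046151", "batch-failed")
batch.requeue("jid0000046151")                  # failed -> processing, under the lock
batch.move("jid0000046151", "batch-completed")  # job finishes after the requeue
print(batch.states_of("jid0000046151"))         # ['batch-completed']
```

In a real deployment the lock would have to be shared between the admin tool and the job consumers, so a process-local lock like this one would not be enough. The sketch only shows the ordering of the steps.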