dlr-eoc / prosEO

prosEO – A Processing System for Earth Observation Data
GNU General Public License v3.0

Production Planner: Order and job stay "SUSPENDING"/"ON_HOLD" after forced suspend of last job step #149

Closed: tangobravo62 closed this issue 2 years ago

tangobravo62 commented 3 years ago

The following sequence of actions was executed:

  1. Release a processing order --> the job step is started, but never finishes due to problems on the Kubernetes side
  2. Suspend the processing order --> order is "SUSPENDING", job is "ON_HOLD", job step is still "RUNNING"
  3. After correcting the problem in Kubernetes, forcibly suspend the hanging job step --> job step is now "INITIAL":

     [Screenshot 2021-05-27 09:33:02]

Expected behaviour: the job reverts to status "INITIAL", since no job steps are running any more, and the order consequently changes to "PLANNED", since no jobs are "ON_HOLD" any more. This is not the case:

[Screenshot 2021-05-27 09:32:37]

Workaround: There is no workaround in the CLI or GUI, since orders in status "SUSPENDING" and jobs in status "ON_HOLD" cannot be updated. The status values had to be changed manually in the database tables.

tangobravo62 commented 2 years ago

Fixed with a bottom-up propagation from job step state to job state to order state (commit f436094). Tested during the S5P DDS6 reprocessing campaign.
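
For readers unfamiliar with the idea, the bottom-up propagation could look roughly like the sketch below. It is not the code from the commit: it is written in Python for brevity, all class and function names (`JobStep`, `Job`, `Order`, `propagate_job`, `propagate_order`, `force_suspend_step`) are hypothetical, and it models only the states named in this issue, so the real Production Planner will have more states and transition rules.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


# Only the states mentioned in this issue are modelled (assumption).
class StepState(Enum):
    INITIAL = "INITIAL"
    RUNNING = "RUNNING"


class JobState(Enum):
    INITIAL = "INITIAL"
    ON_HOLD = "ON_HOLD"


class OrderState(Enum):
    PLANNED = "PLANNED"
    SUSPENDING = "SUSPENDING"


@dataclass
class JobStep:
    state: StepState


@dataclass
class Job:
    state: JobState
    steps: List[JobStep] = field(default_factory=list)


@dataclass
class Order:
    state: OrderState
    jobs: List[Job] = field(default_factory=list)


def propagate_job(job: Job) -> None:
    """An ON_HOLD job reverts to INITIAL once none of its steps is RUNNING."""
    if job.state is JobState.ON_HOLD and all(
        step.state is not StepState.RUNNING for step in job.steps
    ):
        job.state = JobState.INITIAL


def propagate_order(order: Order) -> None:
    """A SUSPENDING order becomes PLANNED once none of its jobs is ON_HOLD."""
    if order.state is OrderState.SUSPENDING and all(
        job.state is not JobState.ON_HOLD for job in order.jobs
    ):
        order.state = OrderState.PLANNED


def force_suspend_step(order: Order, job: Job, step: JobStep) -> None:
    """Reset the step, then push the change upwards: step -> job -> order."""
    step.state = StepState.INITIAL
    propagate_job(job)
    propagate_order(order)
```

Applied to the scenario above (one "RUNNING" step in an "ON_HOLD" job of a "SUSPENDING" order), calling `force_suspend_step` leaves the step and the job in "INITIAL" and the order in "PLANNED". If another step of the same job were still "RUNNING", the job and the order would keep their states until that step is suspended as well.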