D10S0VSkY-OSS / Stack-Lifecycle-Deployment

Open-source self-service infrastructure solution that defines and manages the complete lifecycle of resources used and provisioned in a cloud. It is a Terraform UI with a REST API for Terraform and OpenTofu automation.
MIT License
227 stars, 34 forks

When the worker-squad crashes for some reason (Evicted, OOM killer, ...), the deploy state gets stuck and returns the error "Deploy state running, cannot upgrade" #255

Closed. D10S0VSkY-OSS closed this issue 5 months ago

D10S0VSkY-OSS commented 5 months ago

SLD has two mechanisms to mitigate race conditions: a check performed before a task is sent, and a lock inherent to the workers, which uses Redis with a 5-minute TTL and is deleted once that time expires. The bug was in the first mechanism: it only sends a new task when the current task is in one of the states ["SUCCESS", "FAILURE", "REVOKED", "PENDING"]. If a worker crashes mid-task, the state stays at running, so every later request is rejected. A button has been added to force the unlock; it leaves the state in PENDING (UNKNOWN), which allows a task to be run again.
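The two mechanisms described above can be illustrated with a minimal Python sketch. It is not SLD's actual code: the function names `can_send_task` and `force_unlock`, the `TTLLock` class, and the `deploy` dictionary are hypothetical, and the lock is an in-memory stand-in for a Redis key with an expiry (in Redis this would typically be something like `SET key value NX EX 300`). Only the state list and the 5-minute TTL come from the comment.

```python
import time

# States from which a new task may be sent (list taken from the comment above).
SENDABLE_STATES = {"SUCCESS", "FAILURE", "REVOKED", "PENDING"}
LOCK_TTL_SECONDS = 5 * 60  # worker-side lock TTL mentioned in the comment


def can_send_task(last_state: str) -> bool:
    """Pre-send gate: refuse unless the previous task is in a sendable state."""
    return last_state in SENDABLE_STATES


def force_unlock(deploy: dict) -> dict:
    """Hypothetical stand-in for the force-unlock button: reset to PENDING."""
    return {**deploy, "state": "PENDING"}


class TTLLock:
    """In-memory stand-in (assumption) for a Redis key with an expiry."""

    def __init__(self, clock=time.monotonic):
        self._expires = {}
        self._clock = clock

    def acquire(self, key: str, ttl: float = LOCK_TTL_SECONDS) -> bool:
        now = self._clock()
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False  # lock is still held and has not expired
        self._expires[key] = now + ttl
        return True


# A worker died mid-task, so the recorded state never left RUNNING.
deploy = {"name": "demo-stack", "state": "RUNNING"}
blocked = can_send_task(deploy["state"])    # gate rejects the new task
deploy = force_unlock(deploy)
unblocked = can_send_task(deploy["state"])  # gate accepts after the reset
```

The sketch shows why the two mechanisms behave differently after a crash: the TTL lock frees itself once it expires, whereas the state gate stays closed until something rewrites the state, which is what the force-unlock button does.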
