Terracotta-OSS / terracotta-platform

http://terracotta.org
Apache License 2.0

[5.9] Prevent further scale in/out process when a previous one failed, independently of the lock/unlock state of the config #1164

Closed mathieucarbou closed 1 year ago

mathieucarbou commented 1 year ago

Overview

Scale in flow

detach command (CLI)

  1. validations
    • fails if we find: a deny scale in marker
  2. lock
  3. trigger rebalancing
    • on failure:
      • try to place a deny scale in marker to prevent replaying the detach
      • try to unlock

on rebalancing success (server-side)

  1. detach the stripe
  2. unlock on nomad tx success

on rebalancing failure (server-side)

  1. place a deny scale in marker
  2. unlock
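
The scale in flow above can be sketched as a small in-memory state machine. This is only an illustration under stated assumptions: `ScaleInSketch`, `Config` and all method names are hypothetical and not taken from terracotta-platform, the lock and the marker are modeled as plain booleans, and the "try to" steps are assumed to always succeed.

```java
// Hypothetical in-memory model of the scale in flow; not the terracotta-platform API.
final class ScaleInSketch {

    // Stand-in for the cluster configuration state (assumed shape).
    static final class Config {
        boolean locked;
        boolean denyScaleIn;           // models the "deny scale in" marker
        boolean stripeAttached = true;
    }

    // CLI side of detach: validate, lock, then trigger rebalancing.
    static void detach(Config config, boolean triggerSucceeds) {
        if (config.denyScaleIn) {
            throw new IllegalStateException("deny scale in marker found");
        }
        config.locked = true;
        if (!triggerSucceeds) {
            config.denyScaleIn = true; // prevent replaying the detach
            config.locked = false;     // unlock (assumed to succeed here)
        }
    }

    // Server side, rebalancing succeeded: detach the stripe, then unlock.
    static void onRebalancingSuccess(Config config) {
        config.stripeAttached = false;
        config.locked = false;
    }

    // Server side, rebalancing failed: place the marker, then unlock.
    static void onRebalancingFailure(Config config) {
        config.denyScaleIn = true;
        config.locked = false;
    }
}
```

In this model, once `onRebalancingFailure` has run, a second call to `detach` is rejected at validation even though the config is unlocked, which is the "independently of the lock/unlock state" behavior named in the title.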

Scale out flow

attach command (CLI)

  1. validations
    • fails if we find: a deny scale out marker
  2. lock
  3. attach
    • on failure:
      • try to place a deny scale out marker to prevent replaying the attach
      • try to unlock
      • in any case, either a marker is placed or the config is kept locked
  4. trigger rebalancing on nomad success
    • on failure:
      • config is kept locked

on rebalancing success (server-side)

  1. unlock

on rebalancing failure (server-side)

  1. TRY detach
  2. FINALLY place a deny scale out marker
  3. FINALLY unlock
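
The TRY / FINALLY ordering of the scale out failure path maps naturally onto nested `try`/`finally` blocks. The sketch below is a minimal illustration, not the actual implementation: `ScaleOutSketch`, `Config` and the method names are hypothetical, the detach step is passed in as a `Runnable` so that a failing detach can be simulated, and the CLI-side "try to" steps are assumed to succeed.

```java
// Hypothetical in-memory model of the scale out flow; not the terracotta-platform API.
final class ScaleOutSketch {

    // Stand-in for the cluster configuration state (assumed shape).
    static final class Config {
        boolean locked;
        boolean denyScaleOut;   // models the "deny scale out" marker
        boolean stripeAttached;
    }

    // CLI side of attach: validate, lock, attach, then trigger rebalancing.
    static void attach(Config config, boolean attachSucceeds, boolean triggerSucceeds) {
        if (config.denyScaleOut) {
            throw new IllegalStateException("deny scale out marker found");
        }
        config.locked = true;
        if (!attachSucceeds) {
            config.denyScaleOut = true; // prevent replaying the attach
            config.locked = false;      // unlock (assumed to succeed here)
            return;
        }
        config.stripeAttached = true;
        if (!triggerSucceeds) {
            return;                     // config is kept locked
        }
        // Rebalancing now runs server-side; the config stays locked until it ends.
    }

    // Server side, rebalancing succeeded: unlock.
    static void onRebalancingSuccess(Config config) {
        config.locked = false;
    }

    // Server side, rebalancing failed: TRY detach, FINALLY marker, FINALLY unlock.
    static void onRebalancingFailure(Config config, Runnable detachStripe) {
        try {
            try {
                detachStripe.run();
                config.stripeAttached = false;
            } finally {
                config.denyScaleOut = true; // placed even if the detach throws
            }
        } finally {
            config.locked = false;          // unlock happens last, in every case
        }
    }
}
```

With this shape, an exception thrown by the detach step still propagates to the caller, but only after the marker has been placed and the config unlocked.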

Why add a marker if the attach or detach CLI fails?

Because attach and detach each trigger 2 Nomad tx to lock and unlock (discovery/prepare/commit), and replaying them would cause 2 problems:

Repairing

We can re-allow a scale op to be retried by running:

config-tool repair -force allow_scaling

mathieucarbou commented 1 year ago

@mobasherul @chrisdennis @jhouserizer : FYI, I've updated the description above to show the exact flow and error handling and how the markers work.

mathieucarbou commented 1 year ago

@mobasherul @chrisdennis : ready for review.