Introduced validations that prevent a scale in/out operation from running if a previous one failed
Introduced a repair mode that lets the user allow the operation to proceed again with config-tool repair -force allow_scaling
Refactored the lock/unlock logic in attach and detach to correctly catch exceptions and add markers that deny any further scale in/out in case of errors
Added flags to the diagnostic output that show whether the scale in/out process is allowed
Refactored the inheritance logic for attach and detach between OSS and EE to consolidate the lock/unlock/marker logic in one place, so that EE only overrides the differences in behaviour
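As a rough illustration of the consolidation idea (not the project's actual classes), the sketch below keeps the validation, lock, marker and unlock handling in one base class and lets a subclass override only the step that differs. The class names, the marker string, the dict standing in for the cluster config, and the assumption that the differing step is what happens after locking are all hypothetical.

```python
class ScaleDenied(Exception):
    """Raised when a deny marker blocks the operation (hypothetical name)."""


class BaseDetach:
    marker = "deny-scale-in"  # hypothetical marker name

    def __init__(self, cluster):
        self.cluster = cluster  # plain dict standing in for the cluster config

    def run(self):
        # Shared logic lives here: validate, lock, and mark + unlock on error.
        if self.marker in self.cluster["markers"]:
            raise ScaleDenied(self.marker)
        self.cluster["locked"] = True
        try:
            self.apply()
        except Exception:
            self.cluster["markers"].add(self.marker)
            self.cluster["locked"] = False
            raise

    def apply(self):
        # Assumed base behaviour: remove the stripe and unlock immediately.
        self.cluster["stripes"] -= 1
        self.cluster["locked"] = False


class RebalancingDetach(BaseDetach):
    def apply(self):
        # Assumed override: only request rebalancing; the config stays locked
        # until the server side reports the outcome.
        self.cluster["rebalancing"] = True
```

With this shape, a subclass never has to repeat the marker or lock handling; it only states what differs.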
Scale in flow
  detach command (CLI)
    validation: fails if a deny scale in marker is found
    lock
    trigger rebalancing
      on failure:
        try to place a deny scale in marker to prevent replaying the detach
        try to unlock
  on rebalancing success (server-side)
    detach the stripe
    unlock on Nomad tx success
  on rebalancing failure (server-side)
    place a deny scale in marker
    unlock
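The scale in flow above can be sketched as follows. This is a minimal model under stated assumptions, not the real implementation: the config is a plain dict, the marker string and function names are made up for illustration, and the Nomad transaction for the detach is assumed to always succeed.

```python
DENY_SCALE_IN = "deny-scale-in"  # hypothetical marker name


def detach_cli(config, trigger_rebalancing):
    """CLI side of scale in: validate, lock, then trigger rebalancing."""
    if DENY_SCALE_IN in config["markers"]:
        raise RuntimeError("scale in denied: a previous attempt failed")
    config["locked"] = True
    try:
        trigger_rebalancing()
    except Exception:
        # Best effort: place the marker first, then unlock even if that failed.
        try:
            config["markers"].add(DENY_SCALE_IN)
        finally:
            config["locked"] = False
        raise


def on_rebalancing_done(config, success):
    """Server side: detach on success, deny further scale in on failure."""
    if success:
        # Assumes the Nomad transaction for the detach succeeds.
        config["stripes"] -= 1
    else:
        config["markers"].add(DENY_SCALE_IN)
    config["locked"] = False
```

Once a marker is present, a second call to `detach_cli` stops at validation, before any lock is taken.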
Scale out flow
  attach command (CLI)
    validation: fails if a deny scale out marker is found
    lock
    attach
      on failure:
        try to place a deny scale out marker to prevent replaying the attach
        try to unlock
        in any case, either a marker is placed or the config is kept locked
    trigger rebalancing on Nomad success
      on failure:
        config is kept locked
  on rebalancing success (server-side)
    unlock
  on rebalancing failure (server-side)
    TRY detach
    FINALLY place a deny scale out marker
    FINALLY unlock
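The TRY/FINALLY ordering on a server-side rebalancing failure can be sketched with nested try/finally blocks. This is only an illustration: the function name, the marker string and the dict-based config are assumptions, and `detach` is any callable standing in for the real detach.

```python
DENY_SCALE_OUT = "deny-scale-out"  # hypothetical marker name


def on_scale_out_rebalancing_failure(config, detach):
    """Roll back the attach, then always place the marker and always unlock."""
    try:
        detach()  # may raise; the steps below still run
    finally:
        try:
            config["markers"].add(DENY_SCALE_OUT)
        finally:
            config["locked"] = False
```

If the detach raises, the exception still propagates to the caller, but only after the marker is placed and the lock released.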
Why add a marker if the attach or detach CLI fails?
Because attach and detach each trigger 2 Nomad transactions, one to lock and one to unlock (discovery/prepare/commit), and replaying them would cause 2 problems:
unnecessary entries would fill the append-log
unnecessary Nomad transactions would increase the chance of colliding with a concurrent user transaction that aims to make a valid config change or repair
Repairing
We can re-allow a scale op to be retried by running:
config-tool repair -force allow_scaling
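Conceptually, the repair lifts the validation that blocks attach and detach. The sketch below assumes, without confirmation from the source, that it does so by dropping the deny markers; the marker strings and function names are hypothetical.

```python
DENY_MARKERS = {"deny-scale-in", "deny-scale-out"}  # hypothetical marker names


def scaling_allowed(markers):
    """Validation used before attach/detach: no deny marker may be present."""
    return not (markers & DENY_MARKERS)


def repair_allow_scaling(markers):
    """Assumed effect of the repair: return the markers without the deny ones."""
    return markers - DENY_MARKERS
```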