elodina / datastax-enterprise-mesos

DataStax Enterprise on Mesos
http://www.elodina.net

Node failover #76

Closed olegkovalenko closed 8 years ago

olegkovalenko commented 8 years ago

MOTIVATION:

A C* node can fail for various reasons, such as misconfiguration or memory problems. To mitigate such failures and handle them in an automated way, let's introduce a failover delay, a max delay and max tries.

PROPOSED CHANGE:

When a node fails, the DSE Mesos scheduler assumes that the failure is recoverable. The scheduler will try to restart the node after waiting for failover-delay (e.g. 30s, 2m). The initial delay equals the failover-delay setting. After each consecutive failure the delay is doubled until it reaches the failover-max-delay value.

If failover-max-tries is defined and the consecutive failure count exceeds it, the node will be deactivated.
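The delay and max-tries rules above can be sketched as follows. This is a minimal illustration in Python, not the scheduler's actual code: the class name `Failover`, its method names, and the use of seconds as the unit are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Failover:
    """Hypothetical model of the failover settings and failure counter."""
    delay: float                      # failover-delay, in seconds (assumed unit)
    max_delay: float                  # failover-max-delay, in seconds
    max_tries: Optional[int] = None   # failover-max-tries; None means unbounded
    failures: int = 0                 # consecutive failure count

    def current_delay(self) -> float:
        """Return how long to wait before the next restart attempt."""
        if self.failures == 0:
            return 0.0
        # The first failure waits `delay`; each further consecutive
        # failure doubles the wait until it reaches `max_delay`.
        delay = min(self.delay, self.max_delay)
        for _ in range(self.failures - 1):
            if delay >= self.max_delay:
                break
            delay = min(delay * 2, self.max_delay)
        return delay

    def register_failure(self) -> None:
        """Record one more consecutive failure."""
        self.failures += 1

    def reset(self) -> None:
        """Clear the counter, e.g. once the node runs normally again."""
        self.failures = 0

    def is_max_tries_exceeded(self) -> bool:
        """True when max_tries is defined and the failure count exceeds it."""
        return self.max_tries is not None and self.failures > self.max_tries
```

With `delay=30` and `max_delay=120`, the waits after one, two, three and four consecutive failures are 30, 60, 120 and 120 seconds. With `max_tries=3`, the fourth failure is the one that marks the node for deactivation.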

The following failover settings exist:

--failover-delay     - initial failover delay to wait after failure (option value is required)
--failover-max-delay - max failover delay (option value is required)
--failover-max-tries - max failover tries before the node is deactivated (to reset to unbounded, pass --failover-max-tries "")
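One way to interpret these option values is sketched below. The accepted duration units (`s`, `m`, `h`) are an assumption extrapolated from the `30s` and `2m` examples, and the function names `parse_duration` and `parse_max_tries` are hypothetical; the real scheduler may accept a different syntax.

```python
import re
from typing import Optional

# Assumed unit suffixes; only "s" and "m" appear in the description.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value: str) -> int:
    """Convert a value such as "30s" or "2m" to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value.strip())
    if match is None:
        raise ValueError(f"invalid duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def parse_max_tries(value: str) -> Optional[int]:
    """Convert --failover-max-tries to an int; "" resets it to unbounded (None)."""
    value = value.strip()
    if value == "":
        return None
    tries = int(value)
    if tries < 0:
        raise ValueError(f"max tries must not be negative: {tries}")
    return tries
```

Representing the unbounded case as `None` keeps "no limit" distinct from a limit of zero.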

CLI changes: node add and node update will allow configuring --failover-delay, --failover-max-delay and --failover-max-tries

HTTP server changes: /api/node/add and /api/node/update will allow configuring failoverDelay, failoverMaxDelay and failoverMaxTries
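As a rough illustration of how a client might pass these parameters, the sketch below builds a request URL for /api/node/update. Only the endpoint path and the three failover parameter names come from the description above; the `node` query parameter, the query-string encoding, the base URL and the helper name `node_update_url` are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode


def node_update_url(
    base_url: str,
    node_id: str,
    failover_delay: Optional[str] = None,
    failover_max_delay: Optional[str] = None,
    failover_max_tries: Optional[str] = None,
) -> str:
    """Build a URL for /api/node/update with the given failover settings."""
    # "node" as the name of the id parameter is an assumption.
    params = {"node": node_id}
    # Only settings that were explicitly provided are sent; an empty
    # string is still sent, since "" resets max tries to unbounded.
    if failover_delay is not None:
        params["failoverDelay"] = failover_delay
    if failover_max_delay is not None:
        params["failoverMaxDelay"] = failover_max_delay
    if failover_max_tries is not None:
        params["failoverMaxTries"] = failover_max_tries
    return f"{base_url.rstrip('/')}/api/node/update?{urlencode(params)}"
```

Distinguishing `None` (leave unchanged) from `""` (reset) mirrors the CLI convention for --failover-max-tries.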

Scheduler changes:

C* storage changes:

Add the ability to store failover state by introducing the following columns:

  node_failover_delay text,
  node_failover_max_delay text,
  node_failover_max_tries int,
  node_failover_failures int,
  node_failover_failure_time timestamp
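The mapping between in-memory failover state and these columns might look like the sketch below. The column names are taken from the list above; the class `FailoverState`, its field names and the plain-dict row representation are hypothetical and do not reflect the project's storage code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class FailoverState:
    """Hypothetical in-memory form of the failover columns."""
    delay: str                        # stored as text, e.g. "30s"
    max_delay: str                    # stored as text, e.g. "2m"
    max_tries: Optional[int]          # None means unbounded
    failures: int                     # consecutive failure count
    failure_time: Optional[datetime]  # time of the last failure, if any

    def to_row(self) -> Dict[str, Any]:
        """Map the fields to the proposed column names."""
        return {
            "node_failover_delay": self.delay,
            "node_failover_max_delay": self.max_delay,
            "node_failover_max_tries": self.max_tries,
            "node_failover_failures": self.failures,
            "node_failover_failure_time": self.failure_time,
        }

    @classmethod
    def from_row(cls, row: Dict[str, Any]) -> "FailoverState":
        """Rebuild the state from a row keyed by column name."""
        return cls(
            delay=row["node_failover_delay"],
            max_delay=row["node_failover_max_delay"],
            max_tries=row.get("node_failover_max_tries"),
            failures=row.get("node_failover_failures") or 0,
            failure_time=row.get("node_failover_failure_time"),
        )
```

Persisting the failure count and failure time alongside the settings would let a restarted scheduler resume the same delay sequence instead of starting over.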

RESULT: failover with an increasing delay and the ability to stop a node after max tries (fixes #28)

dmitrypekar commented 8 years ago

Everything looks good, except one point: imho, it would be much better to split the following test methods:

I will also do some testing now.

dmitrypekar commented 8 years ago

Thanks for the update! Merged.

olegkovalenko commented 8 years ago

Thanks!