Rolling restart

olegkovalenko commented 8 years ago

MOTIVATION: currently a rolling restart has to be done manually:

1. Stop one node.
2. Update its configuration.
3. Start the node again.
4. Repeat for the next node.

It would be great to embed these steps into the scheduler.

NOTE: rolling restart works only when data-file-dirs and commit-log-dir are outside of the sandbox. If a node keeps data-file-dirs and commit-log-dir inside the sandbox, then after it is stopped and started again it finds no data, concludes that it is some other node rather than the node that was previously at this address, and halts. So if you want rolling restarts, specify data-file-dirs and commit-log-dir outside of the sandbox.
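This precondition could be checked before attempting a restart. The sketch below is hypothetical: the function names and the sandbox path are illustrative, it assumes relative directories are resolved against the sandbox (a Mesos task's working directory), and it compares paths lexically without resolving symlinks.

```python
import posixpath
from typing import Iterable

def resolve(path: str, sandbox: str) -> str:
    # Hypothetical helper: relative paths are taken relative to the sandbox,
    # because that is the task's working directory; absolute paths are kept.
    return posixpath.normpath(posixpath.join(sandbox, path))

def is_outside_sandbox(path: str, sandbox: str) -> bool:
    # Lexical, component-wise check only: symlinks are not resolved.
    sandbox = posixpath.normpath(sandbox)
    return posixpath.commonpath([resolve(path, sandbox), sandbox]) != sandbox

def rolling_restart_safe(data_file_dirs: Iterable[str], commit_log_dir: str,
                         sandbox: str) -> bool:
    # Every data dir and the commit log dir must live outside the sandbox,
    # otherwise the restarted node comes up without its data.
    dirs = list(data_file_dirs) + [commit_log_dir]
    return all(is_outside_sandbox(d, sandbox) for d in dirs)

sandbox = "/var/lib/mesos/slaves/S0/sandbox"  # hypothetical sandbox path
print(rolling_restart_safe(["/sstable/xvdv", "/sstable/xvdw"], "/commitlog", sandbox))  # True
print(rolling_restart_safe(["data"], "commitlog", sandbox))  # False
```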

PROPOSED CHANGE:

CLI changes:

add `node restart <node-expr>` command
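To make `<node-expr>` concrete, here is a hypothetical expansion that covers only simple forms: single ids, comma-separated lists (an assumption) and ranges like `0..9`, treated here as inclusive (also an assumption). The real CLI may accept a richer syntax.

```python
from typing import List

def expand_node_expr(expr: str) -> List[str]:
    """Expand e.g. "0..9" or "0,2,5..7" into a list of node ids (simplified sketch)."""
    ids: List[str] = []
    for part in expr.split(","):
        part = part.strip()
        if ".." in part:
            start, end = (int(x) for x in part.split("..", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            ids.extend(str(i) for i in range(start, end + 1))  # inclusive range
        elif part:
            ids.append(part)
    return ids

print(expand_node_expr("0,2,5..7"))  # ['0', '2', '5', '6', '7']
```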

The command sends an HTTP POST request to the HTTP API. The HTTP server then stops and starts
each node one by one, sequentially, proceeding to the next node only after the previous one has
restarted. If a timeout occurs, the process stops and returns JSON with status "timeout" and
message "node $id timeout on <stop|start>".
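The sequential stop/start loop described above can be sketched as follows. `stop`, `start` and `state` are hypothetical callbacks standing in for the scheduler's internals, and treating `idle` as "stopped" is an assumption; the per-step timeout and the returned JSON shapes follow this proposal.

```python
import time
from typing import Callable, Dict, Iterable

def wait_until(cond: Callable[[], bool], timeout: float, poll: float = 0.05) -> bool:
    """Poll `cond` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if cond():
            return True
        time.sleep(poll)
    return cond()  # final check at the deadline

def rolling_restart(node_ids: Iterable[str],
                    stop: Callable[[str], None],
                    start: Callable[[str], None],
                    state: Callable[[str], str],
                    timeout: float) -> Dict:
    """Restart nodes one by one; abort on the first stop/start timeout."""
    restarted = []
    for node_id in node_ids:
        stop(node_id)
        if not wait_until(lambda n=node_id: state(n) == "idle", timeout):
            return {"status": "timeout", "message": f"node {node_id} timeout on stop"}
        start(node_id)
        if not wait_until(lambda n=node_id: state(n) == "running", timeout):
            return {"status": "timeout", "message": f"node {node_id} timeout on start"}
        restarted.append({"id": node_id})
    return {"status": "restarted", "nodes": restarted}

# Fake cluster whose nodes stop and start instantly.
states = {"0": "running", "1": "running"}

def fake_stop(node_id: str) -> None:
    states[node_id] = "idle"

def fake_start(node_id: str) -> None:
    states[node_id] = "running"

print(rolling_restart(["0", "1"], fake_stop, fake_start, states.get, timeout=1.0))
# {'status': 'restarted', 'nodes': [{'id': '0'}, {'id': '1'}]}
```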

usage example:

  ./dse-mesos.sh node update 0..9 --data-file-dirs /sstable/xvdv,/sstable/xvdw,/sstable/xvdx

  NOTE: allow updating a running node (currently only idle nodes can be updated)

./dse-mesos.sh node restart 0..9

restarted


  timeout output will be:

./dse-mesos.sh node restart 0..9

Error: node 0 timeout on stop


help node restart:

  ./dse-mesos.sh help node restart

Restart node
Usage: node restart <node-expr> [options]

Option      Description
------      -----------
--timeout   Time to wait until a node restarts. Should be a parsable Scala Duration value. Defaults to 4m.
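For illustration, a simplified Python parser for the `--timeout` value. It accepts a number followed by one of a subset of the unit labels that Scala's `Duration` understands (for example `4m`, `30s`, `1 hour`). It is not a full reimplementation of `scala.concurrent.duration.Duration`, and `parse_duration` is a hypothetical name.

```python
import re

# A subset of the unit labels accepted by Scala's Duration, mapped to seconds.
_UNIT_SECONDS = {
    "ms": 0.001, "milli": 0.001, "millis": 0.001,
    "millisecond": 0.001, "milliseconds": 0.001,
    "s": 1, "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "m": 60, "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "h": 3600, "hr": 3600, "hrs": 3600, "hour": 3600, "hours": 3600,
    "d": 86400, "day": 86400, "days": 86400,
}

def parse_duration(text: str = "4m") -> float:
    """Return the duration in seconds; the default mirrors --timeout's 4m."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*", text)
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"not a parsable duration: {text!r}")
    return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]

print(parse_duration())       # 240.0
print(parse_duration("30s"))  # 30.0
```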


The CLI `node list` output is affected by this change. To tell the user that a node has been
modified but not yet restarted, a `modified` flag is added to the Node model (default value `false`).
Whenever a node is modified via the `node update` command, the flag is set to `true`; once the node
stops (the task update is received in `onTaskStopped`), the flag is set back to `false`.
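The `modified` flag lifecycle described above, as a minimal sketch. The `Node` class, its `update` and `on_task_stopped` methods and the `config` dict are hypothetical stand-ins for the scheduler's Node model and its `onTaskStopped` handler; setting the state to `idle` on stop is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Node:
    id: str
    state: str = "idle"
    modified: bool = False  # proposed default
    config: Dict[str, str] = field(default_factory=dict)

    def update(self, **options: str) -> None:
        """`node update`: apply new settings and mark the node as modified."""
        self.config.update(options)
        self.modified = True

    def on_task_stopped(self) -> None:
        """Task stopped: the next start uses the new settings, so clear the flag."""
        self.state = "idle"
        self.modified = False

node = Node("0", state="running")
node.update(data_file_dirs="/sstable/xvdv,/sstable/xvdw,/sstable/xvdx")
print(node.modified)  # True
node.on_task_stopped()
print(node.modified)  # False
```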

When a node is `idle`, the `modified` flag is not shown to the user:

./dse-mesos.sh node list

node: id: 0 state: idle topology: cluster:default, dc:default, rack:default resources: cpu:0.5, mem:512 seed: false stickiness: period:30m


When a node is `running`, `starting`, `stopping` or `reconciling`, the flag is shown:

./dse-mesos.sh node list

node: id: 0 state: running modified: has pending update ...
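A sketch of this display rule. The `format_node` helper is hypothetical, omits the topology, resources, seed and stickiness fields, and assumes the `modified` text is printed only when the flag is `true`.

```python
from typing import Any, Dict

# States in which the proposal shows the modified flag.
SHOW_MODIFIED_IN = {"running", "starting", "stopping", "reconciling"}

def format_node(node: Dict[str, Any]) -> str:
    """Render a shortened `node list` line (other fields omitted)."""
    line = f"node: id: {node['id']} state: {node['state']}"
    if node["state"] in SHOW_MODIFIED_IN and node.get("modified", False):
        line += " modified: has pending update"
    return line

print(format_node({"id": "0", "state": "idle", "modified": True}))
# node: id: 0 state: idle
print(format_node({"id": "0", "state": "running", "modified": True}))
# node: id: 0 state: running modified: has pending update
```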


Node JSON representation changes: added `modified` flag

{ "id": "0", ... "modified": true }
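A sketch of the JSON round trip with the new field; the helper names are hypothetical. Because the proposed default is `false`, a reader can treat a missing `modified` key (for example in state saved before this change) as `false`. That fallback is a design assumption, not something the proposal specifies.

```python
import json
from typing import Any, Dict

def node_to_json(node: Dict[str, Any]) -> str:
    """Serialize a node, always including the new `modified` flag."""
    return json.dumps({**node, "modified": bool(node.get("modified", False))})

def node_from_json(text: str) -> Dict[str, Any]:
    """Deserialize a node; a missing `modified` key falls back to False."""
    node = json.loads(text)
    node.setdefault("modified", False)
    return node

print(node_to_json({"id": "0", "modified": True}))  # {"id": "0", "modified": true}
print(node_from_json('{"id": "0"}'))                 # {'id': '0', 'modified': False}
```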

HTTP API changes:

add call for restart
POST /node/restart
parameters:
  node
  timeout
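A hypothetical client-side call to the new endpoint using Python's standard library. The base URL is made up, and sending `node` and `timeout` as a form-encoded body is an assumption; the real CLI may encode them differently.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_restart_request(base_url: str, node_expr: str, timeout: str = "4m") -> Request:
    """Build (but do not send) a POST /node/restart request."""
    body = urlencode({"node": node_expr, "timeout": timeout}).encode("ascii")
    return Request(
        base_url.rstrip("/") + "/node/restart",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

req = build_restart_request("http://scheduler.example:7000", "0..9")  # hypothetical host/port
print(req.get_method(), req.full_url)  # POST http://scheduler.example:7000/node/restart
print(req.data)                        # b'node=0..9&timeout=4m'
```

Sending it would be `urllib.request.urlopen(req)`. Since the server replies only after every node has restarted (or one has timed out), any client-side read timeout would need to cover the whole run rather than a single node.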

response format for success:

{ "status": "restarted", "nodes": [ { "id": "0", ... }... ] }


response format for timeout:

{ "status": "timeout", "message": "node $id timeout on [start|stop]" }
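A sketch that maps the two response formats to the CLI output shown in the usage example; `render_restart_response` is a hypothetical name, and rejecting any other status is an assumption.

```python
import json

def render_restart_response(body: str) -> str:
    """Turn a /node/restart JSON response into the CLI's one-line output."""
    response = json.loads(body)
    status = response.get("status")
    if status == "restarted":
        return "restarted"
    if status == "timeout":
        return "Error: " + response["message"]
    raise ValueError(f"unexpected status: {status!r}")

print(render_restart_response('{"status": "restarted", "nodes": [{"id": "0"}]}'))
# restarted
print(render_restart_response('{"status": "timeout", "message": "node 0 timeout on stop"}'))
# Error: node 0 timeout on stop
```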

Scheduler changes:

set the `modified` attribute to `false` when `onTaskStopped` is invoked

Cassandra storage support:

persist the `modified` flag

RESULT: an easy way to restart nodes.

dmitrypekar commented 8 years ago

Merged. Thank you for update.

olegkovalenko commented 8 years ago

Thanks for the review and feedback!

elodina / datastax-enterprise-mesos

Rolling restart #67