MOTIVATION: Rolling restart currently has to be done manually. It would be great to embed these steps into the scheduler.
NOTE: rolling restart works only when `data-file-dirs` and `commit-log-dir` are outside of the sandbox.
If they are inside the sandbox, a node that is stopped and then started finds no data on startup,
concludes that it is a different node from the one that previously held this address, and halts.
So to use rolling restart, specify `data-file-dirs` and `commit-log-dir` outside of the sandbox.
PROPOSED CHANGE:
CLI changes:
add `node restart <node-expr>` command
The command sends an HTTP POST request to the HTTP API. The HTTP server then stops and starts
each node one by one, proceeding to the next node only after the previous one has restarted.
If a timeout occurs, the process stops and returns JSON with status "timeout" and
message "node $id timeout on <stop|start>".
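The sequential stop/start behaviour described above can be sketched as follows. This is only an illustration in Python, not the scheduler's code: `rolling_restart`, `stop`, `start` and `timeout_s` are hypothetical names, and the two callables are assumed to block until the node reaches the target state, returning `False` if the timeout elapses first.

```python
import json
from typing import Callable, Iterable


def rolling_restart(
    node_ids: Iterable[str],
    stop: Callable[[str, float], bool],
    start: Callable[[str, float], bool],
    timeout_s: float = 240.0,  # mirrors the proposed 4m default
) -> dict:
    """Restart nodes one at a time and abort on the first timeout."""
    restarted = []
    for node_id in node_ids:
        # Stop first, then start; each phase may time out independently.
        for phase, action in (("stop", stop), ("start", start)):
            if not action(node_id, timeout_s):
                return {
                    "status": "timeout",
                    "message": f"node {node_id} timeout on {phase}",
                }
        # Move on to the next node only after this one is back up.
        restarted.append({"id": node_id})
    return {"status": "restarted", "nodes": restarted}


if __name__ == "__main__":
    always_ok = lambda node_id, timeout: True
    print(json.dumps(rolling_restart(["0", "1"], always_ok, always_ok)))
```

Aborting on the first timeout leaves the remaining nodes untouched, so at most one node is down at any time.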
usage example:
./dse-mesos.sh node update 0..9 --data-file-dirs /sstable/xvdv,/sstable/xvdw,/sstable/xvdx
NOTE: this requires allowing a running node to be updated (currently only an idle node can be updated)
./dse-mesos.sh node restart 0..9
restarted
timeout output will be:
./dse-mesos.sh node restart 0..9
Error: node 0 timeout on stop
help node restart:
./dse-mesos.sh help node restart
Restart node
Usage: node restart <node-expr> [options]
Option     Description
--timeout  Time to wait until node restart. Should
           be a parsable Scala Duration value.
           Defaults to 4m.
`node list` is affected by this change: to tell the user that a node has been modified but not
yet restarted, a `modified` flag will be added to the Node model. Whenever a node is modified
via the `node update` command, the flag is set to `true` (the default is `false`). Once the node
is stopped (the task update is received in `onTaskStopped`), the flag is set back to `false`.
When a node is `idle`, the `modified` flag is not shown to the user.
./dse-mesos.sh node list
node:
  id: 0
  state: idle
  topology: cluster:default, dc:default, rack:default
  resources: cpu:0.5, mem:512
  seed: false
  stickiness: period:30m

./dse-mesos.sh node list
node:
  id: 0
  state: running
  modified: has pending update
  ...

{ "id": "0", ... "modified": true }
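The lifecycle of the `modified` flag can be sketched roughly as below. This is a simplified Python model for illustration, not the actual Node class: the method names `update`, `on_task_stopped` and `describe` are hypothetical, and it assumes the `modified` line is simply omitted when the flag is `false`.

```python
from dataclasses import dataclass


@dataclass
class Node:
    id: str
    state: str = "idle"      # only "idle" and "running" are modelled here
    modified: bool = False   # True while an update awaits a restart

    def update(self, **settings: str) -> None:
        """Stand-in for `node update`; storing the settings is omitted."""
        self.modified = True

    def on_task_stopped(self) -> None:
        """Stand-in for the scheduler's onTaskStopped handling."""
        self.state = "idle"
        self.modified = False

    def describe(self) -> str:
        lines = [f"id: {self.id}", f"state: {self.state}"]
        # The flag is never shown for an idle node.
        if self.state != "idle" and self.modified:
            lines.append("modified: has pending update")
        return "\n".join(lines)


if __name__ == "__main__":
    node = Node("0", state="running")
    node.update(data_file_dirs="/sstable/xvdv")
    print(node.describe())
```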
HTTP API changes:
{ "status": "restarted", "nodes": [ { "id": "0", ... }... ] }
{ "status": "timeout", "message": "node $id timeout on [start|stop]" }
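On the CLI side, mapping these two responses to the output shown in the usage example could look like the sketch below. The function name `render_restart_response` is hypothetical, and treating any other status as an error is an assumption, since the proposal only defines "restarted" and "timeout".

```python
import json


def render_restart_response(body: str) -> str:
    """Turn the HTTP API's JSON reply into the line the CLI prints."""
    reply = json.loads(body)
    if reply["status"] == "restarted":
        return "restarted"
    if reply["status"] == "timeout":
        return f"Error: {reply['message']}"
    # Any other status is outside what the proposal describes.
    raise ValueError(f"unexpected status: {reply['status']!r}")


if __name__ == "__main__":
    print(render_restart_response(
        '{"status": "timeout", "message": "node 0 timeout on stop"}'
    ))
```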
Scheduler changes:
Supports Cassandra storage
RESULT: easy way to restart nodes