Closed: jazzl0ver closed this issue 6 years ago
Could you please share more about the use case? Why do you want to restart a service? Do you want a rolling restart? A rolling restart restarts one member at a time, checks the cluster status, and then restarts the next member. Or is it simply stopping the service when the system is idle, and then starting it again when the load arrives?
Actually, this came from our developers, who had some strange issues with Kafka and asked me to restart the service (since this is the easiest way to narrow down the issue from their point of view). I can also imagine that, due to a software bug or something similar, a Kafka (or other app) container might just hang and need to be killed and restarted. It would be great to have a legitimate way to do that. So both restart types would be awesome to have!
Thanks for sharing the details! The rolling restart is not supported yet.
To stop all members and then start them again, you can simply use the stop-service and start-service commands. stop-service stops all containers of one service, and start-service starts them again. Example:
firecamp-service-cli -cluster=mycluster -region=us-east-1 -op=stop-service -service-name=mykafka
firecamp-service-cli -cluster=mycluster -region=us-east-1 -op=start-service -service-name=mykafka
Note that stop-service simply stops all containers. When a container is stopped, SIGTERM is sent to its main process. kafka-server-stop.sh also sends SIGTERM to the Kafka process, so simply stopping the container is safe for Kafka. We will enhance this to stop the members one by one.
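The stop-then-start sequence described above could be wrapped in a small script. This is only a sketch: the flags are the ones shown in this thread, while the function names, the DRY_RUN switch, and the default cluster, region, and service values are placeholders made up for illustration.

```shell
# Sketch only: CLUSTER, REGION, SERVICE and DRY_RUN are illustrative names,
# not part of firecamp. The CLI flags are the ones quoted in this thread.
CLI="${CLI:-./firecamp-service-cli}"
CLUSTER="${CLUSTER:-mycluster}"
REGION="${REGION:-us-east-1}"
SERVICE="${SERVICE:-mykafka}"

# Run (or, with DRY_RUN=1, only print) one operation against the service.
run_op() {
  set -- "$CLI" "-cluster=$CLUSTER" "-region=$REGION" "-op=$1" "-service-name=$SERVICE"
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Full (non-rolling) restart: stop all containers, then start them again.
# The start is skipped if the stop fails.
full_restart() {
  run_op stop-service && run_op start-service
}
```

Setting DRY_RUN=1 before calling full_restart prints the two commands without contacting the cluster, which is a cheap way to check the arguments first.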
# ./firecamp-service-cli -cluster=firecamp-qa -region=us-east-1 -op=stop-service -service-name=kafka-qa
StopService error InvalidArgs: Bad Request
the cli version is the latest (0.9.2)
I wonder how this could happen. Is the server version also 0.9.2?
oops.. it's definitely not. how to upgrade the server to 0.9.2?
For now, you would have to upgrade manually. I will write down the upgrade procedure.
Just a check. Which version are you currently running?
wondering how to check that..
Log in to any node and run: sudo docker plugin ls. It will show the firecamp volume plugin version, which is the version the cluster is running.
# sudo docker plugin ls
ID NAME DESCRIPTION ENABLED
6afd9c9e340b cloudstax/firecamp-volume:latest firecamp volume plugin for docker true
It was started on Dec 10th.
You are using the latest version, which is built from the master branch, not the release branch. The master branch is under active development, so please do not use it for production. It is OK for testing, but things may be broken. It would be better to use the release.
There is no need to upgrade. You could use the latest cli, https://s3.amazonaws.com/cloudstax/firecamp/releases/latest/packages/firecamp-service-cli.tgz, to stop service.
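For anyone who wants to script the version check mentioned earlier, here is a rough sketch that extracts the tag of the firecamp volume plugin from a list of plugin names. The function name is made up for illustration; the plugin name is the one shown in the docker plugin ls output in this thread.

```shell
# Sketch only: firecamp_plugin_tag is an illustrative helper, not a firecamp tool.
# Reads plugin names (one per line) on stdin and prints the tag of the
# firecamp volume plugin, e.g. "latest" for a master-branch build.
firecamp_plugin_tag() {
  while IFS= read -r name; do
    case "$name" in
      cloudstax/firecamp-volume:*) echo "${name##*:}" ;;
    esac
  done
}

# Usage on a cluster node (not executed here):
#   sudo docker plugin ls --format '{{.Name}}' | firecamp_plugin_tag
```

A result of "latest" corresponds to the master-branch build discussed above; a release install would be expected to show a release tag instead.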
Same thing:
[root@dev ~]# wget https://s3.amazonaws.com/cloudstax/firecamp/releases/latest/packages/firecamp-service-cli.tgz
--2018-01-18 17:19:25-- https://s3.amazonaws.com/cloudstax/firecamp/releases/latest/packages/firecamp-service-cli.tgz
Resolving s3.amazonaws.com... 52.216.96.21
Connecting to s3.amazonaws.com|52.216.96.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2487489 (2.4M) [application/gzip]
Saving to: “firecamp-service-cli.tgz”
100%[==========================================================================================>] 2,487,489 --.-K/s in 0.03s
2018-01-18 17:19:25 (74.2 MB/s) - “firecamp-service-cli.tgz” saved [2487489/2487489]
[root@dev ~]# tar zxf firecamp-service-cli.tgz
[root@dev ~]# ./firecamp-service-cli -cluster=firecamp-qa -region=us-east-1 -service-name=kafka-qa -op=stop-service
StopService error InvalidArgs: Bad Request
Weird. Let me try it.
Just set up a testbed, created a service, stopped and started correctly. Probably the manage service docker image is a little too old on your testbed. Could you please try to get the latest manage service docker image? And try stop-service again?
To get the latest manage service Docker image, run:
docker service update firecamp-manageserver --image cloudstax/firecamp-manageserver:latest
This will pull the latest Docker image and restart the manageserver container.
That did the trick, thank you!
Cool!
Just one additional question: are you using ECS or Swarm?
ECS
Added the initial rolling restart support. The service containers will be deleted and recreated one by one. Only one container is deleted at a time, and the next container is deleted only after the previous one has been recreated. The highest replica is restarted first. You could try the new restart-service cli. Note: restart-service is different from stop-service and start-service. restart-service does a rolling restart, while stop-service and start-service simply stop or start all service containers in parallel.
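To make the ordering concrete, the sketch below prints the sequence a rolling restart would follow for a given number of replicas: highest replica first, one container at a time. It only illustrates the order described above and is not firecamp's implementation; the function name and the "service-index" member naming are assumptions made for the example.

```shell
# Sketch only: prints the restart order, it does not touch any cluster.
# Member names of the form <service>-<index> are an assumption for illustration.
rolling_restart_plan() {
  service="$1"
  replicas="$2"
  i=$((replicas - 1))
  # Walk from the highest replica down to 0, one member per step.
  while [ "$i" -ge 0 ]; do
    echo "delete ${service}-${i}, then wait until it is recreated"
    i=$((i - 1))
  done
}

# Example: rolling_restart_plan mykafka 3
```

By contrast, stop-service followed by start-service would act on all three members at once rather than stepping through them.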
Thanks a bunch! I'll let you know if I face any issues
Thanks!
Closing this issue. If you find any bugs, please reopen it or create a new issue. Thanks!
Hello. This might be a dumb question, I apologize. What is the correct way to restart a service (in terms of the cluster)? For example, how can I restart Kafka containers safely?