cloudstax / firecamp

Serverless Platform for the stateful services
https://www.cloudstax.io
Apache License 2.0

can't start zookeeper in 0.9.2 #18

Closed jazzl0ver closed 6 years ago

jazzl0ver commented 6 years ago

I'm sorry to bother you, but the 0.9.2 release is giving me a headache. Can you please check whether you can start ZooKeeper in ECS with the following command:

# ./firecamp-service-cli -op=create-service -service-type=zookeeper -region=us-east-1 -cluster=firecamp-prod -service-name=zoo-prod -replicas=3 -volume-size=20 -zk-heap-size=512

I'm getting:

The ZooKeeper heap size is less than 4096. Please increase it for production system
The zookeeper service is created, wait for all containers running
wait the service containers running, RunningCount 0
...
wait the service containers running, RunningCount 1
not all service containers are running after 120

In the end, only one ZooKeeper container is running. The service events show:

85171af8-094f-48c8-95c1-8ddc1406cfd3
2018-01-25 20:51:55 +0300
service zoo-prod was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 986e672b-838a-4215-94c0-1ae8d8cf783b encountered error "memberOf constraint unsatisfied". For more information, see the Troubleshooting section.

cec4fbb1-b192-48d1-8747-a34c160a8481
2018-01-25 20:51:42 +0300
service zoo-prod has started 1 tasks: task d4e9c873-4629-4688-8b5b-9f8b1fcda874.

The FireCamp log ends with:

...
I0125 17:54:39.218207 1 server.go:688] get service status &{1 3} requuid req-722d99f1ef0c470c463dee0fe2e1dfea &{us-east-1 firecamp-prod zoo-prod}
I0125 17:54:44.219742 1 server.go:105] request Method GET URL /?Get-Service-Status ?Get-Service-Status Host firecamp-manageserver.firecamp-prod-firecamp.com:27040 requuid req-9a1353ee054041c96c770a55a24813c3 headers map[Accept-Encoding:[gzip] User-Agent:[Go-http-client/1.1] Content-Length:[73]]
I0125 17:54:44.236612 1 ecs.go:759] service zoo-prod has 1 running containers, desired 3
I0125 17:54:44.236634 1 server.go:688] get service status &{1 3} requuid req-9a1353ee054041c96c770a55a24813c3 &{us-east-1 firecamp-prod zoo-prod}
I0125 17:54:49.238279 1 server.go:105] request Method GET URL /?Get-Service-Status ?Get-Service-Status Host firecamp-manageserver.firecamp-prod-firecamp.com:27040 requuid req-97454a904f2b4c8161b8cf499e72d06a headers map[User-Agent:[Go-http-client/1.1] Content-Length:[73] Accept-Encoding:[gzip]]
I0125 17:54:49.256414 1 ecs.go:759] service zoo-prod has 1 running containers, desired 3
I0125 17:54:49.256441 1 server.go:688] get service status &{1 3} requuid req-97454a904f2b4c8161b8cf499e72d06a &{us-east-1 firecamp-prod zoo-prod}

Any ideas what's going on?

JuniusLuo commented 6 years ago

Could you please check the EC2 instance type and the number of availability zones? Did you deploy other services?

jazzl0ver commented 6 years ago

Instance type: t2.medium. AZs: a through e (no f). No other services, just a clean FireCamp environment.


JuniusLuo commented 6 years ago

Thanks. How many nodes are in the cluster, 5 or 3?

jazzl0ver commented 6 years ago

3 nodes


JuniusLuo commented 6 years ago

This might be the issue. Currently, when creating a service, FireCamp does not check whether a node is running in each zone. The FireCamp manage service simply assigns the service replicas to zones in round-robin order, so a replica may be assigned to a zone where no node is running.

Is there a reason you want the cluster to span 5 zones while it has only 3 nodes? If you have 5 nodes in 5 zones, or 3 nodes in 3 zones, this issue will not show up.
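A minimal Go sketch of the placement problem described above, assuming plain round-robin (replica `i` goes to `zones[i % len(zones)]`) with no check for running nodes. The function names and the node layout (nodes in AZs a, d and e) are hypothetical and only illustrate why 2 of 3 replicas end up with no matching container instance; this is not FireCamp's actual code.

```go
package main

import "fmt"

// assignZones places replica i in zones[i % len(zones)]: plain round-robin
// over the configured AZs, without checking whether any node runs there.
func assignZones(zones []string, replicas int) []string {
	out := make([]string, replicas)
	for i := 0; i < replicas; i++ {
		out[i] = zones[i%len(zones)]
	}
	return out
}

// unplaceable returns the indexes of replicas whose assigned zone has no
// running container instance. In this model, ECS cannot place those tasks.
func unplaceable(assigned []string, nodeZones map[string]bool) []int {
	var bad []int
	for i, z := range assigned {
		if !nodeZones[z] {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1e"}
	// Hypothetical layout: the ASG happened to put the 3 nodes in a, d and e.
	nodes := map[string]bool{"us-east-1a": true, "us-east-1d": true, "us-east-1e": true}

	assigned := assignZones(zones, 3)
	fmt.Println(assigned)                     // [us-east-1a us-east-1b us-east-1c]
	fmt.Println(unplaceable(assigned, nodes)) // [1 2]
}
```

With 3 zones and 3 nodes (one per zone), every assigned zone has a node and `unplaceable` returns an empty result, which matches the suggested fix.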

jazzl0ver commented 6 years ago

The main reason was future scaling. I thought the extra AZs would be used when we need to increase the number of instances, so that new instances would be created in those AZs. I'll try equalizing the number of AZs and instances tomorrow. Thank you for your help!


JuniusLuo commented 6 years ago

How do you want to scale? Do you want to scale ZooKeeper to 5 nodes across 5 AZs?

There is one limitation imposed by the Auto Scaling Group (ASG) and EBS. If the cluster spans 5 AZs with 3 instances, the ASG may create the replacement instance in the 4th AZ when one instance goes down. But the previous EBS volume is not in the 4th AZ, and an EBS volume can only be attached to an instance in its own AZ, so one member will fail to start.
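A small Go model of that failure, assuming each member's EBS volume is pinned to the AZ it was created in. The `member` type, the member names and the replacement scenario are hypothetical illustrations, not FireCamp internals.

```go
package main

import "fmt"

// member is a hypothetical model of one ZooKeeper replica whose EBS volume
// was created in volumeAZ and can only be attached to an instance there.
type member struct {
	name     string
	volumeAZ string
}

// startable reports, per member, whether some running instance shares the
// member's volume AZ. If none does, the volume cannot be attached and the
// member cannot start.
func startable(members []member, instanceAZs map[string]bool) map[string]bool {
	ok := make(map[string]bool, len(members))
	for _, m := range members {
		ok[m.name] = instanceAZs[m.volumeAZ]
	}
	return ok
}

func main() {
	members := []member{
		{"zoo-prod-0", "us-east-1a"},
		{"zoo-prod-1", "us-east-1b"},
		{"zoo-prod-2", "us-east-1c"},
	}
	// The instance in us-east-1c goes down and the ASG, free to use all
	// 5 AZs, launches its replacement in us-east-1d.
	instances := map[string]bool{"us-east-1a": true, "us-east-1b": true, "us-east-1d": true}
	fmt.Println(startable(members, instances))
	// map[zoo-prod-0:true zoo-prod-1:true zoo-prod-2:false]
}
```

Restricting the ASG to the same 3 AZs that hold the volumes means a replacement instance can only land in one of those AZs. With one instance per AZ, the replacement goes back into the AZ of the lost instance, where the orphaned volume lives.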

You could start with 3 AZs and 3 instances. In a future release, we will support scaling the AZs: we could add the new AZs to the ASG and register them with the FireCamp manage service, which will then create the new replicas in the new AZs when scaling the ZooKeeper service.
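One way the planned scale-out could behave is sketched below in Go. It assumes that existing replicas keep their zones and that each new replica `i` continues the round-robin over the updated zone list. This `scaleOut` function is hypothetical and is not how FireCamp implements the feature.

```go
package main

import "fmt"

// scaleOut is a hypothetical sketch: existing replicas keep their zones, and
// each new replica i (i >= len(existing)) is placed in zones[i % len(zones)]
// over the updated zone list.
func scaleOut(existing []string, zones []string, target int) []string {
	out := append([]string(nil), existing...) // copy; don't alias the caller's slice
	for i := len(existing); i < target; i++ {
		out = append(out, zones[i%len(zones)])
	}
	return out
}

func main() {
	existing := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	// Two new AZs were added to the ASG and registered with the manage service.
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d", "us-east-1e"}
	fmt.Println(scaleOut(existing, zones, 5))
	// [us-east-1a us-east-1b us-east-1c us-east-1d us-east-1e]
}
```

Because the new zones are appended to the end of the list, indexes 3 and 4 map exactly onto the new AZs, so scaling from 3 to 5 replicas fills d and e without moving any existing member.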

jazzl0ver commented 6 years ago

After reducing the number of AZs to 3, everything worked like a charm! Thank you!

cloudstax commented 6 years ago

Closing this issue, as it works with the correct number of nodes. Scaling the cluster is an advanced feature planned for a later release.