cloudstax / firecamp

Serverless Platform for the stateful services
https://www.cloudstax.io
Apache License 2.0

Unable to start kafka service #81

Closed jonathanmv closed 5 years ago

jonathanmv commented 5 years ago

Hi, thanks for this amazing project.

I was able to create a stack and install the zookeeper service; however, the kafka service is presenting some issues.

The command I'm using to create the kafka service is below

firecamp-service-cli -op=create-service -service-type=kafka -region=us-east-1 -cluster=ecs-cluster -replicas=3 -volume-size=1 -service-name=kafka -kafka-zk-service=zookeeper -kafka-heap-size=512

I'm using small numbers in the heap-size and volume because I'm just testing the firecamp tool.

After executing the command to create the service I start getting messages like

2018-10-21 20:27:32.814535919 +0000 UTC wait the service containers running, RunningCount 0

Until finally it times out

2018-10-21 20:22:26.943065188 +0000 UTC not all service containers are running after 5m0s

I am able to connect to the zookeeper hosts and run commands with zookeeper-shell.sh.

If I check the status by running firecamp-service-cli -op=check-service-init, it says that the service is initialized. I can see that the DNS mapping and the EBS volumes have been created. I can also run -op=get-service, and it reports ServiceStatus:ACTIVE.

I tried deleting the service and creating it again but to no avail.

Tomorrow I will try recreating the stack to see whether the issue persists in a new one.

Is there any step that I am missing or doing wrong? I am following the tutorial.

Please let me know if there's any other info you need from me. Thanks again.

JuniusLuo commented 5 years ago

What is the total memory of the EC2 instance? What heap size is configured for ZooKeeper? Could you please also get the detailed logs of the management service from CloudWatch?

jonathanmv commented 5 years ago

I was using t2.small instances for the nodes, which have 2 GB of memory. The heap size was 512 MB for both zookeeper and kafka. I can't share the logs because I deleted the stack, and all the logs along with it.
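
As a rough sanity check on the figures above, the heap settings can be subtracted from the instance memory. This is only a sketch: it assumes one ZooKeeper and one Kafka container land on the same node, and `remaining_mb` is a hypothetical helper, not part of firecamp.

```shell
#!/bin/sh
# Rough memory budget for one node, using the figures quoted in this thread.
# Assumption: one ZooKeeper and one Kafka container share the same node.
remaining_mb() {
  total_mb=$1          # instance memory in MB
  shift
  for heap_mb in "$@"; do
    # subtract each configured JVM heap from the total
    total_mb=$((total_mb - heap_mb))
  done
  echo "$total_mb"
}

# 2 GB instance, 512 MB heap each for ZooKeeper and Kafka
remaining_mb 2048 512 512
```

With the numbers from this thread it prints 1024, the MB left over for the OS, Docker, the ECS agent, and JVM memory outside the heap.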

jazzl0ver commented 5 years ago

I had similar issues with small instances. Try t2.large.

JuniusLuo commented 5 years ago

Interesting. I have also used t2.small for simple tests at times and did not see this problem. You could try t2.large. If you try t2.small and hit this bug again, please help collect the logs from CloudWatch.

jonathanmv commented 5 years ago

Ok, I recreated the cluster with t2.small instances and I am still getting the same problem. The kafka service is unable to start regardless of the heap size.

JuniusLuo commented 5 years ago

Can you attach the logs of the firecamp management service?

jonathanmv commented 5 years ago

Attached: firecamp-manageserver-logs.txt

Here are the logs from a newly created cluster. This one uses t2.medium instances, and I was unable to create the zookeeper service.

In the logs you may find that I deleted the service and tried to create it again but to no avail.

How are you configuring the stacks? I haven't been able to use this CloudFormation template, unfortunately.

JuniusLuo commented 5 years ago

It looks like the zk container could not be started. This is strange; I have never seen it before.

Could you please get the firecamp volume log from one of the nodes? It is at /var/log/firecamp/firecamp-dockervolume.*
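
One way to gather those files for attaching here is to bundle them into a single archive. A minimal sketch, assuming tar is available on the node; `collect_logs` and the archive name are hypothetical, and only the log path comes from the comment above.

```shell
#!/bin/sh
# Bundle the firecamp volume logs from one node into a single archive.
# collect_logs is a hypothetical helper, not a firecamp command.
collect_logs() {
  log_dir=$1   # directory holding the logs, e.g. /var/log/firecamp
  out=$2       # absolute path of the archive to write
  # cd in a subshell so the archive stores bare file names
  # and the caller's working directory is left unchanged
  (cd "$log_dir" && tar -czf "$out" firecamp-dockervolume.*)
}
```

On a node this could be run as `collect_logs /var/log/firecamp /tmp/firecamp-dockervolume-logs.tar.gz`, possibly with sudo depending on the log file permissions.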

Also, if you could attach one zk log, it would help with debugging. The zk logs are in CloudWatch.

JuniusLuo commented 5 years ago

How did you provision the system? Are you using the AWS Quick Start, or did you provision it yourself? Did you specify the correct availability zones and the corresponding subnets?

jonathanmv commented 5 years ago

Sorry guys, I'm unable to check the logs or spend more time looking into this issue.