cloudstax / firecamp

Serverless Platform for the stateful services
https://www.cloudstax.io
Apache License 2.0

Kafkamanager does not come up #42

Closed jazzl0ver closed 6 years ago

jazzl0ver commented 6 years ago

I created a new firecamp cluster from scratch (using firecamp.template), then tried to start the kafkamanager service:

./firecamp-service-cli -cluster=firecamp-qa -region=us-east-1 -op=create-service -service-type=kafkamanager -service-name=kafkamanager-qa -km-heap-size=512 -km-zk-service=zoo-qa -km-user=user -km-passwd=pass
The Kafka Manager heap size is less than 4096. Please increase it for production system
2018-03-05 15:18:48.889010183 +0000 UTC The kafka manager service is created, wait for all containers running
2018-03-05 15:18:48.929875163 +0000 UTC wait the service containers running, RunningCount 0
...
2018-03-05 15:23:50.807640377 +0000 UTC not all service containers are running after 5m0s

firecamp-manageserver log:

I0305 15:18:48.631132 1 route53.go:146] find hosted zone /hostedzone/ZL36HKC3OHITW for domain firecamp-qa-firecamp.com vpc vpc-d44e1eb1 us-east-1 requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.631143 1 route53.go:58] get hostedZoneID /hostedzone/ZL36HKC3OHITW for domain firecamp-qa-firecamp.com vpc vpc-d44e1eb1 us-east-1 requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.631154 1 service.go:135] get hostedZoneID /hostedzone/ZL36HKC3OHITW for domain firecamp-qa-firecamp.com vpc vpc-d44e1eb1 requuid req-649f1bce27b440564fceda5f5d983ae6 &{us-east-1 firecamp-qa kafkamanager-qa stateless}
I0305 15:18:48.638979 1 dynamodb_service.go:44] created service &{firecamp-qa kafkamanager-qa 4615cc0394144d224729850cfe4db686} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.639002 1 service.go:695] created service &{firecamp-qa kafkamanager-qa 4615cc0394144d224729850cfe4db686} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.644111 1 dynamodb_serviceattr.go:107] created service attr &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.644147 1 service.go:798] created service attr in db &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.644197 1 service.go:166] created service attr, requuid req-649f1bce27b440564fceda5f5d983ae6 &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless}
I0305 15:18:48.644241 1 dynamodb_serviceattr.go:144] update service status from CREATING to INITIALIZING requuid req-649f1bce27b440564fceda5f5d983ae6 &{4615cc0394144d224729850cfe4db686 INITIALIZING 1520263128644215963 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless}
I0305 15:18:48.649431 1 dynamodb_serviceattr.go:216] updated service attr &{4615cc0394144d224729850cfe4db686 CREATING 1520263128639010452 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} to &{4615cc0394144d224729850cfe4db686 INITIALIZING 1520263128644215963 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless} requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.649471 1 service.go:185] successfully created service, requuid req-649f1bce27b440564fceda5f5d983ae6 &{4615cc0394144d224729850cfe4db686 INITIALIZING 1520263128644215963 1 firecamp-qa kafkamanager-qa { { 0 0 false} { 0 0 false}} true firecamp-qa-firecamp.com /hostedzone/ZL36HKC3OHITW false <nil> {0 256 0 512} stateless}
I0305 15:18:48.701299 1 cloudwatch.go:152] created log group firecamp-qa-kafkamanager-qa-4615cc0394144d224729850cfe4db686 requuid req-649f1bce27b440564fceda5f5d983ae6
I0305 15:18:48.734275 1 ecs.go:294] service is inactive kafkamanager-qa cluster firecamp-qa
I0305 15:18:48.760506 1 ecs.go:341] ListTaskDefinitionFamilies prefix firecamp-qa-kafkamanager-qa token <nil> resp {
Families: ["firecamp-qa-kafkamanager-qa"]
}

ECS console displays this error:

Status reason | CannotStartContainerError: API error (500): failed to initialize logging driver: ResourceNotFoundException: The specified log group does not exist. status code: 400, request id: ed451ab0-2088-11e8-a5e4-6f3c66053865

Looks like the service is still trying to create a log group in the outdated format:

 "requestParameters": { "logGroupName": "firecamp-firecamp-qa-kafkamanager-qa-b5291cb97e624299744ef6d9b9ce5ad9", "logStreamName": "kafkamanager-qa/firecamp-qa-kafkamanager-qa-container/544c5f99-e1e9-46b1-b50d-fe05a91aaaf7" },

JuniusLuo commented 6 years ago

Which version did you install? For the latest release, are you using https://s3.amazonaws.com/cloudstax/firecamp/releases/latest/templates/firecamp.template, or your own bucket?

jazzl0ver commented 6 years ago

I installed the latest. Used firecamp.template from the project:

$ md5sum firecamp.template
9343fb2df76cfb85ed22ccea681a36fe  firecamp.template

JuniusLuo commented 6 years ago

Yes, that is the latest template. Are you using your own bucket? If so, did you build the new manage image and upload it to your bucket? Or are you simply using the cloudstax bucket?

jazzl0ver commented 6 years ago

Yes, I put that template in my S3 bucket and specified it in the CloudFormation form. No, I didn't build any new images. I just ran git pull on your repo and uploaded the template to the bucket.

JuniusLuo commented 6 years ago

Could you please get the Docker image version of the manageserver? Find the node where the manageserver container is running, and run docker images --digests. Also run docker plugin ls. Thanks!

jazzl0ver commented 6 years ago

[root@ip-172-22-5-35 ec2-user]# docker images --digests
REPOSITORY                            TAG                 DIGEST                                                                    IMAGE ID            CREATED             SIZE
cloudstax/firecamp-manageserver       latest              sha256:486ae9288c7700d6ba3c6fe2d6f0e517debfb52e4a68a8f8f1c5ec2b83df69c3   3e32a7e9bc6a        16 hours ago        149MB
cloudstax/firecamp-kafka              1.0                 sha256:3e8c9a5c040c3df152bb4c9f36bf4bc45c991a1c0d75844d655b2aa1019a568e   60e845f96a27        2 days ago          132MB
cloudstax/firecamp-kafka-manager      1.3.3               sha256:4729587c2ed46a9a74b229ffaebb59f6066a731021a19fecbfa02d444e3e5f34   f4c015b25c38        6 days ago          382MB
cloudstax/firecamp-cassandra          3.11                sha256:3f17c100e112d7868645be79b0ea5394e4e9d506960f1c8158d1ba6cb25087e5   035d6748a9c9        2 weeks ago         320MB
cloudstax/firecamp-zookeeper          3.4                 sha256:ce8622fe8b8dfa1951f3bde63b637621b7ee5a049ec7715ccc756d95ee090cec   daa228143351        3 weeks ago         145MB
cloudstax/firecamp-amazon-ecs-agent   latest              sha256:f87d4c9f901f241502ddf5ab97da34bb8f7993121b324d8183794ca7c0b31196   c197350df554        5 weeks ago         26.9MB
[root@ip-172-22-5-35 ec2-user]# docker plugin ls
ID                  NAME                               DESCRIPTION                                     ENABLED
dea8905f6c44        cloudstax/firecamp-volume:latest   firecamp volume plugin for docker               true
07ac0c9fca04        cloudstax/firecamp-log:latest      firecamp log plugin for docker: consume lo...   true

JuniusLuo commented 6 years ago

Thanks! The env looks good.

The manage server actually created the correct log group.

I0305 15:18:48.701299 1 cloudwatch.go:152] created log group firecamp-qa-kafkamanager-qa-4615cc0394144d224729850cfe4db686 requuid req-649f1bce27b440564fceda5f5d983ae6

Where did you get the log below?

"requestParameters": { "logGroupName": "firecamp-firecamp-qa-kafkamanager-qa-b5291cb97e624299744ef6d9b9ce5ad9", "logStreamName": "kafkamanager-qa/firecamp-qa-kafkamanager-qa-container/544c5f99-e1e9-46b1-b50d-fe05a91aaaf7" },

JuniusLuo commented 6 years ago

I just deployed a new cluster from the latest release, and kafkamanager worked well. Besides firecamp.template, did you upload all the latest templates, such as firecamp-ecs.template, and the init.sh script?

jazzl0ver commented 6 years ago

No, I didn't upload any other templates or the init script. Should I? I didn't do that before.

As for where I got that log: it's from CloudTrail.
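The mismatch discussed above can be checked mechanically. The following is a minimal Python sketch, not part of firecamp, that pulls the log group name out of the manage server log line and out of the CloudTrail requestParameters fragment quoted in this thread, then compares them. The function names are hypothetical, and the regular expression assumes the "created log group <name>" wording seen in the log above.

```python
import json
import re

# Matches the manage server's "created log group <name>" message.
CREATED_RE = re.compile(r"created log group (\S+)")


def created_log_group(manageserver_line):
    """Return the log group name from a manage server log line, or None."""
    match = CREATED_RE.search(manageserver_line)
    return match.group(1) if match else None


def requested_log_group(cloudtrail_fragment):
    """Return logGroupName from a '"requestParameters": {...},' fragment."""
    # Drop the trailing comma and wrap in braces so the fragment parses as JSON.
    text = "{" + cloudtrail_fragment.strip().rstrip(",") + "}"
    return json.loads(text)["requestParameters"]["logGroupName"]


# Both inputs are copied from the comments in this thread.
manageserver_line = (
    "I0305 15:18:48.701299 1 cloudwatch.go:152] created log group "
    "firecamp-qa-kafkamanager-qa-4615cc0394144d224729850cfe4db686 "
    "requuid req-649f1bce27b440564fceda5f5d983ae6"
)
cloudtrail_fragment = (
    '"requestParameters": { "logGroupName": '
    '"firecamp-firecamp-qa-kafkamanager-qa-b5291cb97e624299744ef6d9b9ce5ad9", '
    '"logStreamName": "kafkamanager-qa/firecamp-qa-kafkamanager-qa-container/'
    '544c5f99-e1e9-46b1-b50d-fe05a91aaaf7" },'
)

created = created_log_group(manageserver_line)
requested = requested_log_group(cloudtrail_fragment)
print("created:  ", created)
print("requested:", requested)
print("match:    ", created == requested)
```

Here the two names differ in both the prefix and the service UUID, which suggests the container's log configuration was not generated by the same create-service request that created the log group.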

jazzl0ver commented 6 years ago

This is ridiculous. Last night I deleted the kafkamanager service, and today I issued the command to create it again, and it went through without any issues. I'm sorry. Next time I will make more attempts before reaching out to you.

JuniusLuo commented 6 years ago

Originally, I thought you had changed QSS3BucketName and QSS3KeyPrefix to your own bucket. If not, you don't have to upload the other templates or the init script.

It is weird. Good to know it works now. Please feel free to report it if you hit it again. We would like to further investigate any potential issue. Thanks!

jazzl0ver commented 6 years ago

Just figured this out. The cluster was created during the short period when you changed the log group name from clustername-firecamp... to firecamp-clustername-firecamp... So when I attempted to create the service, it used the very first task definition, which contained the latter log group name. After I deleted the service along with its task definition, the new service creation went through without issues.
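For anyone hitting the same thing, a leftover task definition like this can be spotted before recreating the service. The sketch below is an illustration in Python, not firecamp code. It walks the task definition revisions of a family and reports those whose containers log to a group other than the expected one. It assumes an ECS client with boto3's list_task_definitions and describe_task_definition semantics, and it assumes the log group is stored under the "awslogs-group" option, which may not hold for the firecamp log plugin. FakeECS, the account ID, and the revision numbers are made up so the example runs without AWS access.

```python
def stale_task_definitions(ecs, family, expected_group):
    """Return ARNs of revisions in `family` that log to a different group."""
    stale = []
    kwargs = {"familyPrefix": family}
    while True:
        page = ecs.list_task_definitions(**kwargs)
        for arn in page["taskDefinitionArns"]:
            td = ecs.describe_task_definition(taskDefinition=arn)["taskDefinition"]
            for container in td["containerDefinitions"]:
                log_config = container.get("logConfiguration") or {}
                options = log_config.get("options") or {}
                # ASSUMPTION: the log driver reads its group from "awslogs-group".
                group = options.get("awslogs-group")
                if group is not None and group != expected_group:
                    stale.append(arn)
                    break
        token = page.get("nextToken")
        if not token:
            return stale
        kwargs["nextToken"] = token


class FakeECS:
    """Hypothetical in-memory stand-in for the ECS client, for illustration."""

    def __init__(self, groups_by_arn):
        self.groups_by_arn = groups_by_arn

    def list_task_definitions(self, **kwargs):
        prefix = kwargs["familyPrefix"]
        arns = [a for a in self.groups_by_arn if "/" + prefix + ":" in a]
        return {"taskDefinitionArns": arns}

    def describe_task_definition(self, taskDefinition):
        options = {"awslogs-group": self.groups_by_arn[taskDefinition]}
        container = {"logConfiguration": {"options": options}}
        return {"taskDefinition": {"containerDefinitions": [container]}}


family = "firecamp-qa-kafkamanager-qa"
expected = "firecamp-qa-kafkamanager-qa-4615cc0394144d224729850cfe4db686"
# The account ID and revision numbers are made up for this example.
arn1 = "arn:aws:ecs:us-east-1:000000000000:task-definition/" + family + ":1"
arn2 = "arn:aws:ecs:us-east-1:000000000000:task-definition/" + family + ":2"
fake = FakeECS({
    arn1: "firecamp-firecamp-qa-kafkamanager-qa-b5291cb97e624299744ef6d9b9ce5ad9",
    arn2: expected,
})

stale = stale_task_definitions(fake, family, expected)
print(stale)
```

Against a real cluster, the same function could be given boto3.client("ecs", region_name="us-east-1") in place of FakeECS. Deleting the service together with its task definition, as described above, is what actually resolved the problem in this thread.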