Closed by jazzl0ver 6 years ago
I have not hit this issue before. Could you please post the manage server log?
Please retry the command. The Cassandra service init task may not have been executed.
Tried one more time with the latest release - same issue.
Update: this happens when replicas is set to 1
The server-side log looks good. The Cassandra service is successfully created and initialized. The EOF error looks like a broken connection. If it is easy to reproduce, could you please collect a network trace? I added some more logs. Please retry with the latest cli. Thanks!
# ./firecamp-service-cli -op=create-service -service-type=cassandra -region=us-east-1 -cluster=test-fc -service-name=cass-test-fc -replicas=1 -volume-size=10 -journal-volume-size=1 -volume-encrypted=true -journal-volume-encrypted=true -cas-heap-size=512
the heap size is less than 8192. Please increase it for production system
the heap size is lessn than 1024, Cassandra JVM may stall long time at GC
2018-02-07 09:42:28.1350098 +0000 UTC create cassandra service error EOF
In case you want to take a look at a traffic dump:
09:42:22.506326 IP 172.22.2.56.36604 > 172.22.1.72.27040: Flags [P.], seq 1:592, ack 1, win 141, options [nop,nop,TS val 2970911164 ecr 75981], length 591
E.....@.@......8...H..i..A.U........^".....
......(.PUT /?Catalog-Create-Cassandra HTTP/1.1
Host: firecamp-manageserver.test-fc-firecamp.com:27040
User-Agent: Go-http-client/1.1
Content-Length: 416
Accept-Encoding: gzip
{"Service":{"Region":"us-east-1","Cluster":"test-fc","ServiceName":"cass-test-fc"},"Resource":{"MaxCPUUnits":0,"ReserveCPUUnits":256,"MaxMemMB":0,"ReserveMemMB":256},"Options":{"Replicas":1,"Volume":{"VolumeType":"gp2","VolumeSizeGB":10,"Iops":100,"Encrypted":true},"JournalVolume":{"VolumeType":"gp2","VolumeSizeGB":1,"Iops":0,"Encrypted":true},"HeapSizeMB":512,"JmxRemoteUser":"cassandrajmx","JmxRemotePasswd":""}}
09:42:22.506906 IP 172.22.1.72.27040 > 172.22.2.56.36604: Flags [.], ack 592, win 236, options [nop,nop,TS val 75981 ecr 2970911164], length 0
E..4.*@...K....H...8i........A......._.....
..(.....
09:42:28.134767 IP 172.22.1.72.27040 > 172.22.2.56.36604: Flags [P.], seq 1:177, ack 592, win 236, options [nop,nop,TS val 77388 ecr 2970911164], length 176
E....+@...K;...H...8i........A......`......
...L....HTTP/1.1 200 OK
Content-Type: application/json
Server: firecamp
X-Requestid: req-dfe98f04049f41f56678f1951e70036c
Date: Wed, 07 Feb 2018 09:42:28 GMT
Content-Length: 0
Thanks! Could you please upload the manage service log as well?
Thanks! Found one possible bug. Let me test the fix.
The fix was committed. Please check whether it works in your environment. Simply stop the firecamp-manageserver task in the ECS console. ECS will then pull the latest manageserver docker image.
Now it looks like this:
# ./firecamp-service-cli -op=create-service -service-type=cassandra -region=us-east-1 -cluster=test-fc -service-name=cass-test-fc -replicas=1 -volume-size=10 -journal-volume-size=1 -volume-encrypted=true -journal-volume-encrypted=true -cas-heap-size=512
the heap size is less than 8192. Please increase it for production system
the heap size is lessn than 1024, Cassandra JVM may stall long time at GC
2018-02-07 17:54:15.393241841 +0000 UTC The catalog service is created, jmx user cassandrajmx password fe4868edb5b640ca555e29853594acb8
2018-02-07 17:54:15.393277702 +0000 UTC wait till the service gets initialized
2018-02-07 17:54:15.411316367 +0000 UTC All service containers are running, RunningCount 0
and Cassandra is not running. Log attached: firecamp-manager.log.gz
The output looks weird. How could the RunningCount be 0? Could you please retry the request?
Yeah, the next attempt started it up. Thank you!
I checked the manage server log you attached. The service was successfully created. It looks like a timing window issue: ECS may return a desired count of 0 for the service right after the service is created, so the cli sees RunningCount 0 equal to the desired count and reports that all containers are running. We could add a check in the cli: if the desired count is 0, wait and retry. I will add a patch.
Committed a patch.
The Cassandra service is starting without issues, though. There are no such issues with Zookeeper. The cli and manageserver are the latest.