Open ConnorDoyle opened 9 years ago
I think this is an okay idea, though it may cause an issue for cassandra, which configures the framework name based on the cluster name.
For example, the user can specify cluster.name = prod, which will result in a framework name in mesos of cassandra.prod.
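To make the naming concrete, here is a minimal sketch of the derivation described above. It assumes the framework name is simply the service prefix and the cluster name joined by a dot, which matches the cassandra.prod example; the function name `framework_name` is hypothetical and is not taken from the cassandra framework's code.

```python
def framework_name(cluster_name: str, prefix: str = "cassandra") -> str:
    """Build a framework name of the form '<prefix>.<cluster_name>'.

    Assumption: the real framework joins the two parts with a dot,
    as in the 'cassandra.prod' example from this thread.
    """
    if not cluster_name:
        raise ValueError("cluster name must be non-empty")
    return f"{prefix}.{cluster_name}"


if __name__ == "__main__":
    print(framework_name("prod"))
```

Because the name depends on user configuration, any tool that needs the framework name cannot hard-code it and has to learn it from somewhere else.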
Based on how the DCOS UI maps registered frameworks to Marathon processes, there's no way around this (otherwise your service has 0 chance of appearing "healthy" in the UI). Since that's the case, it should be required to avoid massive confusion.
The alternative for the UI is to change how that mapping is done. That's outside the scope of this repo though. cc @mlunoe && @rcorral.
FYI, we also need framework-name to be there given how uninstall works.
Good point.
Just for the record, if there's a DCOS_PACKAGE_FRAMEWORK_NAME we assume that the marathon app is a framework.
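As a rough illustration of that rule, the check could look like the sketch below. It assumes the Marathon app is available as a parsed JSON dict with a `labels` mapping; the helper name `framework_name_of` and the sample apps are hypothetical, and this is not the actual UI code.

```python
from typing import Optional

FRAMEWORK_LABEL = "DCOS_PACKAGE_FRAMEWORK_NAME"


def framework_name_of(app: dict) -> Optional[str]:
    """Return the framework name label of a Marathon app, or None.

    An app is treated as a framework only if the label is present.
    """
    labels = app.get("labels") or {}
    return labels.get(FRAMEWORK_LABEL)


if __name__ == "__main__":
    # Hypothetical app definitions, for illustration only.
    apps = [
        {"id": "/cassandra/test1", "labels": {FRAMEWORK_LABEL: "cassandra.test1"}},
        {"id": "/some-web-app", "labels": {}},
    ]
    for app in apps:
        print(app["id"], framework_name_of(app))
```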
@rcorral -- understood. That's something that doesn't live in this repo though (it's added as a label to the Marathon app), so the purpose of this required config parameter is to have a uniform way to communicate the value to the client (dcos-cli).
I'm spinning up a cluster with 3 instances of the cassandra framework running so that we can test some of these things out before I roll it back.
In the case of cassandra I think things are handled rather well.
I created a cluster and installed 3 different instances of the cassandra service, two of them with cassandra.cluster-name set to test1 and test2 respectively.
As can be seen in the screenshot below, all three instances were started cleanly by marathon:
It can also be seen that the health check correlation is happening correctly, and was verified using the following script:
for s in dcos test1 test2; do http --print=HhBb --pretty=colors http://benw-19-elasticloa-a0ax8we91ngl-695045825.us-west-2.elb.amazonaws.com/service/cassandra.$s/health/cluster/report;done
Which output:
GET /service/cassandra.dcos/health/cluster/report HTTP/1.1
User-Agent: HTTPie/0.9.0
Accept: */*
Connection: keep-alive
Accept-Encoding: gzip, deflate
Host: benw-19-elasticloa-a0ax8we91ngl-695045825.us-west-2.elb.amazonaws.com
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 03 Jun 2015 00:20:12 GMT
Server: openresty/1.7.10.1
Content-Length: 592
Connection: keep-alive
{"healthy":true,"results":[{"name":"nodeCount","ok":true,"expected":3,"actual":3},{"name":"seedCount","ok":true,"expected":2,"actual":2},{"name":"allHealthy","ok":true,"expected":[true,true,true],"actual":[true,true,true]},{"name":"operatingModeNormal","ok":true,"expected":["NORMAL","NORMAL","NORMAL"],"actual":["NORMAL","NORMAL","NORMAL"]},{"name":"lastHealthCheckNewerThan","ok":true,"expected":[1433290512948,1433290512948,1433290512948],"actual":[1433290763791,1433290765815,1433290765998]},{"name":"nodesHaveServerTask","ok":true,"expected":[true,true,true],"actual":[true,true,true]}]}
GET /service/cassandra.test1/health/cluster/report HTTP/1.1
User-Agent: HTTPie/0.9.0
Accept: */*
Connection: keep-alive
Accept-Encoding: gzip, deflate
Host: benw-19-elasticloa-a0ax8we91ngl-695045825.us-west-2.elb.amazonaws.com
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 03 Jun 2015 00:20:13 GMT
Server: openresty/1.7.10.1
Content-Length: 570
Connection: keep-alive
{"healthy":false,"results":[{"name":"nodeCount","ok":true,"expected":3,"actual":3},{"name":"seedCount","ok":true,"expected":2,"actual":2},{"name":"allHealthy","ok":false,"expected":[true,true,true],"actual":[true,true]},{"name":"operatingModeNormal","ok":false,"expected":["NORMAL","NORMAL","NORMAL"],"actual":["NORMAL","NORMAL"]},{"name":"lastHealthCheckNewerThan","ok":false,"expected":[1433290513143,1433290513143,1433290513143],"actual":[1433290781614,1433290789801]},{"name":"nodesHaveServerTask","ok":false,"expected":[true,true,true],"actual":[true,false,true]}]}
GET /service/cassandra.test2/health/cluster/report HTTP/1.1
Host: benw-19-elasticloa-a0ax8we91ngl-695045825.us-west-2.elb.amazonaws.com
Accept: */*
Connection: keep-alive
User-Agent: HTTPie/0.9.0
Accept-Encoding: gzip, deflate
HTTP/1.1 200 OK
Content-Type: application/json
Date: Wed, 03 Jun 2015 00:20:13 GMT
Server: openresty/1.7.10.1
Content-Length: 592
Connection: keep-alive
{"healthy":true,"results":[{"name":"nodeCount","ok":true,"expected":3,"actual":3},{"name":"seedCount","ok":true,"expected":2,"actual":2},{"name":"allHealthy","ok":true,"expected":[true,true,true],"actual":[true,true,true]},{"name":"operatingModeNormal","ok":true,"expected":["NORMAL","NORMAL","NORMAL"],"actual":["NORMAL","NORMAL","NORMAL"]},{"name":"lastHealthCheckNewerThan","ok":true,"expected":[1433290513357,1433290513357,1433290513357],"actual":[1433290787673,1433290792903,1433290797275]},{"name":"nodesHaveServerTask","ok":true,"expected":[true,true,true],"actual":[true,true,true]}]}
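The report bodies above are plain JSON, so the individual checks can be summarized programmatically. The sketch below uses a trimmed copy of the cassandra.test1 body (only the `name` and `ok` fields are kept); the helper name `failed_checks` is hypothetical.

```python
import json

# Trimmed copy of the cassandra.test1 report body from the output above;
# the "expected" and "actual" fields are omitted for brevity.
REPORT_TEXT = """
{"healthy": false, "results": [
  {"name": "nodeCount", "ok": true},
  {"name": "seedCount", "ok": true},
  {"name": "allHealthy", "ok": false},
  {"name": "operatingModeNormal", "ok": false},
  {"name": "lastHealthCheckNewerThan", "ok": false},
  {"name": "nodesHaveServerTask", "ok": false}
]}
"""


def failed_checks(report: dict) -> list:
    """Return the names of all checks whose "ok" field is not true."""
    return [r["name"] for r in report.get("results", []) if not r.get("ok")]


if __name__ == "__main__":
    report = json.loads(REPORT_TEXT)
    print("healthy:", report["healthy"])
    print("failed:", failed_checks(report))
```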
When attempting to uninstall cassandra with:
dcos package uninstall cassandra
I am met with the following output:
Multiple instances of app [cassandra] are installed. Please specify the app id of the instance to uninstall or uninstall all. The app ids of the installed package instances are: [/cassandra/dcos, /cassandra/test1, /cassandra/test2].
Then when I run:
dcos package uninstall --app-id=/cassandra/test2 cassandra
The framework scheduler process in marathon is correctly stopped (the framework isn't shut down yet because we haven't had a new release of the cli with the functionality).
I definitely agree that we will need some sort of control in place to make sure that the cli can function well, but I don't know if the requirement to have a framework-name property is the correct way to go. In the case of cassandra, the marathon app id, framework name, executor ID and task ID are all correlated with the cassandra cluster that has been configured, so things work out well. Perhaps there are other frameworks that are less disciplined in how they approach these things. If so, we should look into building tooling/documentation that directs service implementers toward this kind of consistency.
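For illustration, the correlation between the app ids and framework names seen in this thread (for example /cassandra/test2 and cassandra.test2) could be expressed as below. This is only a sketch of the pattern observed here: the function name `app_id_to_framework_name` is hypothetical, and the two-segment app id layout is an assumption based on the ids listed by the uninstall command, not a rule that other frameworks follow.

```python
def app_id_to_framework_name(app_id: str) -> str:
    """Map a Marathon app id such as '/cassandra/test2' to 'cassandra.test2'.

    Assumption: the app id has exactly two path segments,
    '/<service>/<cluster-name>', as in the ids shown in this thread.
    """
    parts = [p for p in app_id.split("/") if p]
    if len(parts) != 2:
        raise ValueError(f"unexpected app id layout: {app_id!r}")
    return ".".join(parts)


if __name__ == "__main__":
    for app_id in ("/cassandra/dcos", "/cassandra/test1", "/cassandra/test2"):
        print(app_id, "->", app_id_to_framework_name(app_id))
```

A framework that does not keep these identifiers aligned would break a mapping like this, which is the case a required framework-name property is meant to cover.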
cc @jsancio @BenWhitehead