goodinfoconsulting closed this issue 5 years ago
I'll run some tests. I think the etcd instance restart might be causing the issue.
Hi @simonferquel, any updates on this?
I have not had time to investigate the issue yet; I plan to do that early next week.
The same problem also occurs on MicroK8s after following the installation guide exactly: https://github.com/docker/compose-on-kubernetes/blob/master/docs/install-on-microk8s.md
I just had a look at it and found the root cause:
Additionally, for production readiness, it is strongly recommended that you use mutual TLS to connect to etcd, as described here: https://github.com/coreos/etcd-operator/blob/master/doc/user/cluster_tls.md
I will write a PR about proper use of etcd-operator.
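As a rough illustration of the mutual TLS recommendation above, the sketch below checks etcd health over a TLS-authenticated connection using `etcdctl`. It is only a sketch under stated assumptions: the `https://compose-etcd-client:2379` endpoint reuses the service name from this thread with `https` assumed, the certificate directory and file names are hypothetical, and the command has to run somewhere that can resolve the etcd client service (for example, inside the cluster). By default it only prints the command; set `DRY_RUN=0` to execute it.

```shell
#!/bin/sh
# Sketch: etcd health check over mutual TLS.
# Assumed values (not from the thread): https scheme, CERT_DIR, and the file names below.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://compose-etcd-client:2379}"
CERT_DIR="${CERT_DIR:-./etcd-client-certs}"   # hypothetical directory holding the client certs
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = actually run

# Print the command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

check_etcd_tls() {
  # --cacert verifies the server; --cert and --key authenticate the client.
  run env ETCDCTL_API=3 etcdctl \
    "--endpoints=$ETCD_ENDPOINT" \
    "--cacert=$CERT_DIR/ca.crt" \
    "--cert=$CERT_DIR/client.crt" \
    "--key=$CERT_DIR/client.key" \
    endpoint health
}

check_etcd_tls
```

If the health check fails only after a node restart, that would be consistent with the etcd instance losing its state or membership on restart, which is the suspicion raised earlier in this thread.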
We followed the instructions here to install Compose on Kubernetes on a single-node Azure AKS cluster: https://github.com/docker/compose-on-kubernetes/blob/master/docs/install-on-aks.md
We followed these instructions to a T, including installing an etcd cluster separate from the default etcd instance that comes with Kubernetes.
Everything works great on first install.
As advertised, we are able to run
docker stack deploy
successfully, and deploy our containers and services using our Compose YAML files.
Problem:
However, when we restart the AKS Node, the Compose and Compose API deployments fail to start with the following errors:
Compose
Compose API
The pods fail to start, with the following error:
Deleting the pods does not help. New Pods throw the same error.
Also, trying to run any docker stack command when the compose containers are in this state throws the following error:
$ docker stack ls --orchestrator=kubernetes
Deleting and re-installing the compose api using
installer-windows.exe -namespace=compose -etcd-servers=http://compose-etcd-client:2379 -tag=v0.4.18
gets the pods to start again, but the service remains broken, throwing the previous error:
the server is currently unable to handle the request (get stacks.compose.docker.com)
when we run any
docker stack
command. This is despite all pods and deployments now being in a green state.
In short: restarting the AKS node completely breaks the Compose API.
The only way we've found so far to restore the API is to completely delete the AKS cluster and create a new one. Not a tenable production solution.
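For anyone trying to narrow down this state before deleting the cluster, here is a minimal diagnostic sketch. It assumes only the `compose` namespace passed to the installer above; everything else is discovered by listing resources rather than by naming them. The error `the server is currently unable to handle the request` typically points at an aggregated API that Kubernetes considers unavailable, so the APIService list is the first thing to look at. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: collect the state of the Compose API after a node restart.
# Assumption: the "compose" namespace, taken from the installer's -namespace flag.
NAMESPACE="${NAMESPACE:-compose}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print only, 0 = actually run

# Print the command; execute it only when DRY_RUN is not 1.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

diagnose() {
  # Aggregated API registrations: look for compose.docker.com entries
  # whose AVAILABLE column is not True.
  run kubectl get apiservices
  # Workloads and services in the Compose namespace.
  run kubectl -n "$NAMESPACE" get pods,services,deployments
  # Endpoints show whether the services still resolve to running pods.
  run kubectl -n "$NAMESPACE" get endpoints
  # Recent events often explain why a pod or endpoint is unhealthy.
  run kubectl -n "$NAMESPACE" get events --sort-by=.lastTimestamp
}

diagnose
```

If the pods are green but a `compose.docker.com` APIService is still reported as unavailable, that would match what we see: the deployment is running, yet `docker stack` requests never reach a working backend.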
Expected behavior:
Restarting AKS nodes should bring all components of Compose on Kubernetes back online automatically, and developers should be able to run
docker stack
as soon as the node is back online, without further intervention.