GoogleCloudPlatform / k8s-multicluster-ingress

kubemci: Command line tool to configure L7 load balancers using multiple kubernetes clusters
Apache License 2.0

Updates with kubemci get stuck #204

Closed: ivanmp91 closed this issue 4 years ago

ivanmp91 commented 5 years ago

We're running two k8s clusters in different regions, with a global load balancer and a multi-cluster ingress configured with kubemci.

I'm facing an issue when updating the ingress with kubemci: sometimes the process gets stuck in an infinite loop, showing the following message over and over:

$ kubemci create entrypoint-global --kubeconfig=$HOME/.kube/mcikubeconfig --ingress=./ingress.yaml --namespace=entrypoint-v1 --gcp-project=projectx --force

Found existing ingress resource which differs from the proposed one
Updating existing ingress resource to match the desired state (since --force specified)
Updated ingress for cluster: gke_projectx_us-west1-a_kube-us-west
Found existing ingress resource which differs from the proposed one
Updating existing ingress resource to match the desired state (since --force specified)
Updated ingress for cluster: gke_projectx_europe-west1-b_kube-eu-west
Ensuring health checks
Path for healthcheck is /
Ensuring health check for port: {SvcName:entrypoint-v1/entrypoint-ng SvcPort:{Type:0 IntVal:80 StrVal:} NodePort:31064 Protocol:HTTP SvcTargetPort: NEGEnabled:false}
Health check mci1-hc-31064--entrypoint-global exists already. Checking if it matches our desired health check
Desired health check exists already
Determining instance groups for cluster gke_projectx_us-west1-a_kube-us-west
Waiting for ingress ( entrypoint-v1 : entrypoint-global ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....
Waiting for ingress ( entrypoint-v1 : entrypoint-global ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....
Waiting for ingress ( entrypoint-v1 : entrypoint-global ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....
Waiting for ingress ( entrypoint-v1 : entrypoint-global ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....

I've found this is a problem with the annotation key "ingress.gcp.kubernetes.io/instance-groups" defined on the multi-cluster ingress. For some reason, in one of the clusters it's set correctly:

kubectl get ing entrypoint-global -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.gcp.kubernetes.io/instance-groups: '[{"Name":"k8s-ig--d182d973c6a5a45c","Zone":"https://www.googleapis.com/compute/v1/projects/kubertonic/zones/us-west1-a"},{"Name":"k8s-ig--d182d973c6a5a45c","Zone":"https://www.googleapis.com/compute/v1/projects/kubertonic/zones/us-west1-b"},{"Name":"k8s-ig--d182d973c6a5a45c","Zone":"https://www.googleapis.com/compute/v1/projects/kubertonic/zones/us-west1-c"}]'

but in the other one it's not:

kubectl get ing entrypoint-global -o yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: gce-multi-cluster
    kubernetes.io/ingress.global-static-ip-name: entrypoint-global
  creationTimestamp: 2018-08-28T13:00:25Z

After some retries the annotation ingress.gcp.kubernetes.io/instance-groups eventually gets populated and the change can be applied correctly.
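
For reference, this is roughly how I check the annotation on both clusters while waiting (just a sketch; the contexts, namespace and ingress name are the ones from my setup above):

for ctx in gke_projectx_us-west1-a_kube-us-west gke_projectx_europe-west1-b_kube-eu-west; do
  echo "== $ctx =="
  # print the instance-groups annotation if the in-cluster controller has set it
  kubectl --kubeconfig=$HOME/.kube/mcikubeconfig --context="$ctx" -n entrypoint-v1 \
    get ingress entrypoint-global -o yaml | grep instance-groups
done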

Apparently this annotation is only ever written by the controller, according to the comment left in the code:

https://github.com/kubernetes/ingress-gce/blob/master/pkg/annotations/ingress.go#L66

Any ideas why this annotation sometimes disappears and takes so long to be updated? Maybe there is an underlying issue when calling the Google API to get the cluster zones that populate this information, but that's only a conjecture based on the observed behaviour.

Thanks!!

nikhiljindal commented 5 years ago

The kubemci CLI relies on the ingress-gce controller running in-cluster to populate that annotation. It can take a while for the in-cluster controller to observe the new ingress API resource, create a GCP instance group, and add that annotation back to the ingress API resource. The CLI needs to wait for the instance group to exist before it can create the BackendService; hence it waits for the annotation. The wait should be approximately the same as for a single-cluster ingress, since it is the same ingress-gce controller in both cases. Are you seeing different wait times?

Any ideas of why sometimes this annotation disappears and takes too long to update it?

Strange, the annotation should not disappear once it has been added. Can you explain a bit more about when you see this?

Thanks for trying out kubemci.

ivanmp91 commented 5 years ago

Hello @nikhiljindal !

Thanks for your response.

The wait should be approximately the same as for a single-cluster ingress, since it is the same ingress-gce controller in both cases. Are you seeing different wait times?

In this specific scenario, when I execute the update command with kubemci (I tried it 3 or 4 times), it can take around 10 minutes to finish. While the update is in progress, the Waiting for ingress... message is shown.

Regarding what you said about the in-cluster controller, the weird thing is that the instance groups already exist. In this particular situation I'm trying to create a new multi-cluster ingress, which is a copy of an existing one with a different name, pointing to the same backend service. As I understand it, this action should complete almost on the fly.

Strange, the annotation should not disappear once it has been added. Can you explain a bit more about when you see this?

I only see that issue when I execute the update action I mentioned before with kubemci. Right now I can see the annotations in both k8s clusters are consistent, which leads me to think this only happens when you perform some kind of update...

When the waiting message appears, I can see the annotations for the ingress in the US cluster are fine, but in the EU cluster they disappear, and surprisingly it happens every time in the same cluster.

I hope this information helps. Let me know if I can help with anything else!

Thanks!!

nikhiljindal commented 5 years ago

Regarding what you said about the in-cluster controller, the weird thing is that the instance groups already exist. In this particular situation I'm trying to create a new multi-cluster ingress, which is a copy of an existing one with a different name, pointing to the same backend service. As I understand it, this action should complete almost on the fly.

Yes, if the instance group already exists, the wait time should not be too long. The CLI still needs to create an in-cluster ingress; the in-cluster ingress controller watches it and adds the annotation, and then the CLI observes the annotation to proceed. To debug, you can list the in-cluster ingress yourself and see if it has the annotation.
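
For example, something like this (a sketch; substitute your own context, namespace and ingress name):

# query just the instance-groups annotation on the in-cluster ingress
kubectl --context=<cluster-context> -n entrypoint-v1 get ingress entrypoint-global \
  -o jsonpath="{.metadata.annotations['ingress\.gcp\.kubernetes\.io/instance-groups']}"

If it prints nothing, the in-cluster controller has not added the annotation yet.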

Also cc @G-Harmon

guanzo commented 4 years ago

I successfully created an MCI with the kubemci create ... command, but when I updated it with the --force flag, I waited for 20 minutes and all I got was this line repeated:

Waiting for ingress ( default : gcdn ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....

I tried the delete-and-recreate method mentioned in the kubemci docs, and I'm getting the same repeated message. What can I do to fix this?

EDIT: Okay, the problem was that I tried to add a default backend whose service was of type ClusterIP. It seems only services of type NodePort can be specified in the MCI?

nikhiljindal commented 4 years ago

Right, the service needs to be of type NodePort, and you need to use the same node port across all clusters. Glad to hear you got this working.
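
For example, a minimal sketch of a compatible Service (the names and the node port value here are illustrative, not from this thread):

apiVersion: v1
kind: Service
metadata:
  name: default-backend
spec:
  type: NodePort           # kubemci requires NodePort, not ClusterIP
  selector:
    app: default-backend
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080      # pin the same nodePort in every cluster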

Whenever you are ready, I would love to hear what you think of kubemci: https://github.com/GoogleCloudPlatform/k8s-multicluster-ingress/issues/117.

nikhiljindal commented 4 years ago

Closing this issue since the problems seem to be resolved. Please feel free to reopen.

lukeribchester commented 4 years ago

Hi @nikhiljindal

I'm encountering the same issue. Do you have any new insight into the cause of the 'instance-groups' annotation not being set?

Input

./kubemci create app-mci \
    --ingress=ingress.yaml \
    --gcp-project=app-prod \
    --kubeconfig=mcikubeconfig

Output

% ./kubemci create app-mci --ingress=ingress.yaml --gcp-project=app-prod --kubeconfig=clusters.yaml        
Created Ingress in cluster: gke_app-prod_europe-west4-a_app-europe-west4
Created Ingress in cluster: gke_app-prod_us-east4-a_app-us-east4
Ensuring health checks
Pod app-deployment-c99578769-xdmql matching service selectors app=app (targetport ): lacks a matching HTTP probe for use in health checks.
Pod app-deployment-c99578769-xgq2m matching service selectors app=app (targetport ): lacks a matching HTTP probe for use in health checks.
Pod app-deployment-c99578769-qms7r matching service selectors app=app (targetport ): lacks a matching HTTP probe for use in health checks.
Pod app-deployment-c99578769-tsrsw matching service selectors app=app (targetport ): lacks a matching HTTP probe for use in health checks.
Path for healthcheck is /
Ensuring health check for port: {SvcName:default/app-service SvcPort:{Type:0 IntVal:80 StrVal:} NodePort:30061 Protocol:HTTP SvcTargetPort: NEGEnabled:false}
Health check mci1-hc-30061--app-mci exists already. Checking if it matches our desired health check
Desired health check exists already
Determining instance groups for cluster gke_app-prod_europe-west4-a_app-europe-west4
Waiting for ingress ( default : app-ingress ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....
Waiting for ingress ( default : app-ingress ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....
Waiting for ingress ( default : app-ingress ) to get ingress.gcp.kubernetes.io/instance-groups annotation.....
⋮

As you can see, my configuration is identical (aside from resource names) to the one in the multi-cluster ingress guide:

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  selector:
    matchLabels:
      app: app
  replicas: 2
  template:
    metadata:
      labels:
        app: app

service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: app
  name: app-service
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 8080
      name: http
      nodePort: 30061
  selector:
    app: app
  type: NodePort

ingress.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: app-ip
    kubernetes.io/ingress.class: gce-multi-cluster
spec:
  backend:
    serviceName: app-service
    servicePort: 80

This is obviously a massive impediment, so your help would be greatly appreciated.

Thank you!

lukeribchester commented 4 years ago

Solved.

HTTP load balancing was disabled in my clusters, so the load balancer controller wasn't setting the annotation.

Enable HTTP load balancing:

% gcloud container clusters update [CLUSTER_NAME] --update-addons HttpLoadBalancing=ENABLED

Updating ***...done.                                                                                                                                                              
Updated [https://container.googleapis.com/v1/projects/***/zones/us-east4-a/clusters/***].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-east4-a/***?project=***
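
To confirm the addon is enabled afterwards, something like this should work (cluster name and zone are placeholders; the field path follows the GKE API's addonsConfig):

% gcloud container clusters describe [CLUSTER_NAME] --zone [ZONE] \
    --format="value(addonsConfig.httpLoadBalancing.disabled)"

An empty result (or False) means HTTP load balancing is enabled.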

I created this StackOverflow post for additional documentation.