digitalocean / digitalocean-cloud-controller-manager

Kubernetes cloud-controller-manager for DigitalOcean (beta)
Apache License 2.0

Load Balancer creation fails randomly #167

Closed laander closed 5 years ago

laander commented 5 years ago

This is basically a repost of https://github.com/digitalocean/digitalocean-cloud-controller-manager/issues/103

In short, when I created a new Service of type LoadBalancer, it failed on the first try. Here's the event output:

  Type     Reason                      Age               From                Message
  ----     ------                      ----              ----                -------
  Warning  CreatingLoadBalancerFailed  12m               service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service default/load-balancer: Get https://api.digitalocean.com/v2/load_balancers/545bfec9-6c3c-4da1-bfac-dcfaa1a7396e: context deadline exceeded
  Normal   EnsuringLoadBalancer        3m (x6 over 14m)  service-controller  Ensuring load balancer
  Warning  CreatingLoadBalancerFailed  2m (x5 over 10m)  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service default/load-balancer: error waiting for load balancer to be active Get https://api.digitalocean.com/v2/load_balancers/545bfec9-6c3c-4da1-bfac-dcfaa1a7396e: context deadline exceeded

The service manifest is fairly straightforward:

apiVersion: v1
kind: Service
metadata:
  name: load-balancer
  labels:
    app: my-app
spec:
  type: LoadBalancer
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 80
  selector:
    app: my-app

On the second try it eventually succeeded, but only after one retry:

  Type     Reason                      Age              From                Message
  ----     ------                      ----             ----                -------
  Warning  CreatingLoadBalancerFailed  2m               service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service default/load-balancer: Get https://api.digitalocean.com/v2/load_balancers/ef24cc3c-456e-4bbb-8ec3-77352222cb3d: context deadline exceeded
  Normal   EnsuringLoadBalancer        1m (x2 over 3m)  service-controller  Ensuring load balancer
  Normal   EnsuredLoadBalancer         22s              service-controller  Ensured load balancer

The error message "context deadline exceeded" seems to be Go's way of reporting a timeout. Could this be caused by internal firewall rules that are applied with a race condition?

EDIT: The Kubernetes cluster version is 1.13.1-do.2 running in the LON1 zone

timoreimann commented 5 years ago

Hey there @laander. I just tried to reproduce your problem by spawning a new cluster in LON1 and setting up a Service-managed LB to an Nginx pod. Tried several times and always got an LB created in ~1 minute with no errors shown in the event logs.

Are you still experiencing the issue described? If yes and you are using DigitalOcean's managed Kubernetes offering (DOKS), I'd ask you to file a support ticket; I can then take a closer look at your cluster.

laander commented 5 years ago

Hi @timoreimann, I haven't tried it for a while, so let's put it on ice for now and I'll reopen the ticket if it persists. For the record, I'm using the new DOKS solution in Limited Availability.