docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac

Kubernetes Load balanced services are sometimes marked as "Pending" #4903

Open lnhrdt opened 4 years ago

lnhrdt commented 4 years ago

I am using the Docker Desktop k8s environment to apply and delete k8s resources over and over, as part of my project's test automation. One of these resources is a load balanced service.

Something like this:

---
apiVersion: v1
kind: Service
metadata:
  name: xyz
spec:
  type: LoadBalancer
  ports:
    - port: 9999
      targetPort: 80
  selector:
    myLabel: xyz
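
For reference, my apply/poll/delete cycle looks roughly like the sketch below (the filename service.yaml and the 60-second timeout are assumptions; on Docker Desktop a successful assignment shows up as the hostname localhost in status.loadBalancer.ingress):

kubectl apply -f service.yaml
# poll until Docker Desktop fills in the ingress field, for up to 60s
for i in $(seq 1 60); do
  ingress=$(kubectl get service xyz -o jsonpath='{.status.loadBalancer.ingress[0].hostname}' 2>/dev/null)
  [ -n "$ingress" ] && break
  sleep 1
done
echo "ingress: ${ingress:-<pending>}"
kubectl delete -f service.yaml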

About half the time it deploys properly and gets its status.loadBalancer.ingress assigned, as seen here:

$ kubectl get services

NAME       TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
xyz        LoadBalancer   10.110.14.56     localhost     9999:31560/TCP   45m

And the other half of the time it hangs here perpetually:

$ kubectl get services

NAME       TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
xyz        LoadBalancer   10.110.14.56     <pending>     9999:31560/TCP   45m

The strange thing is, either way it works: I'm able to access my service on http://localhost:9999. However, the incorrect status causes issues for tools that depend on it (in my case kapp).

Digging around for answers, I found that this bug (or a similar one) was supposedly fixed in 18.03.0-ce-rc1-mac54 (2018-02-27), but I'm still experiencing the issue on the latest stable v2.3.0.4 (46911).

I've also found other posts that may be from people experiencing the same issue.

fillipo commented 4 years ago

Same here

demisx commented 4 years ago

Having the same issue after upgrading to v2.4.0.0. The LoadBalancer used to map to the localhost external IP. Now it always shows up as pending:

$ kubectl get service
NAME                            TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
nginx-ingress-controller        LoadBalancer   10.97.193.207    <pending>     80:31487/TCP,443:30054/TCP   2m4s
...

demisx commented 4 years ago

Never mind. I had a zombie process holding up port 80, which is why Docker Desktop couldn't allocate localhost as the external IP.
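
A quick way to spot such a leftover listener on macOS (port 80 here; substitute your service port):

sudo lsof -nP -iTCP:80 -sTCP:LISTEN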

lnhrdt commented 4 years ago

I'm still consistently getting this issue on the latest stable version, v2.4.0.0 (48506).

Unlike @demisx, I don't have another process holding up the port. In my case, deleting and recreating the service eventually succeeds, but it's causing confusion for me and my team.

@fillipo did you learn anything when you encountered the issue?

lnhrdt commented 4 years ago

We ended up working around this issue by configuring our deployment tool to ignore the status of the LoadBalancer resource.

It might be a little niche, but I thought I'd share my workaround in case others using kapp run into the same issue and find this conversation.

---
apiVersion: v1
kind: Service
metadata:
  name: xyz
  annotations:
    kapp.k14s.io/disable-wait: ""
spec:
  type: LoadBalancer
  ports:
    - port: 9999
      targetPort: 80
  selector:
    myLabel: xyz
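
With the annotation in place, a kapp deploy no longer blocks on the Service reconciling (a sketch; the app name xyz and filename service.yaml are assumptions):

kapp deploy -a xyz -f service.yaml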

The underlying issue in Docker for Mac is definitely still there though and I hope it can be fixed some day.

davidrc312 commented 3 years ago

It worked for me with the following steps:

  1. Delete the service: kubectl delete svc <service-name>
  2. Restart Docker Desktop
  3. Re-create the service: kubectl apply -f k8s/service.yml


LeviticusMB commented 3 years ago

This is happening to me all the time recently. 😢 It's a major pain.

The workaround suggested by @davidrc312 only works sometimes — most of the time, I have to reboot when I redeploy my pods (i.e. after every kubectl delete -f ... and kubectl apply -f ...).

That's insane. I'm on 3.1.0, by the way.

shakyShane commented 3 years ago

This is affecting me also now.

LeviticusMB commented 3 years ago

So now I'm on the latest Docker (3.3.1) and it's still happening. However, I've found two workarounds that work for us:

  1. Our Makefile would build all images and generate the K8s configs, then kubectl delete the full config and kubectl apply it. This always removed the services/load balancers on each restart, which triggers the bug. So instead, we now just kubectl apply the new config and kubectl rollout restart deployment/XXX the affected deployments, keeping the load balancers running (see the sketch after this list).
  2. When we do have to recreate the load balancers and they fail, Reset Kubernetes cluster in Docker's Troubleshoot panel always recovers the situation, without the need for reboots or Docker reinstallations.
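
A sketch of workaround 1 (the config path k8s/ is an assumption; XXX stands in for the affected deployments, as above):

kubectl apply -f k8s/                     # apply the regenerated config; the LB service stays
kubectl rollout restart deployment/XXX    # restart only the affected deployment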

jeremychone commented 3 years ago

Thanks @LeviticusMB for the Troubleshoot > Reset Kubernetes tip. I'm on Docker 3.3.3, and my service would not bind to localhost, even after a restart. The Kubernetes reset did it; I just had to wait a minute or two. That was scary!

programmer04 commented 3 years ago

I had the same issue and spotted something in the logs for vpnkit (which is responsible for setting up the forwarding on localhost). Running

kubectl logs --follow vpnkit-controller -n kube-system

shows

time="2021-06-24T11:44:12Z" level=error msg="Port 30000 for service postgres-service is already opened by another service"
time="2021-06-24T11:44:12Z" level=info msg="Opened port 30080 for service web-service:80"
time="2021-06-24T11:44:12Z" level=error msg="Port 30080 for service web-service is already opened by another service"
time="2021-06-24T11:44:23Z" level=info msg="Opened port 30002 for service prometheus-service:8080"
time="2021-06-24T11:44:23Z" level=error msg="Port 30002 for service prometheus-service is already opened by another service"
time="2021-06-24T11:44:23Z" level=info msg="Opened port 30001 for service local-mon-service:8080"
time="2021-06-24T11:44:23Z" level=error msg="Port 30001 for service local-mon-service is already opened by another service"

for both LoadBalancer and NodePort services. I tried restarting Docker, resetting the Kubernetes cluster, etc., but it didn't help.

But I saw that even when Docker Desktop was stopped, the process below was still running:

root 82217 0.0  0.0  5004180 2684 ??  Ss    1:55PM   0:00.02 /Library/PrivilegedHelperTools/com.docker.vmnetd

I killed it and started Docker again. Next, I redeployed the services. Everything started working as designed.
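
The equivalent commands (a sketch; quit Docker Desktop first, and note the helper runs as root):

ps aux | grep '[c]om.docker.vmnetd'    # confirm the helper is still running
sudo pkill -f com.docker.vmnetd        # kill the leftover helper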

bender-joe commented 3 years ago

I am also having this issue on Windows, v3.5.1. My only workaround is Quit Docker Desktop, followed by Start. Restart does not resolve the problem.

adrianiskandar commented 3 years ago

I'm also having this issue on Docker for Mac (Kubernetes v1.21.2, Docker Engine v20.10.7), trying to use a simple helloworld app:

apiVersion: v1
kind: Service
metadata:
  name: helloworld
  labels:
    app: helloworld
    visualize: "true"
spec:
  ports:
    - port: 8080
      targetPort: 8080
  selector:
    app: helloworld
  type: LoadBalancer

❯ kubectl get services
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
helloworld   LoadBalancer   10.111.66.115   <pending>     8080:32469/TCP   117m
kubernetes   ClusterIP      10.96.0.1       <none>        443/TCP          5d18h

guenoledc commented 3 years ago

Hi, I ran into the same problem and figured out that there was a Docker process listening on my service port that seemed to prevent the service from exposing its address.

On macos:

lsof -nP -iTCP -sTCP:LISTEN | grep 8080

gave

com.docke 45730 guenole   76u  IPv6 0x30100fd96bc5c4c7      0t0  TCP *:8080 (LISTEN)

Then I stopped Docker Desktop completely, then killed -9 the process.
Then I started Docker Desktop, redeployed my service, and it worked.
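
That is, using the PID from the lsof output above:

kill -9 45730    # the com.docke process holding *:8080 (add sudo if it isn't yours)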

For info:
Client Version: v1.21.4 Server Version: v1.21.4

jamiegilmartin commented 2 years ago

On Docker Desktop Version 4.3.0 (71786) : Settings > Kubernetes > Reset Kubernetes Cluster worked for me.

queil commented 2 years ago

Same here, Docker Desktop 4.4.4 on Windows. This is so frustrating. I've read this article, looked into vpnkit-controller's logs, and then found this issue. It saved my sanity 😁 However, I am wondering what a reliable solution would be...

queil commented 2 years ago

Ok, I think my colleague and I may have found a reason for this behaviour. We could replicate the issue with the following workflow:

  • apply your manifest, and wait until your service is reachable via the load balancer IP
  • hit the "Reset Kubernetes cluster" button
  • observe that the ports are still blocked by the docker backend process (despite the LB service being deleted)
  • wait for the cluster to come up again, apply your manifest, and wait for the service to obtain the LB IP
  • observe that you cannot access your service via the load balancer IP anymore
  • to fix it, quit Docker Desktop and run it again (DO NOT use Restart Docker)
  • your service should be accessible again

So our hypothesis here is that when "Reset Kubernetes cluster" is pressed, Docker Desktop misses the fact that the ports should be released (as the service using them no longer exists).
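
One way to observe the stuck ports on macOS is lsof (a sketch, using port 9999 from the original report):

lsof -nP -iTCP:9999 -sTCP:LISTEN    # a Docker backend process still listens, although the Service is gone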

jscheel commented 2 years ago


I think @queil is on to something here. I was experiencing the exact same thing recently, same steps and all.

AndreLouisCaron commented 2 years ago

Thanks @queil for the solution! I've been experiencing this issue intermittently for months now and this finally gave me a sure-fire way to resolve the issue.

kris-lwks commented 2 years ago

I confirm that Reset Kubernetes cluster and a fresh kubectl apply -f ... worked for me.

One observation: previously the EXTERNAL-IP was <pending>, and now it appears as localhost and is working.

NAME          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
app-service   LoadBalancer   10.110.147.247   localhost     8124:32123/TCP   6m4s
kubernetes    ClusterIP      10.96.0.1        <none>        443/TCP          9m31s

pdevine commented 2 years ago

@queil's method worked for me: quit Docker Desktop and then start it again from the desktop (on macOS). Restarting it from the dropdown did not work.
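
On macOS the quit-and-start cycle can be scripted (a sketch; assumes Docker Desktop's app bundle is named Docker):

osascript -e 'quit app "Docker"'    # graceful quit, equivalent to Quit from the menu
open -a Docker                      # start Docker Desktop again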

rlewkowicz commented 2 years ago

I prefer this method:

sudo pkill -9 -f docker; sudo pkill -9 -f Docker

Quick, dirty, gets the job done. Pop it and restart the app.

docker-robott commented 2 years ago

Issues go stale after 90 days of inactivity. Mark the issue as fresh with a /remove-lifecycle stale comment. Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale

venkateshakarle commented 2 years ago

I too faced this issue quite often, even though I use the latest version, but I was able to resolve it with a reset and restart of Docker Desktop.

Thanks for the suggestions.

omninonsense commented 2 years ago

/remove-lifecycle stale

omninonsense commented 2 years ago

I've had this issue happen only after enabling the new virtualisation framework in experimental features; it wasn't happening before that. I wonder if the two could be related?

pdevine commented 2 years ago

I hadn't hit this for a while, but recently had it happen again. I resolved it similarly to @venkateshakarle by restarting Docker Desktop (instead of having to reset the entire cluster).

docker-robott commented 1 year ago

There hasn't been any activity on this issue for a long time. If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment. If not, this issue will be closed in 30 days.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

DMaxter commented 1 year ago

I had this issue after resetting my Kubernetes cluster.

DMaxter commented 1 year ago

/remove-lifecycle stale

haf commented 1 year ago

/remove-lifecycle stale

rajjaiswalsaumya commented 1 year ago

For me this is actually not sometimes but always: Kafka is not getting exposed on an external IP at all. I also tried with MetalLB installed on Docker Desktop's Kubernetes; same issue. It works with minikube on a Mac M1, though.

anthonyalayo commented 1 year ago

This happens to me chronically when tearing down and starting up my cluster. I use Docker Desktop to test out changes, and the chronic port hogging is very annoying. I have no choice but to kill Docker Desktop.

This issue has been open for almost 3 years. Can we get someone from Docker to chime in?

rajjaiswalsaumya commented 1 year ago

High time to switch away from Docker Desktop and use Kubernetes directly. On a side note, I liked the idea of minikube, which asks for the user's password and gets system privileges to create a proxy to the Kubernetes cluster. Even kind is not working correctly due to security on Mac.

anthonyalayo commented 1 year ago

@rajjaiswalsaumya Docker's vpnkit is very good, and essential on corporate laptops. I think we just need the Docker team to take a look at this one.

haf commented 1 year ago

We've had a lot of success with k3d on macOS; after trying this, trying minikube, trying kind - k3d was the charm!

anthonyalayo commented 1 year ago

Ping - anyone from Docker team available?

arkodg commented 1 year ago

Issue still exists. Here's my inner-loop workflow:

  1. Create a K8s Service (of type LoadBalancer)
  2. Test locally using some client (like curl) by sending traffic to Service.Status.LoadBalancer.Ingress[0].IP (sketched below)
  3. Recreate the cluster using the Reset Kubernetes Cluster UI button in Docker Desktop
  4. Repeat; this time, step 2 doesn't work
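
For step 2, a sketch (the service name xyz and port 9999 are borrowed from the original report; Docker Desktop reports the ingress as the hostname localhost):

LB=$(kubectl get service xyz -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl "http://${LB}:9999/"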

gl-remote commented 10 months ago

Flask uses port 5000, but on my Mac that port was already taken: the ControlCe process in the lsof output below is macOS Control Center, whose AirPlay Receiver listens on ports 5000 and 7000. When I used another port, everything worked.

% lsof -nP -iTCP -sTCP:LISTEN
COMMAND     PID   USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
ControlCe   420 gaolei    6u  IPv4 0x6dc0b62cfe479c67      0t0  TCP *:7000 (LISTEN)
ControlCe   420 gaolei    7u  IPv6 0x6dc0b62cfec00bdf      0t0  TCP *:7000 (LISTEN)
ControlCe   420 gaolei    8u  IPv4 0x6dc0b62cfe47a7ef      0t0  TCP *:5000 (LISTEN)
ControlCe   420 gaolei    9u  IPv6 0x6dc0b62cfec003df      0t0  TCP *:5000 (LISTEN)

apiVersion: v1
kind: Service
metadata:
  name: control-persistent-service
spec:
  type: LoadBalancer
  selector:
    app: control-persistent-api
  ports:
    - name: api
      protocol: TCP
      port: 5656    # changed from 5000 to 5656
      targetPort: 5656