micchickenburger opened 5 years ago
When the issue occurs, this is the error I receive when running mongo on my host to connect to the MongoDB server in my mongo Docker container.
$ mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
2019-01-08T12:18:13.494-0600 E QUERY [js] Error: network error while attempting to run command 'isMaster' on host '127.0.0.1:27017' :
connect@src/mongo/shell/mongo.js:257:13
@(connect):1:6
exception: connect failed
This is the issue I experienced trying to create an SQS queue on localstack/localstack running in a container:
Starting Localstack...
Started Localstack with hash 8a42a0e43174653894b4e1985d7d835f48b8da6ab46e7ecef57f66edaff4176d
Creating queue jobs at SQS endpoint http://localhost:4576/queue/jobs
An error occurred (502) when calling the CreateQueue operation (reached max retries: 4): Bad Gateway
Image versions:
$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
localstack/localstack latest c583aaf39486 4 days ago 1.02GB
mongo latest 7177e01e8c01 10 days ago 393MB
I am also experiencing this issue. Same exact Docker for Mac version. The docker compose containers will start up fine after a system reboot or docker restart, but the network mapping to the host will stop responding after a (seemingly) random amount of time. I can still bash into the containers, and they access each other through curl, so the docker sub-network seems to be fine, but there is no system host mapping. I get timeouts trying to access them from a browser.
I've allocated 16gb to docker and 6 cpus, so I don't think it's a resource issue. The stack uses roughly 6gb of memory.
I too had this problem and downgrading to Docker Community Edition 18.06.0 seemed to have fixed it. It might be related to https://github.com/docker/for-mac/issues/3360
Same problem here on all versions higher than 18.06.1-ce-mac73 (26764). For me, the network connections drop out when using the JVM debugger after about 60 seconds.
Same Problem here: Diagnostics ID: 2D80AB9C-9218-4705-933D-0B9B7525F15A/20190104105334 Might be related to #3417
Same problem here
I experience this too...
I experience this as well. Same version.
docker version
Client: Docker Engine - Community
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:47:43 2018
OS/Arch: darwin/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.0
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:55:00 2018
OS/Arch: linux/amd64
Experimental: true
I have noticed that when it happens the CPU spikes. I have not figured out if the spike is a cause or a symptom.
It definitely seems load-based. I was able to make it through a day and overnight without network loss while running only one small web service. That's the first time in a week I have been able to run longer than an hour or two.
@jeff-cook I'm not certain it is load related. I experienced similar symptoms with UDP packets failing to be bridged into the container. In each of the several cases the container ran for 24 hours or so and when returning to the office in the morning the networking to the container was dead. During the day the host would have constant use with periods of the CPU being driven hard. So, I suspect it is not load, but some other condition that triggers the bug.
My environment is a mac server with a static IP assigned, so it is not related to sleep or flaky wifi connections. Reverting to 18.06.1 seems to be a valid workaround, so far.
It happened (76297123-260B-45B6-872E-9DE74FB5F950/20190111200619) even after a downgrade to Version 2.0.0.0-mac78 (28905) c404a62c3f
This issue occurred again, after 2 days of uptime, running Server Version: 18.06.1-ce
I've determined that ingress network connections succeed (i.e. I see UDP network packets arrive in the container). However, outbound connections from the container fail (i.e. ping google.com doesn't receive any responses). Restarting the engine resolved the outbound connection issue.
Same here. As @josh-h already mentioned, I don't think it is load related.
macOS Mojave Version 10.14
Same problem here! Every time I restart the Docker service, the error comes back after about 30 minutes.
Me too. I am using RabbitMQ and Postgres for Django app development. After publishing or receiving some data, the network fails on my localhost: I cannot connect to RabbitMQ or Postgres from the host. But I can still run docker-compose exec postgres bash
to enter the container, and from inside it I can connect to the database and the message queue.
Besides, I cannot ping Postgres or RabbitMQ successfully from the host.
After restarting Docker for Mac everything recovers, but a few moments later the problem comes back, again and again.
I have to stop my coding to restart Docker, what a mess...
Same problem here. Does anyone have a solution already?
https://github.com/docker/for-mac/issues/3448#issuecomment-452490002
Same problem here
@jezao Downgrading to 18.06.0-ce-mac70 2018-07-25 solved my problem. Download here: https://download.docker.com/mac/stable/26399/Docker.dmg
Same problem.
macOS 10.14.4 Docker Desktop 2.0.0.3(31259)
docker engine
Client: Docker Engine - Community
Version: 18.09.2
API version: 1.39
Go version: go1.10.8
Git commit: 6247962
Built: Sun Feb 10 04:12:39 2019
OS/Arch: darwin/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.2
API version: 1.39 (minimum version 1.12)
Go version: go1.10.6
Git commit: 6247962
Built: Sun Feb 10 04:13:06 2019
OS/Arch: linux/amd64
Experimental: false
docker-compose
docker-compose version 1.23.2, build 1110ad01
docker-py version: 3.6.0
CPython version: 3.6.6
OpenSSL version: OpenSSL 1.1.0h 27 Mar 2018
@daviyang35 Did you try to downgrade to 18.06.0-ce-mac70 2018-07-25? See my comment above.
@MarounMaroun Yes. Using your link avoids this issue. Thanks.
I'm experiencing the same issue I guess. (Engine version 18.09.2)
Normally, it happens after a week of running like 10 containers. The load is always extremely light. (I have no numbers) What I have noticed yesterday is that network traffic higher than usual (~200 transaction/second) results in losing the network connection to containers from the host. By one transaction I mean a set of connect, send, receive, disconnect operations. Doing the exact same operations at 1 transaction/sec did not trigger the issue.
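For what it's worth, that kind of trigger can be exercised with a small harness (a hypothetical sketch, not code from this thread; the host and port below are placeholders for a published container port): open and immediately close n TCP connections and count failures, optionally throttled to roughly one per second for comparison.

```python
import socket
import time

def churn(host: str, port: int, n: int, delay: float = 0.0) -> int:
    """Open and immediately close n TCP connections; return how many failed."""
    failures = 0
    for _ in range(n):
        try:
            # One "transaction": connect, (send/receive would go here), disconnect.
            with socket.create_connection((host, port), timeout=2.0):
                pass
        except OSError:
            failures += 1
        if delay:
            time.sleep(delay)
    return failures

# Against a container port published on the host, e.g.:
#   churn("127.0.0.1", 8080, 200)             # ~200 back-to-back transactions
#   churn("127.0.0.1", 8080, 200, delay=1.0)  # the same work at ~1/sec
```

If the description above is accurate, the rapid variant should eventually start returning non-zero failure counts while the throttled variant stays at zero.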
I remember seeing a log entry somewhere, produced by Docker, about SYN flooding, which could be related. Unfortunately I neither remember nor can figure out where I saw it.
I have noticed that when I experience the issue, the networking between the containers still works fine.
Same issue. Seems to occur randomly. Restarting Docker fixes it.
I had something similar (#3674) recently, and I ended up writing a test to reliably catch the issue, which I published at https://github.com/dmuth/docker-health-check
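The essence of such a check can be sketched in a few lines of Python (my own sketch, not code from the docker-health-check repo): probe the published port from the host with a short timeout and treat a failed connect as the symptom.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# When the bug hits, a probe like port_reachable("127.0.0.1", 9200) starts
# returning False on the host while the same service still answers to curl
# run inside the container.
```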
In my case, downgrading to 18.06.0-ce-mac70 also worked.
-- Doug
Still an issue with the most recent release; downgrading seems to alleviate it.
Issue exists even on 2.0.0.3. Had to downgrade to 18.06.0-ce-mac70.
Same here, I keep getting network issues. Containers are running fine and I can connect to them via bash, but they won't load in my browser or connect to each other.
I've been banging my head against the wall with this issue. I'm running a Django app in a docker container, which connects to a Postgres database on the host machine. I am getting the following error quite often (every 2 - 10 minutes):
django.db.utils.OperationalError: could not connect to server: Connection timed out
I've downgraded from Docker 2.1.0.3 (currently the latest) to 18.06.0-ce-mac70 as suggested above, and the error above disappeared, but only to be replaced by this error, which occurs even more often:
django.db.utils.OperationalError: could not connect to server: Connection refused
This seems to be related to load / number of requests. The issue occurs more often in high load situations.
Any suggestions on what I can do next?
Problem is still reproducible on 2.1.0.5
Issues go stale after 90d of inactivity.
Mark the issue as fresh with a /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30d of inactivity.
Prevent issues from auto-closing with a /lifecycle frozen comment.
If this issue is safe to close now please do so.
Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale
/remove-lifecycle stale
@guillaumerose could you please help with this issue? Basically, docker for mac can't be reliably used for development in some cases, the containers just stop responding on network connections.
E.g. right now I have the following symptoms for elasticsearch container listening on port 9200:
$ curl -vvv http://127.0.0.1:9200/zzz
<fails when running from Mac>
curl: (7) Failed to connect to 127.0.0.1 port 9200: Operation timed out
$ netstat -nav | grep 9200
<shows 128 CLOSE_WAIT connections which are stale forever>
tcp4 271 0 127.0.0.1.9200 127.0.0.1.59438 CLOSE_WAIT 408300 146988 10666 0 0x1123 0x00000024
tcp4 271 0 127.0.0.1.9200 127.0.0.1.59437 CLOSE_WAIT 408300 146988 10666 0 0x1123 0x00000024
tcp4 271 0 127.0.0.1.9200 127.0.0.1.59436 CLOSE_WAIT 408300 146988 10666 0 0x1123 0x00000024
tcp4 271 0 127.0.0.1.9200 127.0.0.1.59435 CLOSE_WAIT 408300 146988 10666 0 0x1123 0x00000024
tcp4 0 0 127.0.0.1.9200 *.* LISTEN 131072 131072 10666 0 0x0100 0x00000026
# ps aux | grep 10666
dko 10666 0.0 0.4 5064504 143208 ?? S Tue02PM 11:27.27 /Applications/Docker.app/Contents/MacOS/com.docker.backend -watchdog
$ pgrep node
<shows nothing, i.e. the connecting processes died long time ago>
$ sudo tcpdump -i any -n -p tcp port 9200 & curl -vvv http://127.0.0.1:9200/zzz
<only SYN packets travel?>
02:09:59.303560 IP 127.0.0.1.63681 > 127.0.0.1.9200: Flags [S], seq 1329676029, win 65535, options [mss 16344,nop,wscale 6,nop,nop,TS val 664304244 ecr 0,sackOK,eol], length 0
...
02:10:02.815838 IP 127.0.0.1.63681 > 127.0.0.1.9200: Flags [S], seq 1329676029, win 65535, options [mss 16344,nop,wscale 6,nop,nop,TS val 664307746 ecr 0,sackOK,eol], length 0
02:10:06.022543 IP 127.0.0.1.63681 > 127.0.0.1.9200: Flags [S], seq 1329676029, win 65535, options [mss 16344,sackOK,eol], length 0
...
02:10:18.867903 IP 127.0.0.1.63681 > 127.0.0.1.9200: Flags [S], seq 1329676029, win 65535, options [mss 16344,sackOK,eol], length 0
$ docker-compose exec elasticsearch bash
# curl http://127.0.0.1:9200/
<succeeds from inside the container>
{
"name" : "41d967fa27a4",
"cluster_name" : "docker-cluster",
Has this issue been officially addressed or acknowledged at all? I’ve personally had multiple colleagues experience this same issue, and it seems to appear more with relatively complicated/large stacks.
It’s also complicating my efforts to convert developers into Docker users when they experience issues like this that shake their confidence in the technology.
The fact that there hasn’t been any official acknowledgement of this problem is strange. I think any number of devs (myself included) would be more than happy to help diagnose the root cause if called upon, but that hasn’t happened in more than a year?
Is Docker still being developed as a product? Or has it secretly gone into maintenance mode?
Note that downgrading should probably still be a viable workaround.
Just anecdotal... But I suffered with this issue for a while, trying a new version every few months and then downgrading back to a version I knew worked. Eventually somebody suggested it might be a memory issue, so I upgraded again and gave Docker a heap more memory (currently allocated 12 GB), and the problem went away. It's been solid for a few months now.
Adding resources helped for me up to a point, but wasn’t a panacea.
Even if the “fix” is just surfacing an error message to the user, this is better than a silent failure leading to hours of app troubleshooting, just to find out it’s a failure with docker.
This issue is still a big pain. My Elasticsearch container randomly spits out an error about a failed GET call, and the entire Docker engine crashes and needs to be restarted. This makes development really difficult.
BTW, I mitigated it a little by opening far fewer connections to ES and making those connections persistent.
It looks like there is some memory (or connection?) leak in the Docker port-forwarding proxy: if you open and close a lot of concurrent connections from the host to a container, many times per second, eventually those connections leak (even after the process which opened them has terminated, so it's really a bug in Docker Desktop, not in the calling app). I posted netstat/tcpdump output above.
Before, we were accidentally opening one new connection per ES query and then closing them at random moments (sometimes keeping them open for a long time). After we switched to persistent connections, the Docker bug disappeared too.
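The before/after pattern can be illustrated with a self-contained Python sketch (hypothetical; it uses a tiny local HTTP/1.1 server as a stand-in for Elasticsearch, and the /_search path is just illustrative):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the real Elasticsearch endpoint; any HTTP/1.1 server works.
class _Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, so connections can be reused

    def do_GET(self):
        body = b"{}"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def start_server() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), _Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

# Anti-pattern: a fresh TCP connection per query. Hundreds of these per
# second is what appeared to leak connections in the port-forwarding proxy.
def query_per_connection(host, port, n):
    statuses = []
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("GET", "/_search")
        statuses.append(conn.getresponse().status)
        conn.close()
    return statuses

# Mitigation: one persistent keep-alive connection reused for every query.
def query_persistent(host, port, n):
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        statuses = []
        for _ in range(n):
            conn.request("GET", "/_search")
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            statuses.append(resp.status)
        return statuses
    finally:
        conn.close()
```

Both functions return the same results; the difference is only in how many host-to-container connections the forwarding proxy has to set up and tear down.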
This issue is open for over a year without any viable workaround (installing a 2018 version doesn't work, at least for me).
Any other suggestions?
Same problem.
/remove-lifecycle stale
I have the same problem. At least I'm not alone, I guess.
I'm having this issue too. Are there any official statements regarding this issue? I'm using v3.1.0 on macOS Catalina 10.15.7. I can't ping anything on my network from within the container. I can reach other services from within the container, and I'm also able to connect to the container from the outside using the exposed ports.
Confirming the same issue. It's only started relatively recently, I think after I added a new container to a compose file but I'm not certain as the timing doesn't line up perfectly. I've just thrown a load more resources at docker to see if it helps, but it had plenty already. Docker 3.1.0 on MacOS 10.15.7.
I can access the container's web server through an HTTP connection, but the containers can't communicate with each other and can't talk to the outside world (pinging an external site from within the container doesn't work). It works fine for about two days at a time before entering this state with no config changes. Restarting containers doesn't help; only restarting Docker or the machine Docker's running on does.
Edited to add that after 2 days with increased resources, it has lost networking again. So additional resources appear to have no impact.
Same here, Docker 3.3.1 on Big Sur 11.2.3.
Seeing the same issue with Docker 3.3.3 on Big Sur 11.3.1. I suspect @dko-slapdash is on the right track re: resource leak related to number or rate of connections; running an aggressive nmap scan from a container to the host results in almost immediate network failures that are only resolved by restarting Docker engine.
Pinging some maintainers here: @StefanScherer @djs55 @stephen-turner in case this issue flew under their radars.
As a summary of this thread: users have been experiencing intermittent network failures, connection resets, etc., requiring a full restart of Docker, or pruning all images and networks, for things to come back online. It seems to be related to a resource/memory leak; it happens more often under heavy load spikes, and adding more resources to Docker seems to help only for a little while. There appear to be relatively easy steps to reproduce this problem at will, see e.g. https://github.com/docker/for-mac/issues/3448#issuecomment-628507263.
There are many other issues out there that seem to reference the same problem, e.g. #5538 #3674 #3360, all of which have since been closed or gone stale because no responses were received from the maintainers. These issues go as far back as 2018, after the new Docker Desktop for Mac was introduced.
We have been experiencing similar problems with Docker for Mac for the past 6 months and our team is often losing hours of productivity because of this issue. Would really appreciate it if one of you could look into this.
Expected behavior
Networking between my Mac OS host and running docker containers should not sporadically stop working.
Actual behavior
I'm experiencing network errors (connection resets, network errors) between my Mac host and all running docker containers. Restarting the containers does not resolve the problem. I have to restart docker to solve the problem. I've experienced this problem twice today, but had never experienced it before.
Information
Diagnostic logs
Steps to reproduce the behavior
It's hard to reproduce. Connectivity just ceases seemingly randomly, requiring me to restart docker. This is occurring on a fresh docker install as well.