docker / for-win

Bug reports for Docker Desktop for Windows
https://www.docker.com/products/docker#/windows

Connection timeout for some outside connections on all Docker containers after some time #12671

Closed arnomoonens closed 2 years ago

arnomoonens commented 2 years ago

Actual behavior

Some outside connections timeout in any Docker container after some time.

Expected behavior

No (repeated) timeouts.

Information

Output of & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

[FAIL] DD0027: is there available disk space on the host? low free space on host
[SKIP] DD0028: is there available VM disk space?
[PASS] DD0031: does the Docker API work?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0001: is the application running?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0002: does the bootloader have virtualization enabled?
[PASS] DD0020: is the Hyper-V Windows Feature enabled?
[PASS] DD0017: can a VM be started?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0005: is the user in the docker-users group?
[PASS] DD0007: is the backend responding?
[PASS] DD0014: are the backend processes running?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[PASS] DD0006: is the Docker Desktop Service responding?
[PASS] DD0012: is the VM networking working?
[PASS] DD0032: do Docker networks overlap with host IPs?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
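A long diagnose report like the one above is easier to scan once the non-passing checks are pulled out. The following is a minimal Python sketch, assuming the output keeps the `[STATUS] DDnnnn: message` shape shown here; `parse_checks` is a hypothetical helper name, not part of any Docker tooling.

```python
import re

# One check looks like "[PASS] DD0031: does the Docker API work?".
# The message runs until the next "[PASS]/[FAIL]/[SKIP]" marker or the end of input,
# so the pattern works whether checks are on one line or one per line.
CHECK_RE = re.compile(
    r"\[(PASS|FAIL|SKIP)\]\s+(DD\d+):\s*(.*?)(?=\s*\[(?:PASS|FAIL|SKIP)\]|\s*$)",
    re.S,
)


def parse_checks(output):
    """Return a list of (status, check_id, message) tuples from diagnose output."""
    return [
        (m.group(1), m.group(2), m.group(3).strip())
        for m in CHECK_RE.finditer(output)
    ]


# Sample taken from the first three checks of the report in this issue.
sample = (
    "[FAIL] DD0027: is there available disk space on the host? low free space on host "
    "[SKIP] DD0028: is there available VM disk space? "
    "[PASS] DD0031: does the Docker API work?"
)

for status, check_id, message in parse_checks(sample):
    if status != "PASS":
        print(status, check_id, message)
```

For this report, the only failing check is DD0027 (low free disk space on the host); every networking check passes.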

Steps to reproduce the behavior

I am running some containers using Docker Compose. After some time, requests to a specific URL start timing out. When I run curl against the same URL, either in the shell of the container that had the timeout or in a separate container (started with docker container run -it alpine), it also times out, even with --network=host. Only the DNS lookup seems to work fine.

I am making requests to 3 different websites, and which one times out seems to be random. At the same time, I can access the URL from the host machine just fine (also using curl). The timeouts go away either after waiting for some time (sometimes a few hours) or after restarting Docker.
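To confirm that DNS resolution succeeds while the TCP connection is what times out, the two steps can be checked separately from inside a container. This is a minimal Python sketch using only the standard library; the `probe` function and the host in the usage example are hypothetical placeholders, not the URLs from this report.

```python
import socket
import time


def probe(host, port=443, timeout=5.0):
    """Resolve host, then attempt a TCP connect, and report which step failed."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return {"host": host, "stage": "dns", "ok": False, "detail": str(exc)}

    address = infos[0][4][:2]  # (ip, port) of the first resolved address
    started = time.monotonic()
    try:
        with socket.create_connection(address, timeout=timeout):
            pass  # the handshake completed; no data needs to be sent
    except OSError as exc:  # covers timeouts and refused connections
        return {"host": host, "stage": "tcp", "ok": False, "detail": str(exc)}

    elapsed = time.monotonic() - started
    return {"host": host, "stage": "tcp", "ok": True, "detail": f"connected in {elapsed:.3f}s"}


if __name__ == "__main__":
    # Placeholder host; substitute the sites that time out in your setup.
    for name in ["example.com"]:
        print(probe(name))
```

A result with stage "tcp" and ok False, while the same probe succeeds on the host, would match the behavior described above: name resolution works, but the connection never completes inside the container.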

Things that I tried but didn't help:

Could you help me diagnose/fix this issue?

j03wang commented 2 years ago

FWIW, a few other people and I are also experiencing this issue: I am able to reach certain hosts from the container, but other hosts time out (even though they are reachable from the computer running Docker). This has happened on version 4.7 as well as 4.8.2 on an M1 Mac.

Any help here would be appreciated.

leenoix commented 2 years ago

Having the same problem here, on an M1 MacBook Pro with Docker 4.7.0. For some sites, a curl command within the container (Ubuntu 20.04) intermittently gets no response. I ran tcpdump both within the container and outside it (on the host Mac), and it looks like the TCP SYN-ACK is received on the host side but never reaches the container. The weird thing is that this happens randomly and is not always reproducible.

levimatheri commented 2 years ago

Same issue on Windows Server Datacenter 2019.

levimatheri commented 2 years ago

@arnomoonens Does this fix here work for you? https://github.com/docker/for-win/issues/698#issuecomment-314902326

docker-robott commented 2 years ago

Issues go stale after 90 days of inactivity. Mark the issue as fresh with /remove-lifecycle stale comment. Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows. /lifecycle stale

arnomoonens commented 1 year ago

> @arnomoonens Does this fix here work for you? #698 (comment)

Unfortunately, this did not work.

/remove-lifecycle stale

levimatheri commented 1 year ago

@arnomoonens For us it turned out to be a port exhaustion issue, since we were using NAT mode in Docker. We increased the dynamic port range and the issue was resolved.

arnomoonens commented 1 year ago

@levimatheri Thanks for the insight. How did you find out that that was the issue?

levimatheri commented 1 year ago

@arnomoonens We looked in Event Viewer and saw lots of WinNAT port allocation errors.

DanielAbdelNour commented 1 year ago

@levimatheri that's brilliant! How did you go about increasing the port range? We've been facing the same issue for months now with no resolution in sight.

levimatheri commented 1 year ago

@DanielAbdelNour You can use this cmdlet. https://learn.microsoft.com/en-us/powershell/module/nettcpip/set-netudpsetting?view=windowsserver2022-ps#example-1-modify-the-dynamic-port-range-for-udp

Also use Set-NetTCPSetting for the TCP ports.

gpsfl commented 1 year ago

> @DanielAbdelNour You can use this cmdlet. https://learn.microsoft.com/en-us/powershell/module/nettcpip/set-netudpsetting?view=windowsserver2022-ps#example-1-modify-the-dynamic-port-range-for-udp
>
> Also use Set-NetTCPSetting for the TCP ports.

How much do you use for port range?

levimatheri commented 1 year ago

@gpsfl Initially we were at 16,384 ports, then we doubled it to 32,768.
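The arithmetic behind those numbers can be sketched as follows. The linked cmdlet takes a start port (`-DynamicPortRangeStartPort`) and a port count (`-DynamicPortRangeNumberOfPorts`), and the range has to end at or below port 65535. The Windows default is a start of 49152 with 16,384 ports. The start port used after doubling is not stated in this thread, so 32768 below is only an assumption (it is the highest start that still fits), and both function names are hypothetical.

```python
MAX_PORT = 65535  # highest valid TCP/UDP port number


def dynamic_port_range(start_port, number_of_ports):
    """Return (first, last) port of a dynamic range, or raise if it does not fit."""
    if start_port < 1 or number_of_ports < 1:
        raise ValueError("start port and port count must be positive")
    last_port = start_port + number_of_ports - 1
    if last_port > MAX_PORT:
        raise ValueError(f"range would end at {last_port}, above {MAX_PORT}")
    return start_port, last_port


def highest_start_port(number_of_ports):
    """Return the largest start port that still leaves room for number_of_ports."""
    return MAX_PORT - number_of_ports + 1


# Windows default: 16,384 ports starting at 49152.
print(dynamic_port_range(49152, 16384))

# Doubling to 32,768 ports no longer fits at the default start port,
# so the start port has to move down as well (assumed value, see above).
start = highest_start_port(32768)
print(start, dynamic_port_range(start, 32768))
```

Keeping the default start port of 49152 with 32,768 ports would overflow the valid port space, which is why the start port and the port count need to be changed together.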

docker-robott commented 1 year ago

Closed issues are locked after 30 days of inactivity. This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

/lifecycle locked