Open ricklamers opened 4 years ago
Do the docker daemon logs (preferably with the daemon in "debug" mode) and/or system logs show any information about things that happen?
Does your machine happen to have NetworkManager (https://help.ubuntu.com/community/NetworkManager) installed? I know of situations where NetworkManager is quite "greedy" and attempts to manage network interfaces that are created in other network namespaces.
Thanks for helping out here @thaJeztah.
I've attached the docker daemon log and /var/log/syslog that capture the moment when the connection is dropped in Firefox.
var-log-syslog.log docker-dameon-log.log
apt list network-manager
shows:
rick@rick-System-Product-Name:/var/log$ apt list network-manager
Listing... Done
network-manager/bionic-updates,now 1.10.6-2ubuntu1.4 amd64 [installed,automatic]
N: There are 2 additional versions. Please use the '-a' switch to see them.
I don't recall ever interacting with/installing network-manager so it's probably what you get by default when you install Ubuntu 18.04.4 LTS.
These errors stood out to me in the logs, but I have no clue what they mean:
Jun 10 15:04:03 rick-System-Product-Name libvirtd[1510]: 2020-06-10 13:04:03.553+0000: 1861: error : virFileReadAll:1420 : Failed to open file '/sys/class/net/veth5f8b8a8/operstate': No such file or directory
Jun 10 15:04:03 rick-System-Product-Name libvirtd[1510]: 2020-06-10 13:04:03.553+0000: 1861: error : virNetDevGetLinkInfo:2530 : unable to read: /sys/class/net/veth5f8b8a8/operstate: No such file or directory
What puzzles me is that Firefox has this issue while Chrome doesn't in otherwise the very same situation. Which seemed to point to me that there's some sort of network disturbance that Firefox's network stack decides is enough to drop the connection while Chrome seems to kind of "ignore" the disturbance.
I don't recall ever interacting with/installing network-manager so it's probably what you get by default when you install Ubuntu 18.04.4 LTS.
I suspect it may be installed by default on "desktop" installs, where it makes more sense because it handles (e.g.) switching (WiFi) networks, which would be more "common" on a laptop than on a server.
These errors stood out to me in the logs, but I have no clue what they mean:
Yes, I've seen such errors in previous issues where NetworkManager was running; what I suspect happens there is that NetworkManager tries to act on every network interface on the machine; containers get their own virtual interface, so when a container is started, NetworkManager tries to take control of that interface, but because it's in the container's namespace, it then fails to find it.
It's possible that because it still detected that interface, it's reconfiguring other interfaces (not sure), which could explain the networking issue.
It's worth checking whether (temporarily) disabling NetworkManager solves the issue. I'm not on a Linux machine with NetworkManager installed, but sudo systemctl stop network-manager
may work (not sure whether it would try to restart itself after that, though).
What puzzles me is that Firefox has this issue while Chrome doesn't in otherwise the very same situation. Which seemed to point to me that there's some sort of network disturbance that Firefox's network stack decides is enough to drop the connection while Chrome seems to kind of "ignore" the disturbance.
That's definitely interesting 🤔
I must admit that I'm horrible at networking, so if things get too complicated 😅. Interested to hear though if the above helps.
I recall that network-manager has a configuration option that allows excluding certain interfaces (wondering if there's a "portable" solution for that to exclude the container interfaces, and if that would help for these setups)
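For reference, the option in question is probably unmanaged-devices in the [keyfile] section of NetworkManager's configuration. A minimal sketch, assuming a drop-in file under /etc/NetworkManager/conf.d/ (the file name is made up for illustration) and assuming the container interfaces follow the usual docker0 / veth* naming; whether this helps in this particular setup is untested:

```ini
# /etc/NetworkManager/conf.d/10-ignore-container-interfaces.conf  (hypothetical file name)
[keyfile]
# Ask NetworkManager to leave the Docker bridge and the per-container veth pairs alone.
unmanaged-devices=interface-name:docker0;interface-name:veth*
```

NetworkManager would need to be reloaded or restarted to pick the file up.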
Stopping NetworkManager with sudo systemctl stop NetworkManager.service
(your suggestion also appeared to stop network-manager) and confirming with sudo systemctl status NetworkManager.service
that it was off did not change the behavior: Firefox still drops the connection to the container, and it still works in Chrome.
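One way to take the browser out of the picture is to issue the same long-running request from a small script and see whether the connection itself gets dropped. A minimal sketch, assuming the container is published on 127.0.0.1:80 as in the reproduction steps; the function name probe_long_request is hypothetical and not part of the original report:

```python
import http.client
import time


def probe_long_request(host="127.0.0.1", port=80, path="/", timeout=60.0):
    """Issue one GET and report whether it completed or the connection dropped.

    Returns (outcome, detail, elapsed_seconds), where outcome is either
    "completed" or "dropped".
    """
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the body so a mid-response drop is also noticed
        return "completed", str(response.status), time.monotonic() - start
    except (OSError, http.client.HTTPException) as exc:
        # Connection resets, refusals, timeouts and a server hanging up
        # before answering all end up here.
        return "dropped", f"{type(exc).__name__}: {exc}", time.monotonic() - start
    finally:
        conn.close()
```

Calling probe_long_request() while the slow request is pending, and then running docker run hello-world in another terminal, should show whether the drop also happens outside Firefox. If the script reports "completed", that would suggest the TCP connection survives and the difference lies in how Firefox reacts to the network change.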
Checking in on this issue. How should we proceed?
Could we do something on our end to prevent the problem from occurring?
Expected behavior
Running the
docker run hello-world
command should not interfere with a different, already running container, specifically another container running a simple web server (using Flask) that handles long-running requests.
Actual behavior
In Firefox (77.0.1 (64-bit)), while the request is pending (in this simple reproducible example, a GET request that sleeps 30 seconds before returning), the connection is dropped when
docker run hello-world
is executed. The request never completes. On Google Chrome (Version 83.0.4103.61 (Official Build) (64-bit)) this issue does not occur.
We ran into this bug in a much more complicated setting, but we created a minimal example to make reproducing easier.
Steps to reproduce the behavior
Build the container:
docker build -t minimal-flask .
Start container with:
docker run -p 80:80 minimal-flask
Use Firefox to make a request to this running container (at http://127.0.0.1).
While the request is waiting (e.g. after 5 seconds have passed), run
docker run hello-world
and observe the request failing in Firefox.
We also saw this happening while performing other basic Docker operations, such as Ctrl + C'ing out of another container or stopping a different container using docker stop <id>.
main.py
Dockerfile
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.)