docker / for-win

Bug reports for Docker Desktop for Windows
https://www.docker.com/products/docker#/windows

Docker windows mode breaks Hyper-V vEthernet (Default Switch) #1166

Open AceHack opened 6 years ago

AceHack commented 6 years ago

Expected behavior

Creating a Hyper-V VM and pinging google.com works. It looks like Default Switch is a new built-in Hyper-V NAT virtual switch that is now created by default when you install Hyper-V on the latest Windows version. I don't think Docker is expecting this.

Actual behavior

ping google.com results in "Ping request could not find host google.com. Please check the name and try again."

Information

Steps to reproduce the behavior

  1. Install all of the following Windows components: Containers, Guarded Host (optional), Hyper-V (all), Windows Defender Application Guard (optional).
  2. Create a Hyper-V VM and assign its network to the built-in "Default Switch".
  3. Open the VM and ping google.com. Things work as expected.
  4. Install Docker for Windows (it defaults to Linux mode on first install).
  5. Open the VM and ping google.com. Things still work as expected.
  6. Switch Docker to Windows mode.
  7. Open the VM and ping google.com. It no longer works; you get the error "Ping request could not find host google.com. Please check the name and try again."
  8. If you ping 8.8.8.8 or 172.217.9.206 (Google), things still work as expected, so it appears that Docker in Windows mode is breaking DNS for the Hyper-V Default Switch.
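The distinction drawn in steps 7 and 8 (names fail, raw IP addresses work) can be checked from inside the VM with a small script. This is only a sketch: the function names are made up for illustration, and probing TCP port 53 on 8.8.8.8 is an assumption chosen because a real ICMP ping needs raw sockets.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the system resolver can turn hostname into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(ip: str, port: int = 53, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds (no DNS involved)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(resolves: bool, connects: bool) -> str:
    """Map the two probe results onto the symptoms described in the report."""
    if resolves and connects:
        return "ok"
    if connects:
        return "dns-broken"      # IP traffic works, name lookup fails (step 8)
    if resolves:
        return "routing-broken"  # names resolve but packets do not get out
    return "offline"
```

Calling classify(can_resolve("google.com"), can_connect("8.8.8.8")) inside an affected VM would be expected to report "dns-broken" if the behavior matches this report.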

I've attached some diagnostics from the script WindowsContainerNetworking-LoggingAndCleanupAide.ps1 in case they help. FYI, networking inside the containers is working fine; this is only an issue in the Hyper-V VMs.

Thanks.

PreCleanupState_TNDQFBTE.zip

AceHack commented 6 years ago

FYI, there is also another default adapter that Hyper-V creates that you have no control over.

vEthernet (HvsiIcs)

This seems new as well in the latest windows versions.

mle-ii commented 6 years ago

Pretty sure I'm hitting, or have hit, something similar to your issue, though by default mine appears to be using a broken NAT setup on my machine. When I run the Debug-ContainerHost.ps1 script it shows my NAT config to be broken. I can almost get the "Default Switch" one to work by configuring DNS when running inside the console, but it doesn't work when I try to build with that network, as it doesn't resolve DNS names even after setting DNS.

I'm on the latest Windows 10 Pro - 16299.15.amd64fre.rs3_release.170928-1534
Docker version 17.10.0-ce, build f4ffd25

These are the networks I see:

C:\Windows\System32>docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
21783432030c        Default Switch      ics                 local
82b167c5c8d5        nat                 nat                 local
426acc0aeceb        none                null                local

I've tried uninstalling Docker and the Container feature and reinstalling various versions but they are all broken network wise and none of the networking troubleshooting docs I've read have been able to get it back to working.

mle-ii commented 6 years ago

Ok, I still don't know if I am hitting the same issue, but I was finally able to get this working, though it took quite a bit of work and I have no idea which thing fixed it. :(

Still on Windows 10 Pro 1709 with all the latest updates.

  1. Uninstalled Docker.
  2. Renamed the c:\programdata\docker directory.
  3. Removed the Containers and the Hyper-V features.
  4. Rebooted post removal of features.
  5. Went through the registry and tried to clean up all the old virtual networking related things by searching for docker/vswitch/nat/HNS and looking at other items nearby in the registry, removing ones that seemed to be network settings related to Docker or Hyper-V.
  6. EDIT: Also went into Device Manager and removed all the virtual network adapters/cards added by Docker/Hyper-V.
  7. Rebooted.
  8. Installed the Containers and Hyper-V features.
  9. Rebooted.
  10. Installed the latest Docker CE Edge version.
  11. Rebooted.
  12. Told it not to use the LCOW feature when Docker started up.
  13. Switched to Windows Containers.
  14. Tried to run a simple thing and it failed.

C:\Windows\System32>docker run -ti microsoft/nanoserver ipconfig /all
Unable to find image 'microsoft/nanoserver:latest' locally
latest: Pulling from microsoft/nanoserver
bce2fbc256ea: Download complete
b0b5e40cb939: Download complete
docker: Error response from daemon: open \\.\pipe\docker_engine_windows: The system cannot find the file specified.
See 'docker run --help'.

C:\Windows\System32>docker run -ti microsoft/nanoserver ipconfig /all
docker: Error response from daemon: open \\.\pipe\docker_engine_windows: The system cannot find the file specified.
See 'docker run --help'.

Turned off "Experimental features". This is very likely a separate issue, but wanted to call it out here in case someone hit this when trying out the steps I ran. https://github.com/docker/for-win/issues/1252

Reran the various "tests" that failed before and they all seemed to work fine:

  1. docker run -ti microsoft/nanoserver ipconfig /all - Failed until I turned off "Experimental features", even with the latest "clean" setup.
  2. docker run -ti microsoft/nanoserver ping 8.8.8.8 - Failed before until I told it to use the "Default Switch" network.
  3. docker run -ti microsoft/nanoserver ping google.com - Failed before unless I told it to use the "Default Switch" network and also configured DNS while connected to the container.

mle-ii commented 6 years ago

One additional item. I reran this:

Invoke-WebRequest https://aka.ms/Debug-ContainerHost.ps1 -UseBasicParsing | Invoke-Expression

and the network still seems to have at least one issue.

Describing Container network is created
 [+] At least one local container network is available 2.23s
 [+] At least one NAT, Transparent, or L2Bridge Network exists 33ms
 [+] NAT Network's vSwitch is internal 12ms
 [-] A Windows NAT is configured if a Docker NAT network exists 296ms
   Expected {0} to be greater than or equal to {1}
   221:        $winnatCount | Should Not BeLessThan $natCount
   at <ScriptBlock>, <No file>: line 221
 [+] Specified Network Gateway IP for NAT network is assigned to Host vNIC 124ms
 [+] NAT Network's internal prefix does not overlap with external IP' 23ms

Not sure if this error from the docker logs is related:

10/26/2017 11:20:45 AM 1 Error Error occurred when creating network insufficient vnis(0) passed to overlay. Windows driver requires VNIs to be prepopulated

Also Get-NetNat returns nothing.

ghost commented 6 years ago

I also have this problem. Enabling LCOW with 1709+ destroys DNS functionality on the "Default Switch". Unfortunately, the "nat" switch that Docker LCOW creates does not work for non-Docker based VMs either. It would be nice if both could work simultaneously. If I manually set the /etc/resolv.conf to a functioning external DNS server (e.g., 8.8.8.8) then the VM will work until it renews the DHCP and then of course it breaks again.

tegaaa commented 6 years ago

I'm having the same issue as @AceHack. Since I'm a heavy user of Docker with Windows containers and VMs, this is really annoying... I hope this will be fixed soon.

draggeta commented 6 years ago

Same issue for me as well. I've removed docker for now until this issue is fixed as it otherwise breaks my other labs.

gbraad commented 6 years ago

Reproducible here. I have used LCOW for tests, and now "Default Switch" is not able to offer a DHCP lease with nameservers set, as the VM will only have search mshome.net in /etc/resolv.conf.


Using

Get-NetIPInterface -InterfaceAlias "vEthernet (Default Switch)" -AddressFamily IPv4 | Get-NetIPAddress | ForEach-Object { $_.IPAddress }

seems to provide me with the correct address for use as DNS (and Gateway) for the VMs on the "Default Switch"
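To confirm the symptom from inside a Linux guest, the contents of /etc/resolv.conf can be scanned for nameserver lines. The following is a rough sketch, not part of the original reports: the helper name is hypothetical, the address 172.17.0.1 is a placeholder for whatever the command above returns, and stripping anything after '#' or ';' is a lenient assumption about comment handling.

```python
def nameservers(resolv_conf_text: str) -> list:
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Lenient assumption: treat anything after '#' or ';' as a comment.
        for marker in ("#", ";"):
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# What the affected VM reportedly sees: a search domain but no nameserver.
broken = "search mshome.net\n"
# Hypothetical patched file; 172.17.0.1 stands in for the host's switch address.
patched = "search mshome.net\nnameserver 172.17.0.1\n"
```

An empty result for the real file (read with open("/etc/resolv.conf").read()) would match the behavior described here. As noted earlier in the thread, a manual edit to that file only lasts until the next DHCP renewal.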

W1M0R commented 6 years ago

I experienced the same symptoms. I solved the problem by:

  1. Resetting my Default Ethernet connection (the one typically connected to my local network and the internet), by going into Properties and putting a checkmark on the following unchecked items:

    a. Client for Microsoft Networks
    b. File and Printer Sharing for Microsoft Networks
    c. QoS Packet Scheduler
    d. Internet Protocol Version 4 (TCP/IPv4)
    e. Microsoft LLDP Protocol Driver
    f. Internet Protocol Version 6 (TCP/IPv6)
    g. Link-Layer Topology Discovery Responder
    h. Link-Layer Topology Discovery Mapper I/O Driver
    i. Hyper-V Extensible Virtual Switch

    Most of these items are usually checked by default for the Default Ethernet connection, but they were somehow unchecked by Docker for Windows, perhaps during a reset of Docker, or the creation of additional Hyper-V Virtual Network Switches, or the switching to Windows Containers mode.

  2. Configuring Docker for Windows so that it doesn't automatically start with Windows.

  3. Restarting Windows.

  4. Manually launching Docker for Windows.

Without these changes, the following would happen:

  1. Windows would use a vEthernet (External) connection as its Ethernet connection, since the Default Ethernet connection didn't have the required features.
  2. Hyper-V would then use the wrong Ethernet connection for its Virtual Network switches.
  3. Docker for Windows would then also use the incorrect Ethernet connection.

My setup is: Windows 10 Pro 64-bit with 7 Network Connections (1 Main Ethernet, 6 Hyper-V vEthernet). One vEthernet connection is an "External Connection", the rest are "Internal Connections" (NAT). The vEthernet connections are called: Default Switch, DockerNAT, Internal Ethernet Port Windows Phone Emulator, nat, 4a85a7234be272d, and External Switch.

Perhaps this issue is related to the fact that Hyper-V does not support Multiple NAT connections: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/setup-nat-network#troubleshooting

shelmire commented 6 years ago

@W1M0R Thanks for the post! I installed Docker on Windows and restarted to no internet connection. Similar setup to yours.

jcmarchi commented 6 years ago

The problem is not related to the fact that Hyper-V does not support multiple NAT connections. I can make it work very easily with a custom network and a couple of tweaks to the Default Switch and the DockerNAT, mostly making the network "external" and adding (routing) the new network IP address in their scope, but after each reboot Docker simply UNDOES all my changes, which is extremely annoying, aggravating and unacceptable!

How stupid this problem is.

I hate applications that force the hand on users.

When I discuss it with the Docker team, they claim it is for security purposes, but they ignore the fact that not all Docker usage is for production environments, and this "limitation" prevents developers from setting up their network environments according to their needs.

Also, no one is able to explain why MACVLAN doesn't work in Windows!

In Linux it all works, flawlessly, but in Windows nothing does.

I am not sure if Docker team is not competent enough to develop proper code in Windows or if they simply do not want to make it available for Windows users. Either way, it is a shame (and this is a voice of frustration speaking out loud).

My apologies, but I am frustrated mostly because of the way changes are reverted after boot. If I change something in my settings I do not want any crappy code touching it!!!! The Docker team should know better...

BTW, Hyper-V is not a problem because I can create multiple instances of whatever systems I want in there and all will work with the network I set as I want without issues, and without changing it after reboot. Docker is the problem!

Notwithstanding, I suggest a little "debug hack" in the "MobiLinuxVM"... Most advanced users will find a set of iptables rules and other conditions that will give you nightmares (thus you will understand why the fstab is never executed in your Linux VM, the one you thought should work exactly as a real server does).

xtremeperf commented 5 years ago

@jcmarchi I'm not sure why you're thinking this has anything to do with Docker, because NAT and all the other virtual switch types are part of the Microsoft Windows networking stack. Those IP addresses that are changing after each reboot, those are configured and managed by Microsoft Windows services and drivers.

I think I might know the problem causing you guys these issues. Can you please reply with the output from running these four Powershell commands? Be sure to run as admin.

(Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion").CurrentBuild

Get-VMNetworkAdapter -ManagementOS

Get-NetAdapter | Format-Table Name,InterfaceDescription,ifIndex,Status,LinkSpeed,MediaConnectionState

Get-NetIPInterface

jcmarchi commented 5 years ago

I have heard that before, that the Windows network layer is in the way, and for a while I accepted it because in Linux it is all easily done. However, I started questioning the truth of this answer when I started digging into the Linux VM that Docker creates in Hyper-V and noticed a couple of weird behaviors, especially in the IPTABLES and FSTAB.

Notwithstanding, if I create a new VM in Hyper-V and install whatever OS I want into it, I can easily and without trouble create my network as I desire, set static IPs for each VM and access (or restrict access) as I please.

I can replicate the same infrastructure design (and any other I imagine) either with Hyper-V or DropBox, but not with Docker, which is a shame IMO.

But, as per your request:

PS C:\WINDOWS\system32> (Get-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion").CurrentBuild
17134

PS C:\WINDOWS\system32> Get-VMNetworkAdapter -ManagementOS

Name                   IsManagementOs VMName SwitchName     MacAddress   Status IPAddresses
----                   -------------- ------ ----------     ----------   ------ -----------
DockerNAT              True                  DockerNAT      00155D01C417 {Ok}
Container NIC 57aab818 True                  nat            00155DDE2140 {Ok}
Container NIC bc84dddb True                  Default Switch 3615B9BEC540 {Ok}

PS C:\WINDOWS\system32> Get-NetAdapter | Format-Table Name,InterfaceDescription,ifIndex,Status,LinkSpeed,MediaConnectionState

Name                           InterfaceDescription                           ifIndex Status       LinkSpeed MediaConne
                                                                                                             ctionState
----                           --------------------                           ------- ------       --------- ----------
Wi-Fi                          Realtek 8821AE Wireless LAN 802.11ac PCI-E NIC      26 Disconnected 0 bps     ...nnected
Ethernet 3                     TAP-Windows Adapter V9                              25 Disconnected 100 Mbps  ...nnected
vEthernet (nat)                Hyper-V Virtual Ethernet Adapter #4                 19 Up           10 Gbps    Connected
Ethernet 2                     Realtek PCIe GBE Family Controller                  18 Up           1 Gbps     Connected
vEthernet (DockerNAT)          Hyper-V Virtual Ethernet Adapter #2                 13 Up           10 Gbps    Connected
vEthernet (Default Switch)     Hyper-V Virtual Ethernet Adapter                    11 Up           10 Gbps    Connected
Ethernet                       Intel(R) Ethernet Connection (2) I219-V             10 Disconnected 0 bps     ...nnected
Bluetooth Network Connection 2 Bluetooth Device (Personal Area Network) #2          8 Disconnected 3 Mbps    ...nnected

In this specific machine I have two NIC and one WiFi card. I am only using one NIC and the WiFi is offline. Just FYI.

I am deeply curious to see where we will get with it. High hopes again. :)

Thanks.

xtremeperf commented 5 years ago

The problem is your Windows networking configuration.

Network traffic will be routed to the network interface having the lowest InterfaceMetric value. If you take a peek at your routing tables, you will notice that the route metric and the interface metric are summed and the total is used to determine routing. Lower values have priority.
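The selection rule described above (route metric plus interface metric, lowest total wins) can be written out as a short sketch. The class and function names are hypothetical, and the metric values are purely illustrative; real values would come from the routing table and Get-NetIPInterface.

```python
from dataclasses import dataclass


@dataclass
class Route:
    interface: str
    route_metric: int
    interface_metric: int

    @property
    def effective_metric(self) -> int:
        # The combined value is the sum of the route and interface metrics.
        return self.route_metric + self.interface_metric


def preferred(routes):
    """Pick the route with the lowest combined metric."""
    return min(routes, key=lambda r: r.effective_metric)


# Illustrative default routes for two physical adapters.
routes = [
    Route("Ethernet", route_metric=0, interface_metric=5),
    Route("Wi-Fi", route_metric=0, interface_metric=35),
]
```

With these numbers preferred(routes) picks "Ethernet" even if that adapter is the disconnected one, which is the situation this comment goes on to describe. Lowering the Wi-Fi interface metric below 5 flips the choice.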

Although it may appear your routing tables are correct, because IPs are getting routed correctly but DNS is not, the Host Networking Service (HNS) also needs to configure DNS properly, and it uses InterfaceMetric for choosing routes based on priority. HNS is a brand new Windows service that works alongside WinNAT and VFP drivers, dynamically creating port forwarding rules, mapping and policy for those drivers. HNS is also responsible for the creation and management of virtual switches, address translation (NAT), IP addresses, IP pools, DNS, namespaces, endpoints, ports, filter driver policies, etc. for both Hyper-V and Docker.

I think that HNS is actually supposed to be able to recognize MediaConnectionState and re-order routes appropriately when an adapter is in the disconnected or disabled state, but currently this functionality does not exist in HNS, therefore it is required that you manually set the order using InterfaceMetric values when there are multiple physical network adapters present.

THE SOLUTION: Assign your primary internet-connected network adapter a lower InterfaceMetric value than the other physical adapters which are present.

By default, Windows automatically assigns InterfaceMetric values based on LinkSpeed. Gigabit Ethernet adapters are often assigned the lowest value of '5' and 802.11ac Wi-Fi adapters are often assigned a value of '35'. In this example case, if Wi-Fi is your primary internet-connected network adapter, and Ethernet is disconnected or even disabled, your containers and VMs might have direct IP address communication, but no DNS resolution, and it's likely you would get a "Timed out while waiting for Docker daemon to be ready" error when attempting to start Docker or switch to Windows Containers mode.

Run the following in an elevated PowerShell session to view the current configuration:

Get-NetIPInterface | Sort-Object -Property InterfaceMetric -Descending

Then assign your primary interface ('Wi-Fi' in this example) a lower InterfaceMetric than all others:

Set-NetIPInterface -InterfaceAlias 'Wi-Fi' -InterfaceMetric 3

Make sure InterfaceMetric is changed to the same low value for both IPv4 and IPv6. You can also change adapter options in the Windows control panel, instead of PowerShell, by un-checking "automatic metric" and entering a value in the field.

Note: you should not have any NetNats set up manually, as this will currently break things. Remove all NetNats by running the following in an elevated PowerShell session:

Get-NetNat | Remove-NetNat

Also, you will want to remove any external switches, bridges or internet connection sharing until everything else is configured and working properly.

Reboot until 'Default Switch' and 'nat' are both in the 172.16.0.0/12 range (172.16.0.0 - 172.31.255.255). Sometimes it takes a couple reboots for HNS to get it right. If your system uses a 'vEthernet (HvsiIcs)' network adapter, it may not appear until after you open Microsoft Edge browser in Application Guard mode first. This adapter will most likely be assigned an IP in the 192.168.0.0/16 range (192.168.0.0 - 192.168.255.255).
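Whether an adapter address falls inside 172.16.0.0/12 is easy to misjudge by eye, so here is a minimal check using Python's standard ipaddress module. The function name and the sample addresses in the usage note are made up for illustration.

```python
import ipaddress

# The range named above: 172.16.0.0 through 172.31.255.255.
PRIVATE_172 = ipaddress.ip_network("172.16.0.0/12")


def in_expected_range(address: str) -> bool:
    """Return True if address lies inside 172.16.0.0/12."""
    return ipaddress.ip_address(address) in PRIVATE_172
```

For example, in_expected_range("172.17.0.1") is True, while in_expected_range("172.32.0.1") and in_expected_range("192.168.1.1") are both False.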

If HNS is struggling to get it right, or if you want to completely remove all dynamic switches and start with a fresh HNS configuration, you can do the following with elevated permissions:

Keep in mind that the HNS service does its work dynamically and on the fly, so if you are browsing network switches in Hyper-V/Docker, or running Get-Net* commands in PowerShell, the HNS service will be queried during these actions and may overwrite your settings, and will likely restart the HNS and/or Docker services. I recommend completing your tasks and rebooting immediately.

docker-robott commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale comment. Stale issues will be closed after an additional 30d of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.

/lifecycle stale

Saibamen commented 5 years ago

/lifecycle frozen

Saibamen commented 5 years ago

/remove-lifecycle stale

mitchellrj commented 5 years ago

This issue also affects Skype for Business on Windows. If the Docker adapter is enabled when Skype for Business starts up, then file transfers, video calling and screen sharing may be impacted. You might see error messages indicating timeouts such as "connection timeout", "failed to accept the invitation" and so on. If other people try to video call or share their screen with you, you might not receive any notification that they are trying to do so.

Disabling the docker adapter, starting Skype for Business and then re-enabling the Docker adapter fixes the issue.

planetmarshall commented 5 years ago

@xtremeperf Thanks for that solution. I never would have solved that in a million years.

andrew-j-hagner commented 4 years ago

Is this bug being worked on at all? It's a pretty painful workaround.

ctolkien commented 4 years ago

This workaround did not work for me. I am connected via a physical adapter and can ping IP addresses, but there is no DNS resolution from inside Hyper-V VMs. Physical adapters have an InterfaceMetric that is lower than all others (excepting the DockerNAT?).

satapathysangita commented 4 years ago

docker swarm init worked for me

jfuqua7 commented 1 year ago

Check for duplicate entries in here: C:\Windows\System32\drivers\etc\hosts.ics
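A quick way to look for such duplicates is sketched below. It assumes hosts.ics uses the same layout as a regular hosts file (an address followed by one or more names, with '#' starting a comment); that layout is an assumption, and the function name and sample entries are invented for illustration.

```python
from collections import defaultdict


def duplicate_hosts(text: str) -> dict:
    """Return {hostname: [addresses]} for names that appear more than once."""
    seen = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name attached
        address, names = fields[0], fields[1:]
        for name in names:
            seen[name.lower()].append(address)
    return {name: addrs for name, addrs in seen.items() if len(addrs) > 1}


# Invented sample content with one name mapped twice.
sample = """
192.168.137.1 host.mshome.net
192.168.137.20 vm1.mshome.net
192.168.137.21 vm1.mshome.net
"""
```

To check the real file, pass it the result of open(r"C:\Windows\System32\drivers\etc\hosts.ics").read(), which may need an elevated prompt.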