docker / for-win

Bug reports for Docker Desktop for Windows
https://www.docker.com/products/docker#/windows

Transparent HTTP proxying on Windows / WSL2 breaks valid HTTP requests #13258

Open AdrianoKF opened 1 year ago

AdrianoKF commented 1 year ago

Actual behavior

Note: See below for steps to reproduce. For simplicity, this runs a command inside an alpine:latest container.

Trying to make an HTTP/1.0 request without a Host header fails with an error message of unclear origin (see the Information section below for my best guess):

/ # curl -H'Host:' --http1.0 httpbin.org/anything
connecting to :80: connecting to <nil>:80: dial tcp :80: connectex: No connection could be made because the target machine actively refused it.
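
To rule out curl itself, the same Host-less request can be sent over a plain TCP socket. The following is a minimal sketch, assuming Python 3 is available in the container (it is not installed in alpine by default); the function names and the `raw-probe` user agent are illustrative only.

```python
import socket


def build_http10_request(path: str = "/anything") -> bytes:
    """Build a minimal HTTP/1.0 GET request that carries no Host header."""
    lines = [
        f"GET {path} HTTP/1.0",
        "User-Agent: raw-probe",  # illustrative value, any user agent works
        "Accept: */*",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_without_host(host: str, port: int = 80, path: str = "/anything",
                      timeout: float = 10.0) -> bytes:
    """Connect to host:port, send the Host-less request, return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_http10_request(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection (normal for HTTP/1.0)
                break
            chunks.append(data)
    return b"".join(chunks)


# Example call (performs a real network request):
# print(send_without_host("httpbin.org").decode("utf-8", errors="replace"))
```

If the reply is the plain-text "connecting to :80" error rather than an HTTP status line, the response did not come from the target server.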

I have experienced similar problems with gRPC over HTTP, which uses HTTP/2 as the transport protocol.

Expected behavior

Using curl to access httpbin should return the HTTP response from the server:

/ # curl -H'Host:' --http1.0 httpbin.org/anything
{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    ...
  },
  "json": null,
  "method": "GET",
  "origin": "...",
  "url": "http://a0207c42-pmhttpbin-pmhttpb-c018-592832243.us-east-1.elb.amazonaws.com/anything"
}

Information

This behavior is reproducible and happens for all outgoing HTTP traffic on port 80.

I have been able to reproduce it on Docker Desktop running on Windows 11 (on bare metal) with the WSL2 backend. From what I can tell, the error is caused by the missing Host header, which seems to upset the transparent HTTP proxying that happens inside vpnkit (that's as far as I managed to understand the root cause; happy to report my findings if it helps). The error message mentions <nil>:80, which seems to indicate that the proxy tried and failed to determine the target of the HTTP request and simply fell back to an empty value.
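
The suspected failure mode can be illustrated with a small sketch. This is a hypothetical model of a proxy that derives its dial target only from the Host header, not vpnkit's actual code; `host_header`, `naive_target`, and `safer_target` are made-up names, and Host values are assumed to carry no explicit port.

```python
from typing import Optional


def host_header(raw_request: bytes) -> Optional[str]:
    """Return the Host header value of a raw HTTP request, or None if absent or empty."""
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            text = value.strip().decode("latin-1")
            return text or None
    return None


def naive_target(raw_request: bytes, port: int = 80) -> str:
    """Hypothetical proxy logic: trust only the Host header, with no fallback."""
    host = host_header(raw_request) or ""
    return f"{host}:{port}"


def safer_target(raw_request: bytes, original_dst: str, port: int = 80) -> str:
    """Fall back to the original TCP destination when no Host header is present."""
    host = host_header(raw_request) or original_dst
    return f"{host}:{port}"
```

Under these assumptions, a Host-less request makes `naive_target` return `":80"`, which matches the empty host in the observed error, while `safer_target` would still dial the address the client originally connected to.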

This behavior breaks two valid use cases:

As a side note: the internal behavior of the transparent proxy also makes for some additional weird behavior in cases where the "original" TCP endpoint and the Host header disagree, causing the proxy to completely disregard the original destination (other than to send a SYN packet to see if the connection can be established):

/ # curl -H 'Host: google.com' facebook.com
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

The response is actually returned from google.com, despite the request clearly being intended for facebook.com. Regardless of the above issues, this is very surprising behavior in its own right.
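
One way a transparent proxy could notice this kind of disagreement is to check whether the original destination address is among the addresses the Host header resolves to. The sketch below is purely illustrative and makes no claim about how vpnkit works; `resolve_ipv4` and `host_matches_destination` are hypothetical helpers.

```python
import socket
from typing import Callable, Iterable, Set


def resolve_ipv4(host: str) -> Set[str]:
    """Resolve a host name to its IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(host, 80, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def host_matches_destination(
    host: str,
    original_ip: str,
    resolve: Callable[[str], Iterable[str]] = resolve_ipv4,
) -> bool:
    """Return True if original_ip is one of the addresses that host resolves to."""
    return original_ip in set(resolve(host))


# Example call (performs a real DNS lookup; 192.0.2.1 is a documentation address):
# host_matches_destination("google.com", "192.0.2.1")
```

Such a check is only a heuristic: with CDNs and round-robin DNS, the proxy and the client can legitimately see different addresses for the same name, so a mismatch would be a reason to prefer the original destination rather than proof of a misdirected request.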

Output of & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check

& "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check
[2023-02-22T11:23:53.334571700Z][com.docker.diagnose.exe][I] set path configuration to OnHost
Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0002: does the bootloader have virtualization enabled?
[SKIP] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0022: is the Virtual Machine Platform Windows Feature enabled?
[PASS] DD0021: is the WSL 2 Windows Feature enabled?
[PASS] DD0024: is WSL installed?
[PASS] DD0025: are WSL distros installed?
[PASS] DD0026: is the WSL LxssManager service running?
[PASS] DD0029: is the WSL 2 Linux filesystem corrupt?
[PASS] DD0035: is the VM time synchronized?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0005: is the user in the docker-users group?
[PASS] DD0038: is the connection to Docker working?
[PASS] DD0014: are the backend processes running?
[PASS] DD0007: is the backend responding?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[PASS] DD0006: is the Docker Desktop Service responding?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
[PASS] DD0002: does the bootloader have virtualization enabled?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0022: is the Virtual Machine Platform Windows Feature enabled?
[PASS] DD0021: is the WSL 2 Windows Feature enabled?
[PASS] DD0024: is WSL installed?
[PASS] DD0025: are WSL distros installed?
[PASS] DD0026: is the WSL LxssManager service running?
[PASS] DD0029: is the WSL 2 Linux filesystem corrupt?
[PASS] DD0035: is the VM time synchronized?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0032: do Docker networks overlap with host IPs?
segment 2023/02/22 12:24:03 ERROR: sending request - Post "https://api.segment.io/v1/batch": dial tcp: lookup api.segment.io: getaddrinfow: The requested name is valid, but no data of the requested type was found.
segment 2023/02/22 12:24:03 ERROR: 1 messages dropped because they failed to be sent and the client was closed
No fatal errors detected.

Steps to reproduce the behavior

FROM alpine
RUN apk add curl && \
    curl -H'Host:' --http1.0 httpbin.org/anything

Build on Windows with the WSL2 backend using docker build --no-cache --progress=plain . to see the relevant output:

connecting to :80: connecting to <nil>:80: dial tcp :80: connectex: No connection could be made because the target machine actively refused it.

discordianfish commented 1 year ago

> I have experienced similar problems with gRPC over HTTP, which uses HTTP/2 as the transport protocol.

@AdrianoKF What were the symptoms? I'm also debugging a gRPC issue. It works fine on all machines I have access to, but one of our users keeps getting "connectex: No connection could be made because the target machine actively refused it." despite it working fine when we test with e.g. nginx.

AdrianoKF commented 1 year ago

> > I have experienced similar problems with gRPC over HTTP, which uses HTTP/2 as the transport protocol.
>
> @AdrianoKF What were the symptoms? I'm also debugging a gRPC issue. It works fine on all machines I have access to, but one of our users keeps getting "connectex: No connection could be made because the target machine actively refused it." despite it working fine when we test with e.g. nginx.

In my case, I received an application-level gRPC error that caused all gRPC calls to fail:

rpc error: code = Unavailable desc = connection closed before server preface received

I tracked down the origin by capturing traffic inside the container using Wireshark/tshark, where I noticed the error message you are also seeing as a response packet to the initial HTTP/2 magic request.

It's a bit tricky to debug depending on the tool used to investigate, since curl, e.g., will send an HTTP/1.1 request with an Upgrade header to establish an HTTP/2 connection, instead of sending out HTTP/2 traffic right away. Also, packet captures are quite different inside the container and on the host (as is expected from my understanding of the vpnkit architecture -- which I only learned about after my initial root cause analysis).
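
To send HTTP/2 traffic right away, without the HTTP/1.1 Upgrade step, a probe can write the HTTP/2 client connection preface followed by an empty SETTINGS frame directly to the socket. This is a minimal sketch under that assumption; `probe_h2c` and its host and port arguments are placeholders for whatever cleartext HTTP/2 (h2c) server is being tested.

```python
import socket
import struct

# Fixed 24-octet client connection preface defined by the HTTP/2 specification.
PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
SETTINGS = 0x4  # frame type code for SETTINGS


def frame(frame_type: int, flags: int = 0, stream_id: int = 0,
          payload: bytes = b"") -> bytes:
    """Encode one HTTP/2 frame: 24-bit length, type, flags, 31-bit stream id."""
    length = struct.pack(">I", len(payload))[1:]  # keep the low 3 bytes
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


def client_hello() -> bytes:
    """Preface plus an empty SETTINGS frame, the first bytes an h2c client sends."""
    return PREFACE + frame(SETTINGS)


def probe_h2c(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Send the preface to host:port and return the first bytes of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(client_hello())
        return sock.recv(4096)
```

A server that speaks HTTP/2 should answer with a binary SETTINGS frame (byte 3 of the reply is `0x04`), whereas a plain-text reply such as the "connecting to" error suggests something in between answered instead. With curl, the `--http2-prior-knowledge` option should give a comparable test.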

discordianfish commented 1 year ago

OK, this looks different: we can't even connect to the port, which is super strange since we don't seem to be able to reproduce it with e.g. an nginx container. I've filed #13283 for that, but maybe it has a similar root cause in vpnkit.

docker-robot[bot] commented 1 year ago

There hasn't been any activity on this issue for a long time. If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment. If not, this issue will be closed in 30 days.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

AdrianoKF commented 1 year ago

/remove-lifecycle stale