docker-flow / docker-flow-proxy

Docker Flow Proxy
https://docker-flow.github.io/docker-flow-proxy/
MIT License

Internal domain name resolution from within the swarm #76

Closed ghost closed 5 years ago

ghost commented 5 years ago

Description: When I launch a swarm with docker-flow-proxy, domain name resolution works from outside the swarm (pages load correctly) but not from inside it. That is, if I run wget or nslookup from a container, no IP address is returned. Because of that, my frontend cannot talk to my auth service.

Steps to reproduce the issue:

  1. Launch a multi-node swarm with the swarm-listener + proxy and 2 services. Under the deploy section of each service, make sure you have defined com.df.serviceDomain, as in the example below.

      labels:
          - com.df.notify=true
          - com.df.serviceDomain=auth.docker.domain.com
          - com.df.servicePath=/
          - com.df.port=8012

    and configure your host machine's hosts file accordingly.

  2. Reach your domains (auth.docker.domain.com and frontend.docker.domain.com in my case) from the browser to access your services; notice that both work.

  3. Run wget and nslookup against auth.docker.domain.com from any container (docker exec -it NAME sh/powershell).
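For reference, step 1 can be sketched as a compose fragment (image and network names here are placeholders; in swarm mode the com.df.* labels must sit under deploy):

```yaml
version: "3.3"
services:
  auth:
    image: registry.example.com/auth:latest   # placeholder image
    networks:
      - proxy
    deploy:
      labels:
        - com.df.notify=true
        - com.df.serviceDomain=auth.docker.domain.com
        - com.df.servicePath=/
        - com.df.port=8012

networks:
  proxy:
    external: true   # the network shared with docker-flow-proxy
```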

Describe the results you received:

From the proxy container, on a different node than the target container but on the same one as the swarm-listener:

bash-4.4# nslookup auth
nslookup: can't resolve '(null)': Name does not resolve

Name:      auth
Address 1: 10.0.6.14 testi_auth.1.jw70op0v75wfxsb1g8e9slpdm.testi_proxy
bash-4.4# nslookup auth.docker.domain.com
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'auth.docker.domain.com': Name does not resolve

And from a Windows container on the same node as the target container (also a Windows container) but on a different node than the proxy + swarm-listener:

PS C:\> wget http://auth.docker.domain.com/.well-known/openid-conf
wget : The remote name could not be resolved: 'auth.docker.domain.com'

(I could get http://auth.docker.domain.com/.well-known/openid-conf from my host machine.)

Describe the results you expected: wget returning the page and nslookup returning an IP address.

Additional environment details (AWS, VirtualBox, physical, etc.): VirtualBox

thomasjpfan commented 5 years ago

If you run your container without swarm (docker run -ti ...), is it able to access auth.docker.domain.com?

ghost commented 5 years ago

Yes, that was not a problem before switching to swarm, but I was using simpler names like 'auth'. Now, moving closer to production, I am building a stack in the swarm with a reverse-proxy system.

thomasjpfan commented 5 years ago

Since you do not have an issue with docker run, you can run docker container inspect $NAME without swarm and observe the network settings. Then start a service, run docker container inspect again, and compare the network settings with the no-swarm case.

Also, running without swarm, check which DNS server is used to resolve your auth.docker URL.
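A small helper along these lines (a sketch; it reads the JSON shape that `docker container inspect` emits, and the function names are mine) can pull out just the fields worth comparing between the two cases:

```python
import json  # docker container inspect emits a JSON array

def network_summary(inspect_output):
    """Extract name, IP, gateway and aliases for each network attached to a
    container, from the parsed output of `docker container inspect NAME`."""
    settings = inspect_output[0]["NetworkSettings"]
    return {
        name: {
            "IPAddress": net.get("IPAddress", ""),
            "Gateway": net.get("Gateway", ""),
            "Aliases": net.get("Aliases") or [],
        }
        for name, net in settings["Networks"].items()
    }

def diff_summaries(a, b):
    """Return the network names present only in `a` and only in `b`."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))
```

Feeding it `json.loads(...)` of the inspect output for the swarm and no-swarm containers makes differences such as a missing Gateway on the overlay network easy to spot.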

ghost commented 5 years ago

Thank you for your help! I used docker-compose for convenience because those containers need many environment variables to start correctly (I left the swarm before that); I guess it should be pretty much equivalent to docker run.

Running without the swarm:

From the frontend container (Windows also).

PS C:\> nslookup auth
Server:  UnKnown
Address:  172.19.48.1

Non-authoritative answer:
Name:    auth
Address:  172.19.50.209
PS C:\> nslookup auth.docker.domain.com
Server:  UnKnown
Address:  172.19.48.1

*** UnKnown can't find auth.docker.domain.com: Server failed

auth is the container name; auth.docker.domain.com is configured in the host machine's hosts file, so this makes sense without docker-flow-proxy + swarm.

Now Docker inspect without the swarm from the frontend:

        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "19034e4772b85e2c4e6b346192c17b8575cb2c7b49a1a958240cb8843a3478ac",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "8010/tcp": [
                    {
                        "HostIp": "0.0.0.0",
                        "HostPort": "8010"
                    }
                ]
            },
            "SandboxKey": "19034e4772b85e2c4e6b346192c17b8575cb2c7b49a1a958240cb8843a3478ac",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "dockercompose_internal": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": [
                        "frontend",
                        "19034e4772b8"
                    ],
                    "NetworkID": "5158e92a50a483b00dbfa5f866e29dbc11d37fda1d3c79234eb38c63a7a308c0",
                    "EndpointID": "802413464bd52b2ecd894aa09d333d65883c6012a87df6302daee8e6b64d834f",
                    "Gateway": "172.19.48.1",
                    "IPAddress": "172.19.61.145",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "00:15:5d:fc:b7:56",
                    "DriverOpts": null
                }
            }
        }
    }
]

Docker inspect with the swarm from the frontend:

        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "d6ecfd7e4b5c9f98310cfd20ad3e06598edd8122f89b29395a94c09a3d6125c1",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "d6ecfd7e4b5c9f98310cfd20ad3e06598edd8122f89b29395a94c09a3d6125c1",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "testi_proxy": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.8.11"
                    },
                    "Links": null,
                    "Aliases": [
                        "d6ecfd7e4b5c"
                    ],
                    "NetworkID": "rh974g1yct6ys5tih66u3bdse",
                    "EndpointID": "dc63a04bf04dec68870ca642a8c83270bbfb38c8a4c52c5c991c6c74020f8488",
                    "Gateway": "",
                    "IPAddress": "10.0.8.11",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "00:15:5d:ad:5d:1b",
                    "DriverOpts": null
                }
            }
        }

Docker inspect with the swarm from auth:

        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "d6ecfd7e4b5c9f98310cfd20ad3e06598edd8122f89b29395a94c09a3d6125c1",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "d6ecfd7e4b5c9f98310cfd20ad3e06598edd8122f89b29395a94c09a3d6125c1",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "testi_proxy": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.8.11"
                    },
                    "Links": null,
                    "Aliases": [
                        "d6ecfd7e4b5c"
                    ],
                    "NetworkID": "rh974g1yct6ys5tih66u3bdse",
                    "EndpointID": "dc63a04bf04dec68870ca642a8c83270bbfb38c8a4c52c5c991c6c74020f8488",
                    "Gateway": "",
                    "IPAddress": "10.0.8.11",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "00:15:5d:ad:5d:1b",
                    "DriverOpts": null
                }
            }
        }
    }
]

If it can be useful, here are the NetworkSettings of one of the 2 DFP replicas (the one on the listener's node):

        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "f901e7a48c17ff3766b2bb8ce4cdae11b9029bb35b247796927254e2a891426e",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {
                "443/tcp": null,
                "80/tcp": null,
                "8080/tcp": null
            },
            "SandboxKey": "/var/run/docker/netns/f901e7a48c17",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "ingress": {
                    "IPAMConfig": {
                        "IPv4Address": "10.255.0.112"
                    },
                    "Links": null,
                    "Aliases": [
                        "1489ca145204"
                    ],
                    "NetworkID": "ltbopze6xc5ra4so4cgrkkzpk",
                    "EndpointID": "530966521e5acebd15593020dc7f76114c45b68eb0450d32b908286f46a391d8",
                    "Gateway": "",
                    "IPAddress": "10.255.0.112",
                    "IPPrefixLen": 16,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:ff:00:70",
                    "DriverOpts": null
                },
                "testi_proxy": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.8.9"
                    },
                    "Links": null,
                    "Aliases": [
                        "1489ca145204"
                    ],
                    "NetworkID": "rh974g1yct6ys5tih66u3bdse",
                    "EndpointID": "f7667146381986bb8e582fba9fff00857b801c34b34a8ecf4e2e73c0c0b19176",
                    "Gateway": "",
                    "IPAddress": "10.0.8.9",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:08:09",
                    "DriverOpts": null
                }
            }
        }
Initially I was wondering: should names normally be resolved through the proxy on the internal swarm network without any specific configuration?

A small remark: sometimes when I tried to access auth or frontend from the browser, I got:

Docker Flow Proxy: 503 Service Unavailable
No server is available to handle this request.

I then refreshed and it worked, but that might just have been the auth/frontend still starting up.


Today, the situation got weirder.

After restarting the computer, the VirtualBox nodes, and the swarm, only the Linux nodes are reachable from the outside; all the rest give me "Docker Flow Proxy: 503 Service Unavailable". This is very odd because yesterday it was working from the outside after re-deploying (incl. rm) several times. I am using dockerflow/docker-flow-proxy:18.09.14-9 (I also tried dockerflow/docker-flow-proxy:18.10.09-13).

From the DFP logs:

2018/10/12 11:07:26 HAPRoxy: Server testi_auth-be8010_0/testi_auth1 is DOWN, reason: Layer4 timeout, check duration: 2001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2018/10/12 11:07:26 HAPRoxy: backend testi_auth-be8010_0 has no server available!
thomasjpfan commented 5 years ago

Regarding the networking issue: if auth.docker.domain.com is only in your host's hosts file and not on a public DNS server, the service should not be able to resolve it. I am not familiar with how Windows works with Docker, but it would be strange for a Docker service to be able to read the host's hosts file to resolve domain names. You can access the services inside the swarm by using the service name. What is preventing you from using the service name, auth, to access your service?
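That resolution behavior can be checked from inside any container with a few lines of Python (a sketch; the stdlib resolver uses whatever DNS the container is configured with, which in a swarm task is Docker's embedded DNS):

```python
import socket

def resolve(name):
    """Return the first IPv4 address the container's resolver gives for
    `name`, or None when the name does not resolve."""
    try:
        return socket.getaddrinfo(name, None, family=socket.AF_INET)[0][4][0]
    except socket.gaierror:
        return None

# Inside a swarm task on the shared network, resolve("auth") should return
# the service VIP, while a name that exists only in the host's hosts file,
# like "auth.docker.domain.com", should return None.
```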

Regarding DFP going down: when DFP says the backend is not available, it means it cannot reach it on the overlay network. I recently updated the swarm listener to fix issues like this. Can you test out your use case with dockerflow/docker-flow-swarm-listener:18.10.12-7?

ghost commented 5 years ago

dockerflow/docker-flow-proxy:18.10.09-13 + dockerflow/docker-flow-swarm-listener:18.10.12-7 are now working fine with regard to the weird issue that started today. It wasn't working initially, but after declaring a new network name instead of the former one, it started to work for some unknown reason.

Following your advice, I moved to using auth. My initial assumption was that it would be possible, from inside the swarm, to query back the listening address and that this was the default behavior, which it isn't. Now the error I get inside my application is:

IDX10803: Unable to create to obtain configuration from: 'http://auth/.well-known/openid-config'. 

This is probably because the swarm resolves http://auth without going through the proxy, in which case it cannot connect to port 80 of the container, since the app is internally served on 8010. I have other containers with the same issue, so I cannot change that internally; I need to use an unused port.
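The default-port behavior can be seen with Python's stdlib: a URL with no explicit port parses with port set to None, and HTTP clients then fall back to 80 even though the app listens on 8010:

```python
from urllib.parse import urlparse

u = urlparse("http://auth/.well-known/openid-config")
print(u.hostname, u.port)   # auth None -> client defaults to port 80

u = urlparse("http://auth:8010/.well-known/openid-config")
print(u.hostname, u.port)   # auth 8010 -> reaches the app's real port
```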

Is it possible to make internal queries go through the proxy?

Context: The application I am dockerizing is a black box outside of its config files. In the config files, I can configure a URL, but it seems the same parameter is used both for the public URL that the browser redirects the user to (which is now http://auth:80 and works fine thanks to the proxy) and for the private URL of the openid-config that it tries to fetch inside the swarm (which would have to be http://auth:8010/PATH).

thomasjpfan commented 5 years ago

If your black box is able to include Host: auth.docker.domain.com in its HTTP requests, it should be possible to go through the proxy directly with http://PROXY_SERVER_NAME.

ghost commented 5 years ago

Thank you, I will explore that. I suppose I can close this issue.

ghost commented 5 years ago

In case someone comes here after taking the same dodgy path: this issue was actually with the hybrid swarm. I got my problems solved by running 18.06.1-ce on both Linux (Ubuntu/Debian) and Windows (Server 1803) on physical servers. For Windows, you have to compile 18.06.1-ce yourself.

thomasjpfan commented 5 years ago

@achrjulien Thank you for the update! Having to compile Docker for Windows to resolve your issue is kind of nuts. 😅