dcos / dcos-e2e

Spin up and manage DC/OS clusters in test environments
Apache License 2.0

Having issues using VPN/SSH with DC/OS 1.11.4 on macOS #1302

Open davidhesson opened 6 years ago

davidhesson commented 6 years ago

CLI version: 2018.8.31.0 Docker version: 18.06.1-ce-mac73 DC/OS: 1.11.4 macOS: 10.13.6

I am trying to create a cluster with the SSH transport using the CLI and am having issues connecting to the nodes. I have run dcos-docker setup-mac-network and can open a tunnel to the node using openvpn via command line.

My create cluster command:

$ dcos-docker create --workspace-dir ./workspace/dcos-1.11.4 --transport ssh --wait-for-dcos ./releases/1.11.4.sh
Error creating cluster.

From the doctor output (at three levels of verbosity), I see that all the sshd.socket systemd services are inactive (dead) on cluster boot. Is this normal, or is it causing my SSH/HTTP checks to fail? I cannot access any nodes in the cluster over the VPN (e.g. the web UI).

Here is my dcos-docker inspect output

adamtheturtle commented 6 years ago

Thank you @davidhesson .

I believe that the sshd.socket output is a red herring. We check for both sshd.service and sshd.socket in order to support multiple operating systems; only one is needed.

The key part is this:

Warning: Cannot connect to a Docker container by its IP address. This is needed for features such as connecting to the web UI and using the DC/OS CLI. To use the "wait" command without resolving this issue, use the "--skip-http-checks" flag on the "wait" command. We recommend using "dcos-docker setup-mac-network" to resolve this issue.

You say:

I have run dcos-docker setup-mac-network

This command:

The containers can be shown with docker ps:

Adam@MacBook-Pro ~/D/m/d/dcos-e2e> docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                        NAMES
bcdd8a6705d3        dcos-e2e/openvpn    "/local/helpers/run.…"   4 weeks ago         Up 3 days                                        e2e-openvpn
944e94314945        dcos-e2e/proxy      "socat TCP-LISTEN:13…"   4 weeks ago         Up 3 days           127.0.0.1:13194->13194/tcp   e2e-proxy

Please can you confirm that these containers are shown.
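The container check above can be scripted. This is a minimal sketch assuming a POSIX shell; `check_vpn_containers` is a hypothetical helper, not part of dcos-docker. It reads container names (for example from `docker ps --format '{{.Names}}'`) and reports whether `e2e-openvpn` and `e2e-proxy`, the two containers in the `docker ps` output above, are present.

```shell
#!/bin/sh
# Hypothetical helper (not part of dcos-docker): reads container names,
# one per line, on stdin and checks for the two containers shown in the
# "docker ps" output in this thread. Exits 0 only if both are present.
check_vpn_containers() {
  names=$(cat)
  missing=0
  for c in e2e-openvpn e2e-proxy; do
    # -x matches the whole line, so e.g. "e2e-proxy-old" does not count
    if printf '%s\n' "$names" | grep -qx "$c"; then
      echo "found: $c"
    else
      echo "missing: $c"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (requires a running Docker daemon):
#   docker ps --format '{{.Names}}' | check_vpn_containers
```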

Then, please can you describe which VPN application you opened the .ovpn file with and confirm that you are connected.

For example, using Shimo, I can see that I am connected in the Shimo preferences:

[Screenshot: Shimo preferences showing the connection, 2018-09-07 09:14:51]

can open a tunnel to the node using openvpn via command line.

This may answer some of the above questions, but as I am not familiar with it, I have asked for confirmation in ways that I am familiar with.

davidhesson commented 6 years ago

Hi @adamtheturtle

Thankfully, I forgot to terminate the cluster.

Here is a docker ps

$ docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                        NAMES
fd7b613f9918        mesosphere/dcos-docker   "/sbin/init"             32 hours ago        Up 32 hours                                      dcos-e2e-87d8d9c6-6962-461e-9a28-9b5542b72c08-public-agent-0
853d853ad41d        mesosphere/dcos-docker   "/sbin/init"             32 hours ago        Up 32 hours                                      dcos-e2e-87d8d9c6-6962-461e-9a28-9b5542b72c08-agent-0
991230c87084        mesosphere/dcos-docker   "/sbin/init"             32 hours ago        Up 32 hours                                      dcos-e2e-87d8d9c6-6962-461e-9a28-9b5542b72c08-master-0
c3ddc1a7bfac        dcos-e2e/openvpn         "/local/helpers/run.…"   32 hours ago        Up 32 hours                                      e2e-openvpn
bd66573ff107        dcos-e2e/proxy           "socat TCP-LISTEN:13…"   32 hours ago        Up 32 hours         127.0.0.1:13194->13194/tcp   e2e-proxy

Here is the output of my VPN command from CLI

$ sudo openvpn /Users/david/Documents/docker-for-mac.ovpn
Password:
Fri Sep  7 09:55:55 2018 OpenVPN 2.4.4 x86_64-apple-darwin16.7.0 [SSL (OpenSSL)] [LZO] [LZ4] [PKCS11] [MH/RECVDA] [AEAD] built on Oct  2 2017
Fri Sep  7 09:55:55 2018 library versions: OpenSSL 1.0.2p  14 Aug 2018, LZO 2.10
Fri Sep  7 09:55:55 2018 TCP/UDP: Preserving recently used remote address: [AF_INET6]::1:13194
Fri Sep  7 09:55:55 2018 Attempting to establish TCP connection with [AF_INET6]::1:13194 [nonblock]
Fri Sep  7 09:55:56 2018 TCP connection established with [AF_INET6]::1:13194
Fri Sep  7 09:55:56 2018 TCP_CLIENT link local: (not bound)
Fri Sep  7 09:55:56 2018 TCP_CLIENT link remote: [AF_INET6]::1:13194
Fri Sep  7 09:55:57 2018 [localhost] Peer Connection Initiated with [AF_INET6]::1:13194
Fri Sep  7 09:55:58 2018 Opening utun (connect(AF_SYS_CONTROL)): Resource busy (errno=16)
Fri Sep  7 09:55:58 2018 Opened utun device utun1
Fri Sep  7 09:55:58 2018 do_ifconfig, tt->did_ifconfig_ipv6_setup=0
Fri Sep  7 09:55:58 2018 /sbin/ifconfig utun1 delete
ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address
Fri Sep  7 09:55:58 2018 NOTE: Tried to delete pre-existing tun/tap instance -- No Problem if failure
Fri Sep  7 09:55:58 2018 /sbin/ifconfig utun1 192.168.255.6 192.168.255.5 mtu 1500 netmask 255.255.255.255 up
add net 172.16.0.0: gateway 192.168.255.5
add net 192.168.255.1: gateway 192.168.255.5
Fri Sep  7 09:55:58 2018 WARNING: this configuration may cache passwords in memory -- use the auth-nocache option to prevent this
Fri Sep  7 09:55:58 2018 Initialization Sequence Completed

I've also tried using Shimo, and the result was the same. Here is doctor output whilst using Shimo as my VPN client.

[Screenshot: Shimo VPN client, 2018-09-07 10:04:12 AM]

$ dcos-docker doctor

Warning: The version of ``sed`` is not compatible with installers for DC/OS 1.9 and below. See http://dcos-e2e.readthedocs.io/en/latest/versioning-and-api-stability.html#dc-os-1-9-and-below.

Note: Docker has approximately 7.8 GB of memory available. The amount of memory required depends on the workload. For example, creating large clusters or multiple clusters requires a lot of memory.
A four node cluster seems to work well on a machine with 9 GB of memory available to Docker.
To dedicate more memory to Docker for Mac, go to Docker > Preferences > Advanced.

Warning: Cannot connect to a Docker container by its IP address. This is needed for features such as connecting to the web UI and using the DC/OS CLI. To use the "wait" command without resolving this issue, use the "--skip-http-checks" flag on the "wait" command. We recommend using "dcos-docker setup-mac-network" to resolve this issue.

Note: If you continue to experience problems, more information is available at http://dcos-e2e.readthedocs.io/en/latest/docker-backend.html#troubleshooting.

adamtheturtle commented 6 years ago

@davidhesson Thank you for that detail. Sorry for the delayed response - I was on vacation 🌴

I am not sure what the issue is and I will think on this.

CoericK commented 6 years ago

Got the same issue here. I'm running OS X 10.13.6, Docker version 18.06.1-ce-mac73 (26764) and dcos-docker version 2018.9.6.0. These are the steps I followed:

Note: Docker has approximately 15.6 GB of memory available. The amount of memory required depends on the workload. For example, creating large clusters or multiple clusters requires a lot of memory. A four node cluster seems to work well on a machine with 9 GB of memory available to Docker. To dedicate more memory to Docker for Mac, go to Docker > Preferences > Advanced.

Warning: Cannot connect to a Docker container by its IP address. This is needed for features such as connecting to the web UI and using the DC/OS CLI. To use the "wait" command without resolving this issue, use the "--skip-http-checks" flag on the "wait" command. We recommend using "dcos-docker setup-mac-network" to resolve this issue.

Note: If you continue to experience problems, more information is available at http://dcos-e2e.readthedocs.io/en/latest/docker-backend.html#troubleshooting.

Then I ran `dcos-docker setup-mac-network`, which gave me this output:

  1. Install an OpenVPN client such as Tunnelblick (https://tunnelblick.net/downloads.html) or Shimo (https://www.shimovpn.com).
  2. Run "open /Users/erick/Documents/docker-for-mac.ovpn".
  3. If your OpenVPN client is Shimo, edit the new "docker-for-mac" profile's Advanced settings to deselect "Send all traffic over VPN".
  4. In your OpenVPN client, connect to the new "docker-for-mac" profile.
  5. Run "dcos-docker doctor" to confirm that everything is working.

I had already installed Tunnelblick 3.7.6a (build 5080), so I only ran:
`open /Users/erick/Documents/docker-for-mac.ovpn`
It asked for permission to set up the new configuration, which completed successfully. Then I clicked to connect to "docker-for-mac" and got stuck at "Waiting for response ...".

Here is part of the log from Tunnelblick:

2018-09-22 11:12:38 MANAGEMENT: >STATE:1537632758,RESOLVE,,,,,,
2018-09-22 11:12:38 TCP/UDP: Preserving recently used remote address: [AF_INET]127.0.0.1:13194
2018-09-22 11:12:38 Socket Buffers: R=[131072->131072] S=[131072->131072]
2018-09-22 11:12:38 Attempting to establish TCP connection with [AF_INET]127.0.0.1:13194 [nonblock]
2018-09-22 11:12:38 MANAGEMENT: >STATE:1537632758,TCP_CONNECT,,,,,,
2018-09-22 11:12:39 TCP connection established with [AF_INET]127.0.0.1:13194
2018-09-22 11:12:39 TCP_CLIENT link local: (not bound)
2018-09-22 11:12:39 TCP_CLIENT link remote: [AF_INET]127.0.0.1:13194
2018-09-22 11:12:39 MANAGEMENT: >STATE:1537632759,WAIT,,,,,,
2018-09-22 11:13:39 TLS Error: TLS key negotiation failed to occur within 60 seconds (check your network connectivity)
2018-09-22 11:13:39 TLS Error: TLS handshake failed
2018-09-22 11:13:39 Fatal TLS error (check_tls_errors_co), restarting
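The two OpenVPN logs in this thread end differently: the command-line client log earlier reaches "Initialization Sequence Completed" and adds routes through the tunnel, while this Tunnelblick log establishes the TCP connection to 127.0.0.1:13194 but then reports that TLS key negotiation failed within 60 seconds. To tell these outcomes apart in a saved log, here is a minimal sketch assuming a POSIX shell; `classify_openvpn_log` is a hypothetical helper that only looks for the two messages seen in this thread, so any other failure is reported as `unknown`.

```shell
#!/bin/sh
# Hypothetical helper: reads an OpenVPN client log on stdin and prints
#   connected            if "Initialization Sequence Completed" appears,
#   tls-handshake-failed if "TLS key negotiation failed" appears,
#   unknown              otherwise.
# Success is checked first because OpenVPN restarts after a fatal TLS
# error, so one log can hold a failed attempt followed by a good one.
classify_openvpn_log() {
  log=$(cat)
  if printf '%s\n' "$log" | grep -q 'Initialization Sequence Completed'; then
    echo connected
  elif printf '%s\n' "$log" | grep -q 'TLS key negotiation failed'; then
    echo tls-handshake-failed
  else
    echo unknown
  fi
}

# Usage, for a log saved to a file:
#   classify_openvpn_log < openvpn.log
```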