gefyrahq / gefyra

Blazingly-fast :rocket:, rock-solid, local application development :arrow_right: with Kubernetes.
https://gefyra.dev
Apache License 2.0

Wireguard Connection not established when using Colima #216

Closed ventsislav-georgiev closed 2 years ago

ventsislav-georgiev commented 2 years ago

What happened?

Running gefyra up is a bit slow, waiting for the operator to be ready:

[INFO] There was no --endpoint argument provided. Connecting to a local Kubernetes node.
[INFO] Installing Gefyra Operator
[INFO] Created network 'gefyra' (a2bbf4fae3)
[INFO] Pulling image "quay.io/gefyra/operator:0.11.4"
[INFO] Successfully pulled image "quay.io/gefyra/operator:0.11.4" in 8.311726046s
[INFO] Pulling image "quay.io/gefyra/stowaway:0.11.4"
[INFO] Successfully pulled image "quay.io/gefyra/stowaway:0.11.4" in 15.011779716s
[INFO] Operator became ready in 87.6396 seconds
[INFO] Deploying Cargo (network sidecar) with IP 172.18.0.149

The operator has the following error in the log:

[2022-10-18 21:01:34,545] kopf.activities.star [ERROR   ] Activity 'check_gefyra_components' failed with an exception. Will retry.
Traceback (most recent call last):
  File "/usr/lib/python3.9/kopf/_core/actions/execution.py", line 279, in execute_handler_once
    result = await invoke_handler(
  File "/usr/lib/python3.9/kopf/_core/actions/execution.py", line 374, in invoke_handler
    result = await invocation.invoke(
  File "/usr/lib/python3.9/kopf/_core/actions/invocation.py", line 116, in invoke
    result = await fn(**kwargs)  # type: ignore
  File "/app/gefyra/handler/components.py", line 211, in check_gefyra_components
    await aw_wireguard_ready
  File "/app/gefyra/stowaway.py", line 80, in get_wireguard_connection_details
    stream_copy_from_pod(
  File "/app/gefyra/utils.py", line 112, in stream_copy_from_pod
    raise e
  File "/app/gefyra/utils.py", line 107, in stream_copy_from_pod
    member = tar.getmember(source_path.split("/", 1)[1])  
  File "/usr/lib/python3.9/tarfile.py", line 1790, in getmember
    raise KeyError("filename %r not found" % name)
KeyError: "filename 'config/peer1/peer1.conf' not found"
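
The traceback ends in `tarfile.getmember`, which raises `KeyError` when the archive does not contain the requested member. Below is a minimal, self-contained sketch of that failure mode with a retry loop around it. The helpers `build_archive`, `read_member`, and `read_member_with_retry`, as well as the in-memory archives, are hypothetical illustrations and not Gefyra's actual `stream_copy_from_pod` code.

```python
import io
import tarfile
import time


def build_archive(files):
    """Build an in-memory tar archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_member(archive_bytes, source_path):
    """Return the bytes of source_path from the archive.

    The leading path component is dropped, mirroring the
    split("/", 1)[1] call visible in the traceback.
    """
    member_name = source_path.split("/", 1)[1]
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        member = tar.getmember(member_name)  # raises KeyError if missing
        return tar.extractfile(member).read()


def read_member_with_retry(fetch_archive, source_path, attempts=3, delay=0.0):
    """Call fetch_archive() until the member exists or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return read_member(fetch_archive(), source_path)
        except KeyError as exc:
            # The file has not been generated yet; wait and try again.
            last_error = exc
            time.sleep(delay)
    raise last_error
```

If the peer configuration is never generated, every attempt fails the same way, so a retry only helps when the file appears late.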

After these errors, Gefyra does not work, and running gefyra status confirms it:

{
  "summary": "Gefyra is not running properly",
  "cluster": {
    "connected": true,
    "operator": true,
    "operator_image": "quay.io/gefyra/operator:0.11.4",
    "stowaway": true,
    "stowaway_image": "quay.io/gefyra/stowaway:0.11.4",
    "namespace": true
  },
  "client": {
    "version": "0.11.4",
    "cargo": true,
    "cargo_image": "gefyra-cargo:20221019000235",
    "network": true,
    "connection": false,
    "containers": 0,
    "bridges": 0,
    "kubeconfig": "~/.kube/config",
    "context": "colima",
    "cargo_endpoint": "192.168.5.2:31820"
  }
}
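
In the status output above, every component is reported as present and only `client.connection` is `false`. A small sketch for spotting such failing checks programmatically follows. `failing_checks` is a hypothetical helper, and the field names are taken only from the output shown here, not from a documented schema.

```python
import json


def failing_checks(status):
    """Return dotted names of boolean status fields that are False."""
    failed = []
    for section in ("cluster", "client"):
        for key, value in status.get(section, {}).items():
            # Use identity so numeric fields such as "containers": 0 are ignored.
            if value is False:
                failed.append(f"{section}.{key}")
    return failed


# Status document copied from the output above.
raw_status = """
{
  "summary": "Gefyra is not running properly",
  "cluster": {
    "connected": true,
    "operator": true,
    "operator_image": "quay.io/gefyra/operator:0.11.4",
    "stowaway": true,
    "stowaway_image": "quay.io/gefyra/stowaway:0.11.4",
    "namespace": true
  },
  "client": {
    "version": "0.11.4",
    "cargo": true,
    "cargo_image": "gefyra-cargo:20221019000235",
    "network": true,
    "connection": false,
    "containers": 0,
    "bridges": 0,
    "kubeconfig": "~/.kube/config",
    "context": "colima",
    "cargo_endpoint": "192.168.5.2:31820"
  }
}
"""

print(failing_checks(json.loads(raw_status)))
```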

What did you expect to happen?

No errors

How can we reproduce it (as minimally and precisely as possible)?

on M1 mac:

What Kubernetes setup are you working with?

k3s ver: v1.23.6+k3s1

OS version

Darwin ventsislavg 22.1.0 Darwin Kernel Version 22.1.0: Tue Sep 27 22:08:45 PDT 2022; root:xnu-8792.41.6~5/RELEASE_ARM64_T6000 arm64

Anything else we need to know?

The current version doesn't support docker contexts. Check this issue for a workaround: https://github.com/gefyrahq/gefyra/issues/210

SteinRobert commented 2 years ago

I currently don't have a Mac M1 at hand - however my team and I will look into this. Thank you for the brilliant description and all the details!

SteinRobert commented 2 years ago

Could you provide us with the log from the stowaway pod? It's running in the Gefyra namespace. Are you running the k3s cluster locally? Also on ARM?

ventsislav-georgiev commented 2 years ago

@SteinRobert yes, I've specified the versions in Kubernetes Setup and OS version sections. This is the log from the stowaway:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] 01-envfile: executing...
[cont-init.d] 01-envfile: exited 0.
[cont-init.d] 01-migrations: executing...
[migrations] started
[migrations] no migrations found
[cont-init.d] 01-migrations: exited 0.
[cont-init.d] 02-tamper-check: executing...
[cont-init.d] 02-tamper-check: exited 0.
[cont-init.d] 10-adduser: executing...

-------------------------------------
          _         ()
         | |  ___   _    __
         | | / __| | |  /  \
         | | \__ \ | | | () |
         |_| |___/ |_|  \__/

Brought to you by linuxserver.io
-------------------------------------

To support the app dev(s) visit:
WireGuard: https://www.wireguard.com/donations/

To support LSIO projects visit:
https://www.linuxserver.io/donate/
-------------------------------------
GID/UID
-------------------------------------

User uid:    1000
User gid:    1000
-------------------------------------

[cont-init.d] 10-adduser: exited 0.
[cont-init.d] 40-confs: executing...
**** Server mode is selected ****
**** SERVERURL var is either not set or is set to "auto", setting external IP to auto detected value of 46.10.222.23 ****
**** External server port is set to 31820. Make sure that port is properly forwarded to port 51820 inside this container ****
**** Internal subnet is set to 192.168.99.0 ****
**** AllowedIPs for peers 0.0.0.0/0, ::/0 ****
**** PEERDNS var is either not set or is set to "auto", setting peer DNS to 192.168.99.1 to use wireguard docker host's DNS. ****
**** No wg0.conf found (maybe an initial install), generating 1 server and 1 peer/client confs ****
grep: /config/peer*/*.conf: No such file or directory
Adding 172.18.0.0/16 to wg0.conf's AllowedIPs for peer 1
PEER 1 QR code:
[QR code omitted]
[cont-init.d] 40-confs: exited 0.
[cont-init.d] 90-custom-folders: executing...
[cont-init.d] 90-custom-folders: exited 0.
[cont-init.d] 99-custom-scripts: executing...
[custom-init] no custom files found exiting...
[cont-init.d] 99-custom-scripts: exited 0.
[cont-init.d] done.
[services.d] starting services
[services.d] done.
2022/10/20 13:17:38 [notice] 388#388: using the "epoll" event method
2022/10/20 13:17:38 [notice] 388#388: nginx/1.20.2
2022/10/20 13:17:38 [notice] 388#388: built by gcc 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04)
2022/10/20 13:17:38 [notice] 388#388: OS: Linux 5.15.68-0-virt
2022/10/20 13:17:38 [notice] 388#388: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2022/10/20 13:17:38 [notice] 388#388: start worker processes
2022/10/20 13:17:38 [notice] 388#388: start worker process 414
2022/10/20 13:17:38 [notice] 388#388: start worker process 415
2022/10/20 13:17:38 [notice] 388#388: start worker process 416
2022/10/20 13:17:38 [notice] 388#388: start worker process 418
2022/10/20 13:17:38 [notice] 388#388: start worker process 420
2022/10/20 13:17:38 [notice] 388#388: start worker process 421
2022/10/20 13:17:38 [notice] 388#388: start worker process 422
2022/10/20 13:17:38 [notice] 388#388: start worker process 423
[#] ip link add wg0 type wireguard
[#] wg setconf wg0 /dev/fd/63
[#] ip -4 address add 192.168.99.1 dev wg0
[#] ip link set mtu 1420 up dev wg0
[#] ip -4 route add 192.168.99.2/32 dev wg0
[#] ip -4 route add 172.18.0.0/16 dev wg0
[#] iptables -A FORWARD -i wg0 -j ACCEPT; iptables -A FORWARD -o wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
.:53
CoreDNS-1.10.0
linux/arm64, go1.19.1, 596a9f9

SteinRobert commented 2 years ago

@ventsislav-georgiev everything here looks pretty normal. Could you provide us with the complete operator log? Sorry for the inconvenience. I tried to reproduce this with an ARM setup. It also failed the first time, but the operator retries copying the file several times, and after the second try everything worked fine.

SteinRobert commented 2 years ago

We've been able to reproduce this issue. It is actually a setup issue, although we should probably add some more convenience checks and logs to Gefyra.

Your colima VM needs a network address:

colima start --kubernetes --network-address

This network address is also the endpoint for your Kubernetes cluster:

colima list
PROFILE    STATUS     ARCH      CPUS    MEMORY    DISK     RUNTIME       ADDRESS
default    Running    x86_64    2       2GiB      60GiB    docker+k3s    192.168.106.2

Running gefyra up with this endpoint works as expected:

gefyra up --endpoint=192.168.106.2:31820

This should resolve the connection issue. However, from my point of view there are two things we could change to make this process smoother:

  1. We already have a probe (ping) that checks whether the WireGuard connection works. For some reason it exits with 0, which indicates a working connection even though there is none. We need to investigate what exactly is happening here.
  2. We try to detect the endpoint automatically if none is provided. This does not seem to work with Colima the way it does with Docker. In this case we spin up a container (https://github.com/gefyrahq/gefyra/blob/main/client/gefyra/configuration.py#L109) to find out the correct IP address. Colima only gives us a local address, which is wrong here. This could be solved by adding a --colima flag that tells Gefyra that K8s runs within Colima (since the Docker context is not really related to where K8s runs).
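
The manual steps described earlier in this comment (read the ADDRESS column from `colima list`, then append port 31820) can be sketched as follows. This is only an illustration: `colima_endpoint` is a hypothetical helper, it assumes the whitespace-separated table layout shown in this thread, and it is not how Gefyra detects the endpoint.

```python
def colima_endpoint(colima_list_output, profile="default", port=31820):
    """Build '<address>:<port>' from the table printed by `colima list`."""
    lines = [line for line in colima_list_output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty `colima list` output")
    header = lines[0].split()
    name_col = header.index("PROFILE")
    addr_col = header.index("ADDRESS")
    for line in lines[1:]:
        cells = line.split()
        if cells[name_col] != profile:
            continue
        # ADDRESS is the last column; it is missing when the VM has no address.
        if len(cells) <= addr_col:
            raise ValueError(
                f"profile {profile!r} has no address; "
                "start Colima with --network-address"
            )
        return f"{cells[addr_col]}:{port}"
    raise LookupError(f"profile {profile!r} not found")


sample = """\
PROFILE    STATUS     ARCH      CPUS    MEMORY    DISK     RUNTIME       ADDRESS
default    Running    x86_64    2       2GiB      60GiB    docker+k3s    192.168.106.2
"""
print(colima_endpoint(sample))
```

The result is the value passed to `gefyra up --endpoint=...` above.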

We should also add this to the Getting Started guides in our documentation to help people who use Gefyra with Colima.

@ventsislav-georgiev would be great if you could try out the given commands. If everything works, please let us know.

ventsislav-georgiev commented 2 years ago

Thanks! I will be able to test it next week.

SteinRobert commented 2 years ago

Was just wondering - @ventsislav-georgiev did you have the time to check out my comment? :)

ventsislav-georgiev commented 2 years ago

Hey @SteinRobert, unfortunately I was not able to test it the way you described. I've already upgraded to macOS Ventura, and Colima has issues exposing a network address on it. ref: https://github.com/abiosoft/colima/issues/457

SteinRobert commented 2 years ago

Oh no, this is unfortunate. Thank you for the quick update, though.

As long as we cannot connect to the node through the --endpoint, Gefyra can't work its magic. It is absolutely necessary to reroute the network traffic accordingly.

Let's keep an eye on this.

ventsislav-georgiev commented 2 years ago

Isn't it possible to make it work with 127.0.0.1, since Colima automatically forwards ports from the VM to the host? If Gefyra creates a NodePort service in the cluster, the port will be open on localhost on the host.

SteinRobert commented 2 years ago

Hey, I'll talk to my colleague about this later. However, from my understanding, 127.0.0.1 inside the cargo container cannot resolve to your machine's localhost. I'll get back to you on this, though.
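
The concern here is that 127.0.0.1 is a loopback address: inside a container with its own network namespace it refers to the container itself, not to the Mac host. A hedged sketch of a pre-flight check that rejects loopback endpoints follows. `validate_endpoint` is a hypothetical helper, not part of Gefyra, and it only handles the IPv4 `host:port` form used in this thread.

```python
import ipaddress


def validate_endpoint(endpoint):
    """Split 'host:port' and reject endpoints a container cannot use."""
    host, _, port = endpoint.rpartition(":")
    address = ipaddress.ip_address(host)  # ValueError if not an IP address
    if address.is_loopback:
        raise ValueError(
            f"{address} is a loopback address; inside a container it "
            "points at the container itself, not at the host"
        )
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port {port_number} is out of range")
    return str(address), port_number


print(validate_endpoint("192.168.106.2:31820"))
```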

ventsislav-georgiev commented 2 years ago

In Colima, the host is reachable from the cluster via the IP 192.168.5.2 (192.168.5.15 is the guest's own address). ref: https://github.com/lima-vm/lima/blob/master/docs/network.md#host-ip-19216852

abiosoft commented 2 years ago

Hey @SteinRobert, unfortunately I was not able to test it the way you described. I've already upgraded to macOS Ventura and colima has issues with exposing network address on it. ref: abiosoft/colima#457

This should be resolved now.

SteinRobert commented 2 years ago

@abiosoft thank you for fixing this and letting us know!