gefyrahq / gefyra-docker-desktop-extension

Docker Desktop Extension for Gefyra. Connect your local containers to Kubernetes!
https://gefyra.dev
Apache License 2.0

Could not install Gefyra #173

Open mbonaci opened 1 year ago

mbonaci commented 1 year ago

I'm trying to run this on Windows 11 (with WSL2). It reads my kubeconfig correctly, connects to the cluster and I can choose a namespace, image and a pod to copy the env from:

image

The extension reports:

And that's where it ends:

image

I wasn't able to find any errors in the docker extensions log, but I'm not sure I was looking in the right place.
If you specify the log file location (on Windows 11) I can provide more info.

Thank you.

Schille commented 1 year ago

Hi @mbonaci Thank you for reporting this issue. We're going to look into it. What's your Docker Desktop version? Are you using the built-in Kubernetes? - or any other option?

mbonaci commented 1 year ago

@Schille thanks for the quick response. I'm not using the built-in k8s, but a k8s cluster (through a VPN).

kubectl versions:

Client version: v1.24.13 (WSL2 Ubuntu-20.04)
Server version: v1.23.15

Docker Desktop version: image

I probably should've mentioned this in the initial issue, but I forgot, I only ever use kubectl from my WSL, not the one in Windows that comes with Docker Desktop.

Although both the kubectl from WSL and the one from Windows can access the cluster and successfully run e.g. kubectl get pods, the one on Windows is a newer version, outside the allowed version skew, so maybe that's what's causing the issue:

> kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.25.4
Kustomize Version: v4.5.7
Server Version: v1.23.15
WARNING: version difference between client (1.25) and server (1.23) exceeds the supported minor version skew of +/-1
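
The skew rule quoted in the warning above can be checked for both clients with a small sketch. The function names are made up for illustration, and the tolerance of one minor version is taken from the warning text rather than from any Gefyra requirement:

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as 'v1.25.4'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def within_skew(client: str, server: str, allowed: int = 1) -> bool:
    """True if client and server differ by at most `allowed` minor versions."""
    c_major, c_minor = major_minor(client)
    s_major, s_minor = major_minor(server)
    return c_major == s_major and abs(c_minor - s_minor) <= allowed


# Versions reported in this thread.
print(within_skew("v1.24.13", "v1.23.15"))  # kubectl in WSL2
print(within_skew("v1.25.4", "v1.23.15"))   # kubectl on Windows
```

With the versions above, only the Windows client falls outside the tolerance, which matches the warning kubectl prints.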

mbonaci commented 1 year ago

Just to mention that my colleagues reported they were able to run Gefyra from the command line to debug a Java server app running within our cluster. E.g.:

gefyra up --host 10.4.35.248 # control-plane,master
gefyra run -i st/web-api:latest -N web-api --env-from pod/web-api-7c7d5dd4df-zmhsr/web-api --expose 5006:5006 --rm --env TINI_SUBREAPER=true

Schille commented 1 year ago

Thank you for the follow-up. Well, a VPN in WSL2 seems a bit like a challenge, though it should work, too. Just one simple question: did you set the IP in the initial screen under Advanced Cluster Settings?

image

Indeed, the connection is initiated from within WSL2, but only if you run Docker Desktop with the WSL2 backend. Gefyra's extension copies a Windows executable to the host. However, the connection is established from a Wireguard endpoint in your local Docker network. I am not sure if that network (look for gefyra in docker network ls) is part of your VPN setup. I'd be interested in looking deeper into your setup. If you need help, it's probably a better idea to jump on a short call to investigate the issue properly.

mbonaci commented 1 year ago

Hi @Schille, I did try entering the IP in Advanced Cluster Settings to see whether that would fix this issue, but that did not help, so I just kept it empty.

Docker WSL2 backend ✔️ image

Gefyra docker network ✔️

$ docker network ls
...
4160cdb57cda   gefyra               bridge    local
...
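
To compare the gefyra network with the VPN's routes, it can help to know which subnet Docker assigned to it. A minimal sketch follows, assuming the JSON printed by docker network inspect gefyra is available as a string. The helper names and the sample subnet are hypothetical and not taken from this setup:

```python
import ipaddress
import json
from typing import List


def network_subnets(inspect_output: str) -> List[str]:
    """Collect the subnets from `docker network inspect <name>` JSON output."""
    subnets = []
    for network in json.loads(inspect_output):
        ipam = network.get("IPAM") or {}
        for entry in ipam.get("Config") or []:
            if "Subnet" in entry:
                subnets.append(entry["Subnet"])
    return subnets


def address_in_any(address: str, subnets: List[str]) -> bool:
    """True if the address lies inside one of the given subnets."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(subnet) for subnet in subnets)


# Hypothetical sample output; the real subnet on a given machine may differ.
SAMPLE = json.dumps([{
    "Name": "gefyra",
    "Driver": "bridge",
    "IPAM": {"Config": [{"Subnet": "172.30.0.0/24", "Gateway": "172.30.0.1"}]},
}])

subnets = network_subnets(SAMPLE)
print(subnets)
# The cluster address from this thread should not fall inside the Docker subnet.
print(address_in_any("10.4.35.248", subnets))
```

On a real machine the input string could come from running docker network inspect gefyra and capturing its standard output. The check only detects an address collision; it does not show whether the VPN actually routes traffic for that network.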

I'm fine with just working from the WSL2 command line, but I could do a call.

Schille commented 1 year ago

Hi @mbonaci. Well, it seems there is a regression in Gefyra's lib concerning WSL2 (or it never really worked at all). Gefyra uses Wireguard to establish a secure VPN connection into the cluster. The default WSL2 kernel is pre-built by Microsoft and they disabled an important feature (Netfilter Conntrack, see: https://github.com/microsoft/WSL/issues/8149). That's a pity. The good news is, I found a workaround that would enable at least wireguard-go (that Gefyra employs) to run on WSL2 without compiling a personal build of the Linux kernel for WSL2.
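
Whether a given feature is compiled into the running kernel can usually be read from the kernel configuration. The sketch below parses configuration text for one option. It assumes the kernel exposes its configuration at /proc/config.gz, and it uses CONFIG_NF_CONNTRACK only as an illustrative option name; the exact option behind the linked WSL issue is not confirmed here, and the sample text is made up:

```python
import gzip
import re


def kernel_option_state(config_text: str, option: str) -> str:
    """Return the option's value ('y', 'm', ...), 'not set', or 'absent'."""
    name = re.escape(option)
    enabled = re.search(rf"^{name}=(.+)$", config_text, re.MULTILINE)
    if enabled:
        return enabled.group(1)
    if re.search(rf"^# {name} is not set$", config_text, re.MULTILINE):
        return "not set"
    return "absent"


def read_running_kernel_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's configuration, if the kernel exposes it."""
    with gzip.open(path, "rt") as handle:
        return handle.read()


# Made-up sample, not a real WSL2 kernel configuration.
SAMPLE = (
    "CONFIG_NETFILTER=y\n"
    "# CONFIG_NF_CONNTRACK is not set\n"
    "CONFIG_WIREGUARD=m\n"
)

for option in ("CONFIG_NETFILTER", "CONFIG_NF_CONNTRACK", "CONFIG_WIREGUARD"):
    print(option, kernel_option_state(SAMPLE, option))
```

Inside WSL2, passing read_running_kernel_config() instead of SAMPLE would inspect the actual kernel, provided the file exists.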

Long story short: there will be a release of the Gefyra CLI soon that should run on WSL2. By the way, the Gefyra CLI for Windows should work nonetheless. The Docker Desktop extension of Gefyra is not affected by this issue since it is running the Windows build of Gefyra's lib.

mbonaci commented 1 year ago

Hi @Schille, thanks for the info. If you ping me here after that release I'd gladly try it out and provide feedback.

SteinRobert commented 1 year ago

The latest release includes some fixes which (should) resolve this one. @mbonaci we'd be super happy to hear your feedback on this. Thank you so much for making us aware of the issue in the first place. We're looking forward to your input on this!

mbonaci commented 1 year ago

Hi @SteinRobert, unfortunately the same issue is still present in my case: image

As I mentioned before, everything goes fine until this step. I'm able to choose the context, the namespace, copy env from, the image... everything gets populated correctly.

SteinRobert commented 1 year ago

By any chance - we've seen similar behavior caused by some odd Docker configuration (in ~/.docker/config.json). Could you please check your credHelpers / credsStore keys? I found that this answer on Stack Exchange actually helped.

However, this is more or less guessing. We're working on making the actual errors that occur more visible and helpful.

mbonaci commented 1 year ago

I don't have any of those keys in that file.

"We're working on making the actual errors that occur more visible and helpful."

You can notify me when that happens and I'll retry.

SteinRobert commented 1 year ago

I am afraid this will take a few days or even weeks! In the meantime: are you able to build a custom image on your machine? Judging from the screenshot, it looks like the cargo image cannot be built. When you create a simple Dockerfile:

FROM alpine
RUN ls

and run docker build . on it, does the command work without any problems? Sorry for the back and forth. Just trying to help you so you can continue working with Gefyra. Thank you so much for your feedback!

mbonaci commented 1 year ago

I build new images fairly often (in my WSL terminal and in Docker UI), so that shouldn't be the problem here. I'm not blocked by this, since I can still use Gefyra on the command line (if such a need arises), so I can wait until that update happens, no worries.

SteinRobert commented 1 year ago

Okay, thank you!

SteinRobert commented 1 year ago

We released version 1.2.12 which now displays errors (if available) during the installation process.

mbonaci commented 1 year ago

Here's what it says:

Error: Credentials store error: StoreError('docker-credential-gcloud not installed or not available in PATH') - Couldn't install Gefyra.

I searched online and tried a few suggestions from SO and Docker forum.

The original version of Docker's config.json on WSL:

cat ~/.docker/config.json
{
  "auths": {
    "https://index.docker.io/v1/": {}
  },
  "credsStore": "desktop.exe"
}

The original version on Win:

{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud"
  },
  "credsStore": "desktop"
}

I tried renaming credsStore to credStore, first in WSL, then in Win too. I tried removing credsStore, first in WSL, then in Win too.

During all those attempts the error message did not change.
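
Docker resolves a credential store or helper named X to an executable called docker-credential-X on the PATH. The error mentions gcloud, which in the files above appears only under credHelpers in the Windows config, so changing credsStore may simply not touch the entry that triggers it. That reading is an inference from the configs shown, not something confirmed in this thread. A minimal sketch that lists the executables a config.json refers to and reports the ones not found (function names are made up for illustration):

```python
import json
import shutil
from typing import Dict, List


def required_helpers(config_text: str) -> Dict[str, str]:
    """Map each referenced helper executable to the config key that names it."""
    config = json.loads(config_text)
    helpers: Dict[str, str] = {}
    store = config.get("credsStore")
    if store:
        helpers[f"docker-credential-{store}"] = "credsStore"
    for registry, name in (config.get("credHelpers") or {}).items():
        # Keep the first registry that references a given helper.
        helpers.setdefault(f"docker-credential-{name}", f"credHelpers[{registry}]")
    return helpers


def missing_helpers(config_text: str) -> List[str]:
    """Return the referenced helper executables that are not on the PATH."""
    return sorted(
        name for name in required_helpers(config_text)
        if shutil.which(name) is None
    )


# Shortened copy of the Windows config from this thread.
WINDOWS_CONFIG = """
{
  "credHelpers": {"gcr.io": "gcloud", "us.gcr.io": "gcloud"},
  "credsStore": "desktop"
}
"""

for executable, source in required_helpers(WINDOWS_CONFIG).items():
    print(executable, "<-", source)
print("not on PATH:", missing_helpers(WINDOWS_CONFIG))
```

The result depends on the environment it runs in: the same config can pass in a Windows shell and fail in WSL, or the other way around, because each has its own PATH.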

Then I renamed both files to config.json.bkp, which got me over that error, but then this happened:

Waiting for stowaway to become ready.
Could not confirm Stowaway - fatal error.

Then I decided to install Gefyra on WSL in order to see whether I have some higher level issue that's unrelated to Docker Desktop so I ran:

> gefyra up --host 10.4.35.248
[INFO] Installing Gefyra Operator
[INFO] Created network 'gefyra' (747e5f5fe28f)
[INFO] Container image "quay.io/gefyra/operator:1.1.1" already present on machine
[INFO] Pulling image "quay.io/gefyra/stowaway:1.1.1"
[INFO] Successfully pulled image "quay.io/gefyra/stowaway:1.1.1" in 2.63552402s
[INFO] Operator became ready in 10.5670 seconds
[INFO] Deploying Cargo (network sidecar) with IP 172.30.0.149

> gefyra down
[INFO] Removing running bridges
[INFO] Uninstalling Operator
[INFO] Removing Cargo
[INFO] Removing Docker network gefyra

So all good there, it seems.

Then I added the same master node IP to Advanced Cluster Settings on the first page of the Gefyra extension and ran it again. This time it seems I got past the stowaway fatal error, but then it froze on the following line for 10 minutes before I stopped it:

Cargo not found - starting Cargo now...

We're moving forward here and that's the important thing :)

SteinRobert commented 1 year ago

Thank you for the super detailed feedback! Great thing the error was actually displayed! I'll dive into the given error message later. Hopefully we'll manage to resolve that one as well. I'll get back to you asap.