dictybase-docker / cluster-management


Forwarding Rules Fix #155

Open ktun95 opened 1 month ago

ktun95 commented 1 month ago

Bug description

When a cluster is created using kops v1.27.0 and the kubeconfig is subsequently exported with kops export kubeconfig in kops v1.27.1, the following warning is received:

W0808 18:00:14.284101   88876 create_kubecfg.go:69] Did not find API endpoint; may not be able to reach cluster

Any commands attempting to access the cluster fail because kubectl cannot resolve the exported address.

EXAMPLE: Attempting to run kops validate cluster results in:

Error: validation failed: unexpected error during validation: error listing nodes: Get "https://api.dicty-playground.k8s.local/api/v1/nodes": dial tcp: lookup api.dicty-playground.k8s.local on 192.168.50.1:53: no such host

Reproduction

Use direnv to manage environment variables

.envrc:

export KOPS_CLUSTER_NAME=dicty-playground.k8s.local
export KOPS_STATE_STORE=gs://kops-kubernetes-state-playground
export KUBECONFIG="${PWD}/.kube/config"
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/credentials/dcr-kube-admin-key.json
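
After creating the .envrc, allow it so direnv actually loads the variables into the shell (standard direnv workflow):

> direnv allow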

Install kops version v1.27.0

Use asdf for version management

> asdf install kops v1.27.0
> asdf local kops v1.27.0

Create Cluster with kops v1.27.0

> kops create cluster --zones us-central1-a

Run the cluster update immediately after creation. As the kops documentation explains: "kops create cluster created the Cluster object & InstanceGroup object in our state store, but didn't actually create any instances or other cloud objects in GCE. To do that, we'll use kops update cluster." (https://kops.sigs.k8s.io/getting_started/gce/#creating-a-cluster)

> kops update cluster --yes --admin

Note: The --admin flag exports an admin user credential to the kubeconfig and adds it to the cluster context.

Upgrading to kops v1.27.1

> asdf install kops v1.27.1
> asdf local kops v1.27.1
> kops version
Client version: 1.27.1 (git-v1.27.1)

Re-export Kubeconfig

> kops export kubeconfig --admin

I0808 20:24:48.139586  118795 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
W0808 20:24:49.364651  118795 create_kubecfg.go:69] Did not find API endpoint; may not be able to reach cluster
kOps has set your kubectl context to dicty-playground.k8s.local

Attempting to Access the Cluster

> kops validate cluster

Error: validation failed: unexpected error during validation: error listing nodes: Get "https://api.dicty-playground.k8s.local/api/v1/nodes": dial tcp: lookup api.dicty-playground.k8s.local on 192.168.50.1:53: no such host

EXPECTED:

INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-a     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-a             Node            e2-medium       1       1       us-central1

NODE STATUS
NAME                                    ROLE            READY
control-plane-us-central1-a-n2qx        control-plane   True
nodes-us-central1-a-jwhg                node            True

Your cluster dicty-playground.k8s.local is ready

Solution 1 (Recommended)

Simply running kops update cluster --yes on kops v1.27.1 should add the necessary label to our forwarding rules:

kops v1.27.1
> kops update cluster --yes

The appropriate label is now added to the Forwarding Rule:

> gcloud compute forwarding-rules describe api-dicty-playground-k8s-local

>> Select the region of the cluster ([33] region: us-central1)

IPAddress: 34.171.238.68
IPProtocol: TCP
creationTimestamp: '2024-08-08T16:59:30.089-07:00'
description: ''
fingerprint: 09jb6N-YOtc=
id: '6825895018706298125'
kind: compute#forwardingRule
labelFingerprint: tBBwcrW2dDE=
labels:
  k8s-io-cluster-name: dicty-playground-k8s-local
  name: api
loadBalancingScheme: EXTERNAL
name: api-dicty-playground-k8s-local
networkTier: PREMIUM
portRange: 443-443
region: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/forwardingRules/api-dicty-playground-k8s-local
target: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/targetPools/api-dicty-playground-k8s-local
> kops validate cluster

I0809 11:00:57.864014  172353 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
Validating cluster dicty-playground.k8s.local

I0809 11:01:00.320775  172353 gce_cloud.go:301] Scanning zones: [us-central1-c us-central1-a us-central1-f us-central1-b]
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-a     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-a             Node            e2-medium       1       1       us-central1

NODE STATUS
NAME                                    ROLE            READY
control-plane-us-central1-a-n2qx        control-plane   True
nodes-us-central1-a-jwhg                node            True

Solution 2

Alternatively, manually add the label that kops expects to the Forwarding Rule associated with the cluster (see the Issue Summary below):

> gcloud compute forwarding-rules update api-dicty-playground-k8s-local --update-labels=k8s-io-cluster-name=dicty-playground-k8s-local

>> Select the region of the cluster ([33] region: us-central1)

The command above adds a label to the forwarding rule with the key k8s-io-cluster-name and the value dicty-playground-k8s-local, which is derived from the cluster name dicty-playground.k8s.local.
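
The value appears to be the cluster name with dots replaced by dashes, so for a different cluster it can be derived with a one-liner (assuming KOPS_CLUSTER_NAME is set as in the .envrc above):

> echo "${KOPS_CLUSTER_NAME}" | tr '.' '-'
dicty-playground-k8s-local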

Verification

Check the forwarding rule associated with the cluster. Notice that the label I created is now present in the Forwarding Rule:

> gcloud compute forwarding-rules describe api-dicty-playground-k8s-local

>> Select the region of the cluster ([33] region: us-central1)

IPAddress: 34.171.238.68
IPProtocol: TCP
creationTimestamp: '2024-08-08T16:59:30.089-07:00'
description: ''
fingerprint: 09jb6N-YOtc=
id: '6825895018706298125'
kind: compute#forwardingRule
labelFingerprint: 3nhRTTS9zug=

labels:
  k8s-io-cluster-name: dicty-playground-k8s-local
^^^
loadBalancingScheme: EXTERNAL
name: api-dicty-playground-k8s-local
networkTier: PREMIUM
portRange: 443-443
region: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/forwardingRules/api-dicty-playground-k8s-local
target: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/targetPools/api-dicty-playground-k8s-local

Re-export Kubeconfig

kops v1.27.1
> kops export kubeconfig --admin
I0809 09:55:43.085248  122429 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
kOps has set your kubectl context to dicty-playground.k8s.local

Accessing the Cluster

> kops validate cluster

I0809 09:56:31.607036  122817 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
Validating cluster dicty-playground.k8s.local

I0809 09:56:33.428704  122817 gce_cloud.go:301] Scanning zones: [us-central1-c us-central1-a us-central1-f us-central1-b]
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-a     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-a             Node            e2-medium       1       1       us-central1

NODE STATUS
NAME                                    ROLE            READY
control-plane-us-central1-a-n2qx        control-plane   True
nodes-us-central1-a-jwhg                node            True

Your cluster dicty-playground.k8s.local is ready
> kubectl get nodes

NAME                               STATUS   ROLES           AGE   VERSION
control-plane-us-central1-a-n2qx   Ready    control-plane   15h   v1.27.15
nodes-us-central1-a-jwhg           Ready    node            15h   v1.27.15

Now we can successfully access the cluster.

Issue Summary

In version v1.27.0, the IP address exported to our kubeconfig comes from a Forwarding Rule resource in Google Cloud. When running

kops export kubeconfig

there is a function that:

  1. Gets the Forwarding Rules for the project.
  2. If it finds a Forwarding Rule with an IPAddress property, adds that IPAddress to a list.
  3. Exports the first IPAddress in that list to our kubeconfig as the server property.
  4. My local machine is able to connect to the cluster using this IPAddress.

In v1.27.1 (and beyond, I assume), a check was added to the function that gets the Forwarding Rules for kops export kubeconfig. This check looks at the labels property of each Forwarding Rule listed (https://cloud.google.com/compute/docs/labeling-resources); a roughly equivalent gcloud query is shown after this list.

  5. Specifically, it checks whether the Forwarding Rule has a label with the key k8s-io-cluster-name.
  6. If the value of the k8s-io-cluster-name label matches the cluster name, the IPAddress is added to the list as before.
  7. Otherwise, the function does not add it to the list.
  8. Since we have only one Forwarding Rule for the project and it does not carry the k8s-io-cluster-name label, the list is empty.
  9. The result is that the server property exported in the kubeconfig becomes https://api.dicty-playground.k8s.local, which my local machine does not know how to resolve.
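
The following gcloud filter approximates that label-based lookup from the command line. It is only an analogy to the behavior described above, not the actual kops code path, and it assumes the label value dicty-playground-k8s-local used in the solutions above:

> gcloud compute forwarding-rules list --filter="labels.k8s-io-cluster-name=dicty-playground-k8s-local" --format="value(name,IPAddress)"

Before the label exists this query returns nothing, mirroring the empty list in step 8; after Solution 1 or 2 it returns api-dicty-playground-k8s-local and its IP address.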

Side Issue

There is a bug somewhat related to the primary issue above; it caused me some confusion while trying to reproduce the primary issue.

When multiple clusters, each with its own Forwarding Rules, were created using kops v1.27.0, running kops export kubeconfig for a single cluster can cause kops to pick up the IP Addresses of the other clusters and possibly set the server in our kubeconfig to the IP Address of the wrong cluster.
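
To see every Forwarding Rule in the project and which of them carry the cluster label, a listing such as the following can help (the field names come from the Compute API; adjust the format as needed):

> gcloud compute forwarding-rules list --format="table(name,IPAddress,labels)"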

https://github.com/kubernetes/kops/issues/15679

This issue is the reason for the need to add a label to our cluster forwarding rules in the first place:

So this should be fixed by https://github.com/kubernetes/kops/pull/15709, which will be in 1.28, and by https://github.com/kubernetes/kops/pull/15831 which is a (proposed) cherry-pick to 1.27

forwardingRules now support labels, so we will add a label to the forwarding rules to mark the cluster. This is much more robust than the name-based approach we had to use before.

This does unfortunately mean that kops export kubecfg won't work with a cluster created with an earlier version of kOps, until you do kops update cluster with the new kops version. Hopefully that is tolerable.

https://github.com/kubernetes/kops/issues/15679#issuecomment-1694407503

cybersiddhu commented 1 month ago

Bug Report: Kops Cluster Creation and Kubeconfig Export Issue

Bug Description

When creating a cluster using kops version v1.27.0 and subsequently exporting the kubeconfig with kops export kubeconfig in kops version v1.27.1, the following warning is received:

W0808 18:00:14.284101   88876 create_kubecfg.go:69] Did not find API endpoint; may not be able to reach cluster

As a result, any commands attempting to access the cluster fail because kubectl cannot resolve the exported address.

Example Error

Attempting to run kops validate cluster results in the following error:

Error: validation failed: unexpected error during validation: error listing nodes: Get "https://api.dicty-playground.k8s.local/api/v1/nodes": dial tcp: lookup api.dicty-playground.k8s.local on 192.168.50.1:53: no such host

Steps to Reproduce

Environment Setup

Use direnv to manage environment variables. Create a .envrc file with the following content:

export KOPS_CLUSTER_NAME=dicty-playground.k8s.local
export KOPS_STATE_STORE=gs://kops-kubernetes-state-playground
export KUBECONFIG="${PWD}/.kube/config"
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/credentials/dcr-kube-admin-key.json

1. Install Kops Version v1.27.0

Use asdf for version management:

> asdf install kops v1.27.0
> asdf local kops v1.27.0

2. Create Cluster with Kops v1.27.0

Run the following command to create the cluster:

> kops create cluster --zones us-central1-a

Immediately after creation, run the cluster update:

> kops update cluster --yes --admin

Note: The --admin flag exports an admin user credential to the kubeconfig and adds it to the cluster context.

3. Upgrade to Kops v1.27.1

Install and set the local version to v1.27.1:

> asdf install kops v1.27.1
> asdf local kops v1.27.1
> kops version
Client version: 1.27.1 (git-v1.27.1)

4. Re-export Kubeconfig

Run the following command to export the kubeconfig:

> kops export kubeconfig --admin

The output includes:

I0808 20:24:48.139586  118795 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
W0808 20:24:49.364651  118795 create_kubecfg.go:69] Did not find API endpoint; may not be able to reach cluster
kOps has set your kubectl context to dicty-playground.k8s.local

5. Attempt to Access Cluster

Run the following command to validate the cluster:

> kops validate cluster

The output returns the same error as before.

Expected Output

The expected output when running kops validate cluster should be:

INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-a     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-a             Node            e2-medium       1       1       us-central1

NODE STATUS
NAME                                    ROLE            READY
control-plane-us-central1-a-n2qx        control-plane   True
nodes-us-central1-a-jwhg                node            True

Your cluster dicty-playground.k8s.local is ready

Solutions

Solution 1 (Recommended)

Simply running kops update cluster --yes on kops v1.27.1 should add the necessary label to the forwarding rules:

> kops update cluster --yes

After running this command, the appropriate label is added to the Forwarding Rule. You can verify this with:

> gcloud compute forwarding-rules describe api-dicty-playground-k8s-local

You should see an output similar to:

IPAddress: 34.171.238.68
IPProtocol: TCP
creationTimestamp: '2024-08-08T16:59:30.089-07:00'
labels:
  k8s-io-cluster-name: dicty-playground-k8s-local

After updating, validate the cluster again:

> kops validate cluster

Solution 2

Alternatively, you can manually add a label to the forwarding rule associated with the cluster:

> gcloud compute forwarding-rules update api-dicty-playground-k8s-local --update-labels=k8s-io-cluster-name=dicty-playground-k8s-local

After updating the label, re-export the kubeconfig:

> kops export kubeconfig --admin

Then, validate the cluster again:

> kops validate cluster

Issue Summary

In version v1.27.0, the IP address exported to the kubeconfig comes from a Forwarding Rule resource in Google Cloud. The function that retrieves the Forwarding Rules checks for the presence of an IPAddress property and adds it to the list. However, in version v1.27.1 and beyond, a new check was introduced to ensure that the Forwarding Rule has a label with the key k8s-io-cluster-name. If the label is missing or does not match the cluster name, the IPAddress is not added to the list, resulting in an unresolved address in the kubeconfig.
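
A quick way to confirm which endpoint ended up in the kubeconfig after kops export kubeconfig is to inspect the current context (the jsonpath expression below is just one of several ways to do this):

> kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'

Before the fix this prints the unresolvable https://api.dicty-playground.k8s.local; after the forwarding rule is labeled and the kubeconfig is re-exported, it should print the forwarding rule's external IP instead.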

Side Issue

A related bug occurs when multiple clusters with their own associated Forwarding Rules are created using kops v1.27.0. Running kops export kubeconfig for a single cluster may retrieve IP Addresses for multiple clusters, potentially setting the server for the kubeconfig to the IP Address of the wrong cluster.

This issue is addressed in upcoming versions of kops (1.28 and beyond) by introducing labels for forwarding rules, which will provide a more robust identification method than the previous name-based approach.

For further details, refer to the following GitHub issues and pull requests:

https://github.com/kubernetes/kops/issues/15679
https://github.com/kubernetes/kops/pull/15709
https://github.com/kubernetes/kops/pull/15831