dictybase-docker / cluster-management


Forwarding Rules Fix #155

Open ktun95 opened 1 month ago

ktun95 commented 1 month ago

Bug description

When a cluster is created with kops v1.27.0 and the kubeconfig is subsequently exported with kops export kubeconfig using kops v1.27.1, the following warning is printed:

W0808 18:00:14.284101   88876 create_kubecfg.go:69] Did not find API endpoint; may not be able to reach cluster

Any commands attempting to access the cluster fail because kubectl cannot resolve the exported address.

EXAMPLE: Attempting to run kops validate cluster results in:

Error: validation failed: unexpected error during validation: error listing nodes: Get "https://api.dicty-playground.k8s.local/api/v1/nodes": dial tcp: lookup api.dicty-playground.k8s.local on no such host
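One way to confirm the symptom is to look at the server address that ended up in the kubeconfig. The sketch below is only an illustration: the helper name server_kind is hypothetical, it only recognises IPv4 addresses, and the usage line assumes kubectl is installed and KUBECONFIG points at the exported file.

```shell
#!/bin/sh
# Hypothetical helper: classify a kubeconfig server URL as an IPv4 address
# or a DNS name.
server_kind() {
  host="${1#https://}"     # strip the scheme
  host="${host%%[:/]*}"    # strip any port or path
  case "$host" in
    *[!0-9.]*) echo "dns-name" ;;    # contains something other than digits and dots
    *)         echo "ip-address" ;;
  esac
}

# Usage (requires kubectl and an exported kubeconfig):
#   server_kind "$(kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}')"
```

If this reports dns-name for a k8s.local cluster, the export most likely fell back to the API host name instead of the load balancer IP.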


Use direnv to manage environment variables


export KOPS_CLUSTER_NAME=dicty-playground.k8s.local
export KOPS_STATE_STORE=gs://kops-kubernetes-state-playground
export KUBECONFIG="${PWD}/.kube/config"
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/credentials/dcr-kube-admin-key.json

Install kops version v1.27.0

Use asdf for version management

> asdf install kops v1.27.0
> asdf local kops v1.27.0

Create Cluster with kops v1.27.0

> kops create cluster --zones us-central1-a

Run kops update cluster immediately after creation. From the kOps documentation: "kops create cluster created the Cluster object & InstanceGroup object in our state store, but didn't actually create any instances or other cloud objects in GCE. To do that, we'll use kops update cluster." (https://kops.sigs.k8s.io/getting_started/gce/#creating-a-cluster)

> kops update cluster --yes --admin

note: The --admin flag exports an admin user credential to the kubeconfig and adds it to the cluster context.

Upgrading to kops v1.27.1

> asdf install kops v1.27.1
> asdf local kops v1.27.1
> kops version
Client version: 1.27.1 (git-v1.27.1)

Re-export Kubeconfig

> kops export kubeconfig --admin

I0808 20:24:48.139586  118795 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
W0808 20:24:49.364651  118795 create_kubecfg.go:69] Did not find API endpoint; may not be able to reach cluster
kOps has set your kubectl context to dicty-playground.k8s.local

Attempting to Access Cluster

> kops validate cluster

Error: validation failed: unexpected error during validation: error listing nodes: Get "https://api.dicty-playground.k8s.local/api/v1/nodes": dial tcp: lookup api.dicty-playground.k8s.local on no such host



Solution 1 (Recommended)

Simply running kops update cluster --yes on kops v1.27.1 should add the necessary label to our forwarding rules:

kops v1.27.1
> kops update cluster --yes

The appropriate label is now added to the Forwarding Rule

> gcloud compute forwarding-rules describe api-dicty-playground-k8s-local

>> Select the region of the cluster ([33] region: us-central1)

IPProtocol: TCP
creationTimestamp: '2024-08-08T16:59:30.089-07:00'
description: ''
fingerprint: 09jb6N-YOtc=
id: '6825895018706298125'
kind: compute#forwardingRule
labelFingerprint: tBBwcrW2dDE=
labels:
  k8s-io-cluster-name: dicty-playground-k8s-local
  name: api
loadBalancingScheme: EXTERNAL
name: api-dicty-playground-k8s-local
networkTier: PREMIUM
portRange: 443-443
region: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/forwardingRules/api-dicty-playground-k8s-local
target: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/targetPools/api-dicty-playground-k8s-local

> kops validate cluster

I0809 11:00:57.864014  172353 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
Validating cluster dicty-playground.k8s.local

I0809 11:01:00.320775  172353 gce_cloud.go:301] Scanning zones: [us-central1-c us-central1-a us-central1-f us-central1-b]
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-a     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-a             Node            e2-medium       1       1       us-central1

NAME                                    ROLE            READY
control-plane-us-central1-a-n2qx        control-plane   True
nodes-us-central1-a-jwhg                node            True

Solution 2

As described in the Issue Summary below, this solution manually adds a label to the Forwarding Rule associated with the cluster.

> gcloud compute forwarding-rules update api-dicty-playground-k8s-local --update-labels=k8s-io-cluster-name=dicty-playground-k8s-local

>> Select the region of the cluster ([33] region: us-central1)

The command above adds a label to the forwarding rule with the key k8s-io-cluster-name and the value dicty-playground-k8s-local, which is the cluster name dicty-playground.k8s.local with its dots replaced by dashes.
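The label value used here looks like the cluster name with every dot replaced by a dash. A minimal sketch of building the command for another cluster follows. The helper names cluster_label_value and label_command are hypothetical, and the assumption that the forwarding rule is named api-<label value> is taken only from this one cluster, so check the real rule name before running anything.

```shell
#!/bin/sh
# Hypothetical helper: derive the label value from a kops cluster name by
# replacing every dot with a dash.
cluster_label_value() {
  printf '%s\n' "$1" | tr '.' '-'
}

# Print (rather than run) the gcloud command so it can be reviewed first.
# $1 = cluster name, $2 = region of the forwarding rule.
# Assumption: the rule name is "api-" followed by the label value.
label_command() {
  value="$(cluster_label_value "$1")"
  printf 'gcloud compute forwarding-rules update api-%s --region=%s --update-labels=k8s-io-cluster-name=%s\n' \
    "$value" "$2" "$value"
}

label_command "${KOPS_CLUSTER_NAME:-dicty-playground.k8s.local}" us-central1
```

Passing --region explicitly also avoids the interactive region prompt.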


Check the forwarding rule associated with the cluster. Notice that the label I created is now present on the Forwarding Rule:

> gcloud compute forwarding-rules describe api-dicty-playground-k8s-local

>> Select the region of the cluster ([33] region: us-central1)

IPProtocol: TCP
creationTimestamp: '2024-08-08T16:59:30.089-07:00'
description: ''
fingerprint: 09jb6N-YOtc=
id: '6825895018706298125'
kind: compute#forwardingRule
labelFingerprint: 3nhRTTS9zug=
labels:
  k8s-io-cluster-name: dicty-playground-k8s-local
loadBalancingScheme: EXTERNAL
name: api-dicty-playground-k8s-local
networkTier: PREMIUM
portRange: 443-443
region: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/forwardingRules/api-dicty-playground-k8s-local
target: https://www.googleapis.com/compute/v1/projects/solid-topic-90123/regions/us-central1/targetPools/api-dicty-playground-k8s-local

Re-export Kubeconfig

kops v1.27.1
> kops export kubeconfig --admin
I0809 09:55:43.085248  122429 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
kOps has set your kubectl context to dicty-playground.k8s.local

Accessing the Cluster

> kops validate cluster

I0809 09:56:31.607036  122817 gce_cloud.go:132] Will load GOOGLE_APPLICATION_CREDENTIALS from /home/faceless/Projects/playground-cluster-access/credentials/dcr-kube-admin-key.json
Validating cluster dicty-playground.k8s.local

I0809 09:56:33.428704  122817 gce_cloud.go:301] Scanning zones: [us-central1-c us-central1-a us-central1-f us-central1-b]
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-central1-a     ControlPlane    e2-medium       1       1       us-central1
nodes-us-central1-a             Node            e2-medium       1       1       us-central1

NAME                                    ROLE            READY
control-plane-us-central1-a-n2qx        control-plane   True
nodes-us-central1-a-jwhg                node            True

Your cluster dicty-playground.k8s.local is ready

> kubectl get nodes

NAME                               STATUS   ROLES           AGE   VERSION
control-plane-us-central1-a-n2qx   Ready    control-plane   15h   v1.27.15
nodes-us-central1-a-jwhg           Ready    node            15h   v1.27.15

Now we can successfully access the cluster.

Issue Summary

In v1.27.0, the IP address exported to our kubeconfig comes from a Forwarding Rule resource in Google Cloud. When running

kops export kubeconfig

a function does the following:

  1. Gets the Forwarding Rules for the project.
  2. For each Forwarding Rule that has an IPAddress property, adds that IPAddress to a list.
  3. Exports the first IPAddress in that list to our kubeconfig as the server property.

My local machine is able to connect to the cluster using this IPAddress.

In v1.27.1 (and later, I assume), a check was added to the function that gets the Forwarding Rules for kops export kubeconfig. This check looks at the labels property of each Forwarding Rule listed (https://cloud.google.com/compute/docs/labeling-resources):

  1. Specifically, it checks whether the Forwarding Rule has a label with the key k8s-io-cluster-name.
  2. If the value of the k8s-io-cluster-name label matches the cluster name, the IPAddress is added to the list as normal.
  3. Otherwise, the function does not add it to the list.
  4. Since we only have one Forwarding Rule for the project and it does not have the k8s-io-cluster-name label, the list is empty.
  5. As a result, the server property exported to the kubeconfig becomes https://api.dictycr-dev.k8s.local, which my local machine does not know how to resolve.
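The difference between the two versions can be sketched in shell. This is only an illustration of the selection logic described above, not the kops source. The function names pick_server_old and pick_server_new and the tab-separated input format are hypothetical, and 203.0.113.10 is a documentation placeholder rather than the real address.

```shell
#!/bin/sh
# Each input line describes one forwarding rule as three tab-separated fields:
#   name <TAB> IPAddress <TAB> value of the k8s-io-cluster-name label (may be empty)

# v1.27.0-style selection: first rule that has an IPAddress; labels are ignored.
pick_server_old() {
  awk -F '\t' '$2 != "" { print $2; exit }'
}

# v1.27.1-style selection: first rule that has an IPAddress AND whose label
# equals the expected value ($1). Prints nothing when no rule qualifies.
pick_server_new() {
  awk -F '\t' -v want="$1" '$2 != "" && $3 == want { print $2; exit }'
}

# A rule created by v1.27.0 carries no label; after the fix it does.
unlabeled="$(printf 'api-dicty-playground-k8s-local\t203.0.113.10\t\n')"
labeled="$(printf 'api-dicty-playground-k8s-local\t203.0.113.10\tdicty-playground-k8s-local\n')"

printf '%s\n' "$unlabeled" | pick_server_old                             # finds the IP
printf '%s\n' "$unlabeled" | pick_server_new dicty-playground-k8s-local  # finds nothing
printf '%s\n' "$labeled"   | pick_server_new dicty-playground-k8s-local  # finds the IP again
```

The empty result in the second call corresponds to the "Did not find API endpoint" warning: with no IP address to export, the kubeconfig is left with the unresolvable API host name.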

Side Issue

There is a bug that is somewhat related to the primary issue above. It caused me some confusion in trying to reproduce the primary issue.

When we have multiple clusters, each with its own associated Forwarding Rule created using kops v1.27.0, and we run kops export kubeconfig for a single cluster, kops gets the IP addresses for all of the clusters and may set the server in our kubeconfig to the IP address of the wrong cluster.


This side issue is why a label was added to cluster forwarding rules in the first place:

So this should be fixed by https://github.com/kubernetes/kops/pull/15709, which will be in 1.28, and by https://github.com/kubernetes/kops/pull/15831 which is a (proposed) cherry-pick to 1.27

forwardingRules now support labels, so we will add a label to the forwarding rules to mark the cluster. This is much more robust than the name-based approach we had to use before.

This does unfortunately mean that kops export kubecfg won't work with a cluster created with an earlier version of kOps, until you do kops update cluster with the new kops version. Hopefully that is tolerable.
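To find forwarding rules that still lack the label, and therefore clusters that still need kops update cluster with the newer kops, something like the following could be used. It is a sketch under assumptions: the helper name unlabeled_rules is hypothetical, and the gcloud usage in the comment assumes an authenticated gcloud whose value() format prints the two fields separated by a tab.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>label value" lines on stdin and print
# the names of rules whose k8s-io-cluster-name label is empty or missing.
unlabeled_rules() {
  awk -F '\t' '$1 != "" && $2 == "" { print $1 }'
}

# Usage against a real project (assumption, not verified here):
#   gcloud compute forwarding-rules list \
#     --format='value(name,labels.k8s-io-cluster-name)' | unlabeled_rules
```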


