GoogleCloudPlatform / k8s-config-connector

GCP Config Connector, a Kubernetes add-on for managing GCP resources
https://cloud.google.com/config-connector/docs/overview
Apache License 2.0

Config Connector on other K8s distributions - Error getting credentials using GOOGLE_APPLICATION_CREDENTIALS #573

Status: Closed. suckowbiz closed this issue 2 years ago.

suckowbiz commented 2 years ago

Checklist

Bug Description

Hello everyone,

I work for a large company that is evaluating Google Config Connector for production use. We run an in-house Kubernetes cluster for our customers.

Since we do not use GKE, I installed Config Connector following the official guide for "other Kubernetes distributions".

When I verified the installation, I found the pod cnrm-controller-manager-0 stuck in CrashLoopBackOff:

# $ k get pods
NAME                                   READY   STATUS             RESTARTS   AGE
cnrm-controller-manager-0              0/1     CrashLoopBackOff   7          14m
cnrm-deletiondefender-0                1/1     Running            0          28m
cnrm-webhook-manager-7fc6f759c-2gxk2   1/1     Running            0          28m
cnrm-webhook-manager-7fc6f759c-f4rp7   1/1     Running            0          28m

Checking the logs:

# $ k logs cnrm-controller-manager-0
{"severity":"info","timestamp":"2021-11-17T13:58:45.112Z","msg":"Creating the manager"}
I1117 13:58:46.234679       1 request.go:668] Waited for 1.044441157s due to client-side throttling, not priority and fairness, request: GET:https://{obfuscated}/apis/gameservices.cnrm.cloud.google.com/v1beta1?timeout=32s
{"severity":"info","timestamp":"2021-11-17T13:58:48.138Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"severity":"error","timestamp":"2021-11-17T13:58:48.153Z","msg":"error creating the manager","error":"error creating manager: error creating TF provider: error configuring provider: [{0 Attempted to load application default credentials since neither `credentials` nor `access_token` was set in the provider block.  No credentials loaded. To use your gcloud credentials, run 'gcloud auth application-default login'.  Original error: google: error getting credentials using GOOGLE_APPLICATION_CREDENTIALS environment variable: invalid character '\\'' looking for beginning of object key string  []}]"}

The error states that the TF (Terraform) provider cannot be created because the credentials cannot be loaded. My guess is that an environment variable is not set where it should be.

Additional Diagnostic Information

The secret config-connector-sa required by the guide is in place and contains a valid GCP service account key:

# $ k get secrets
NAME                                        TYPE                                  DATA   AGE
cnrm-controller-manager-token-d7npx         kubernetes.io/service-account-token   3      36m
cnrm-deletiondefender-token-7xvmb           kubernetes.io/service-account-token   3      36m
cnrm-resource-stats-recorder-token-5t8c8    kubernetes.io/service-account-token   3      36m
cnrm-webhook-cert-abandon-on-uninstall      Opaque                                4      36m
cnrm-webhook-cert-cnrm-validating-webhook   Opaque                                4      36m
cnrm-webhook-manager-token-8tnz2            kubernetes.io/service-account-token   3      36m
config-connector-sa                         Opaque                                1      37m
default-token-smmk5                         kubernetes.io/service-account-token   3      37m
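Whether the secret really holds a valid key can be checked by decoding its data and parsing it as JSON. The sketch below is a minimal, hypothetical helper (check_key_json is not part of Config Connector); it assumes the base64 value was fetched beforehand, for example with kubectl get secret config-connector-sa -n cnrm-system -o jsonpath='{.data.key\.json}'. Python's json module rejects single-quoted keys just as the Go loader in the log above does, only with a different message.

```python
import base64
import json


def check_key_json(b64_value: str) -> dict:
    """Hypothetical helper: decode a Secret's base64 data value and parse it
    as a service account key. Raises ValueError if it is not usable."""
    raw = base64.b64decode(b64_value)
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as err:
        # Single-quoted keys such as {'type': ...} fail here, comparable to the
        # Go error "invalid character '\'' looking for beginning of object key string".
        raise ValueError(f"key.json is not valid JSON: {err}") from err
    if not isinstance(key, dict) or key.get("type") != "service_account":
        raise ValueError("key.json does not contain a service account key")
    return key


if __name__ == "__main__":
    good = base64.b64encode(b'{"type": "service_account"}').decode()
    bad = base64.b64encode(b"{'type': 'service_account'}").decode()
    print(check_key_json(good)["type"])  # service_account
    try:
        check_key_json(bad)
    except ValueError as err:
        print(err)
```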

Kubernetes Cluster Version

# $ kubectl version --short
Client Version: v1.22.3
Server Version: v1.21.5

Config Connector Version

# $ kubectl get ns cnrm-system -o jsonpath='{.metadata.annotations.cnrm\.cloud\.google\.com/version}'
1.67.0

Config Connector Mode

# $ kubectl get ConfigConnector "configconnector.core.cnrm.cloud.google.com" -o=jsonpath="{@.spec.mode}"
cluster

Log Output

(Identical to the cnrm-controller-manager-0 log shown in the bug description above.)

Steps to Reproduce

Steps to reproduce the issue

Follow the official guide to install Google Config Connector on a custom (non-GKE) Kubernetes cluster.

YAML snippets

I used Ansible to install the operator into an in-house Kubernetes cluster that our company has developed as a standard product. The variable parts of the guide are filled in with the YAML below:

---
apiVersion: v1
kind: Namespace
metadata:
  name: "cnrm-system"
---
apiVersion: v1
kind: Namespace
metadata:
  name: "config-connector"
  annotations:
    cnrm.cloud.google.com/organization-id: "{{config_connector.org_id}}"
---
apiVersion: v1
kind: Secret
metadata:
  name: config-connector-sa
  namespace: cnrm-system
type: Opaque
data:
  key.json: {{config_connector.service_account_json | b64encode}}
---
apiVersion: core.cnrm.cloud.google.com/v1beta1
kind: ConfigConnector
metadata:
  # the name is restricted to ensure that there is only one ConfigConnector
  # instance installed in your cluster
  name: configconnector.core.cnrm.cloud.google.com
spec:
  mode: cluster
  credentialSecretName: config-connector-sa

I came across an older issue that seems similar, but it was resolved a long time ago: https://github.com/GoogleCloudPlatform/k8s-config-connector/issues/151

Additional question:

What is the difference between the Config Connector manifest that one has to download for the official guide via gsutil cp gs://configconnector-operator/latest/release-bundle.tar.gz release-bundle.tar.gz and the ones provided in this Git repo (https://github.com/GoogleCloudPlatform/k8s-config-connector/tree/master/install-bundles)? (I diffed both; they are quite different.)

xiaobaitusi commented 2 years ago

Hi @suckowbiz, thanks for exploring Config Connector!

I used Ansible to install the Operator into an inhouse Kubernetes cluster that our company has developed as a standard product.

I'm not an Ansible expert, but I want to point out that the secret generated from the following YAML snippet needs to be in the same format as the one created by following the official guide:

apiVersion: v1
kind: Secret
metadata:
  name: config-connector-sa
  namespace: cnrm-system
type: Opaque
data:
  key.json: {{config_connector.service_account_json | b64encode}}

Per https://cloud.google.com/config-connector/docs/how-to/install-other-kubernetes#creating_a_service_account

# Create a service account key and export its credentials to a file named key.json:

gcloud iam service-accounts keys create --iam-account \
    SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com key.json

# Import the key's credentials as a Secret.
kubectl create secret generic SECRET_NAME \
    --from-file key.json \
    --namespace cnrm-system
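For comparison with a templated manifest, the sketch below builds a Secret whose data matches what the kubectl command above stores: the key is the file's basename (key.json) and the value is the base64 encoding of the file's exact bytes, with no re-serialization. The function name secret_manifest and the sample key bytes are illustrative placeholders, not part of the official guide.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, key_file_bytes: bytes) -> dict:
    """Hypothetical helper: build a Secret manifest whose data matches
    `kubectl create secret generic NAME --from-file key.json --namespace NS`.
    The file bytes are base64-encoded unchanged."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"key.json": base64.b64encode(key_file_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Placeholder content; a real key file written by gcloud has more fields.
    key_bytes = b'{"type": "service_account", "project_id": "example"}\n'
    manifest = secret_manifest("config-connector-sa", "cnrm-system", key_bytes)
    print(json.dumps(manifest, indent=2))
```

Any templating approach is fine as long as decoding data["key.json"] yields valid JSON, ideally byte-for-byte the file gcloud produced.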

Can you run kubectl get pod cnrm-controller-manager-0 -n cnrm-system -oyaml and post the output here? I want to check if the controller manager pod has spun up correctly with the secret volume mounted.

What is the difference between the Config Connector manifest that one has to download for the official guide via gsutil cp gs://configconnector-operator/latest/release-bundle.tar.gz release-bundle.tar.gz and the ones that are provided in this Git repo (https://github.com/GoogleCloudPlatform/k8s-config-connector/tree/master/install-bundles)?

The manifest from gs://configconnector-operator/ is for the Config Connector operator (perhaps better named the Config Connector installer), which installs and uninstalls KCC (Config Connector) declaratively. This is now the officially supported way to install and uninstall KCC.

The manifest in https://github.com/GoogleCloudPlatform/k8s-config-connector/tree/master/install-bundles is the raw Config Connector manifest that users previously used to install KCC manually. We are now considering no longer uploading the raw Config Connector manifest to GitHub to avoid confusion.

suckowbiz commented 2 years ago

Hi @xiaobaitusi,

I can confirm that the format of the secret was broken; the Config Connector setup itself is fine.

After deleting the secret and recreating it manually via kubectl create secret generic config-connector-sa --from-file key.json --namespace cnrm-system, the pod came up as expected:

# $ k get pods
NAME                                   READY   STATUS    RESTARTS   AGE
cnrm-controller-manager-0              1/1     Running   0          16s
cnrm-deletiondefender-0                1/1     Running   0          16h
cnrm-webhook-manager-7fc6f759c-2gxk2   1/1     Running   0          16h
cnrm-webhook-manager-7fc6f759c-f4rp7   1/1     Running   0          16h

The issue was that Ansible changed the double quotes (") within the JSON into single quotes ('). This is a known issue with Ansible and Jinja. In my case, the JSON is encrypted in an Ansible vault and rendered into a Jinja template with ANSIBLE_JINJA2_NATIVE: "True" set, which causes this change. The simplest solution was to store a data structure in the vault instead of a JSON string. The data structure can then be rendered with {{ datastructure | to_json | b64encode }}.
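The effect can be reproduced in plain Python. The sketch below assumes, as an approximation of what happened in the template, that the key ended up as a Python dict that was turned into text with str(), which yields Python repr syntax with single quotes rather than JSON. json.dumps stands in for Ansible's to_json filter, and base64 for b64encode; the dict contents are placeholders, not a real key.

```python
import base64
import json

# Placeholder service account fields; a real key has more (private_key, client_email, ...).
key = {"type": "service_account", "project_id": "example-project"}

# Approximation of the failure: str() produces Python repr syntax, not JSON.
broken = str(key)
# Approximation of the fix: serialize explicitly as JSON (like `to_json`).
fixed = json.dumps(key)


def parses_as_json(text: str) -> bool:
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(broken)                  # {'type': 'service_account', 'project_id': 'example-project'}
print(parses_as_json(broken))  # False
print(fixed)                   # {"type": "service_account", "project_id": "example-project"}
print(parses_as_json(fixed))   # True

# Value for the Secret's data["key.json"], analogous to `to_json | b64encode`.
print(base64.b64encode(fixed.encode("utf-8")).decode("ascii"))
```

Serializing explicitly at the last step keeps the quoting under the template author's control instead of depending on how Jinja's native mode converts values to text.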

Thanks!

suckowbiz commented 2 years ago

I am closing this issue since the cause has been identified and a solution found.