Sounds like a superset of the improvement described in #3027; the connect gateway sounds interesting, thanks for sharing!
@laloyalo I'm trying to get it working on my end and will then submit a PR.
Thanks @damadei-google! I will gladly review it and offer some help; let me know if needed!
@laloyalo
I've implemented what I believe is needed; however, when I try to run make test, I'm getting the following timeout. I've run it multiple times with the same result, and if I run the git command by hand it works. Any idea?
--- FAIL: TestVerifyCommitSignature (92.16s)
git_test.go:287:
Error Trace: git_test.go:287
Error: Received unexpected error:
git fetch origin --tags --force
failed timeout after 1m30s
Test: TestVerifyCommitSignature
git_test.go:293:
Error Trace: git_test.go:293
Error: Received unexpected error:
git checkout --force ae2d0ff0a6ac34dc3e8493b2ad45a9badc61a26c
failed exit status 128: fatal: reference is not a tree: ae2d0ff0a6ac34dc3e8493b2ad45a9badc61a26c
Test: TestVerifyCommitSignature
git_test.go:301:
Error Trace: git_test.go:301
Error: "error: 28027897aad1262662096745f2ce2d4c74d02b7f: unable to read file." does not contain "gpg: Signature made"
Test: TestVerifyCommitSignature
git_test.go:308:
Error Trace: git_test.go:308
Error: Should be empty, but was error: 85d660f0b967960becce3d49bd51c678ba2a5d24: unable to read file.
Test: TestVerifyCommitSignature
FAIL
Hello @damadei-google, sorry, I had some trouble setting up my dev env with minikube. I was able to run the make test command and it ran successfully. I am not sure if you ran it in a particular way. I also noticed yesterday I had some issues with the git command itself. Have you tried it today? Do you have a branch I could try out?
Thanks!
Hello @damadei-google, wondering if there is any update on this thread? Were you able to make it work with gcloud?
@damadei-google is there any update? I'm quite new to argo, but would be interested to contribute somehow.
There is another discussion related to this: https://github.com/argoproj/argo-cd/discussions/6553
I tried to extend the argocd image and add the gcloud SDK. But I got stuck somewhere...
My Dockerfile:
FROM argoproj/argocd:latest

# See https://argoproj.github.io/argo-cd/operator-manual/custom_tools/#byoi-build-your-own-image
# for instructions on how to extend the Argo CD image to provide custom tooling.

# Switch to root for the ability to perform install
USER root

# Install gcloud cli
# See https://cloud.google.com/sdk/docs/downloads-interactive for details
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN curl https://sdk.cloud.google.com > install.sh
RUN bash install.sh --disable-prompts --install-dir=/usr/local/

# Switch back to non-root user
USER argocd
COPY get-gcloud-kube-config /usr/local/google-cloud-sdk/bin/get-gcloud-kube-config

# Add gcloud cli to path
ENV PATH $PATH:/usr/local/google-cloud-sdk/bin
The get-gcloud-kube-config script is a basic shell script:
#!/bin/bash
while getopts p:c:r: flag
do
    case "${flag}" in
        p) project=${OPTARG};;
        c) cluster=${OPTARG};;
        r) region=${OPTARG};;
    esac
done

gcloud config set project $project > /dev/null 2>&1
gcloud container clusters get-credentials $cluster --region $region > /dev/null 2>&1

if [ -z ${KUBECONFIG+x} ]; then
    cat $HOME/.kube/config
else
    cat $KUBECONFIG
fi
After that I tried to add a cluster config to argo:
apiVersion: v1
kind: Secret
metadata:
  name: my-gke-cluster
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: my-gke-cluster
  server: https://<ip-address>
  config: |
    {
      "execProviderConfig": {
        "command": "get-gcloud-kube-config",
        "args": [
          "-p",
          "<project-name>",
          "-c",
          "<cluster-name>",
          "-r",
          "europe-west3"
        ],
        "apiVersion": "client.authentication.k8s.io/v1alpha1",
        "installHint": "Could not find command 'get-gcloud-kube-config'"
      }
    }
But I always get the following error:
Failed to cache app resources: Get "https://<ip-address>/version?timeout=32s":
getting credentials: exec plugin is configured to use API version client.authentication.k8s.io/v1alpha1,
plugin returned version client.authentication.k8s.io/__internal
Maybe this is somehow related to https://github.com/argoproj/argo-cd/issues/6749?
@jannfis can you give us some guidance here, please? Or at least do you know somebody who knows somebody...? :)
Does GKE have to work using an exec provider like AWS does? I thought not, because of the gcp auth provider that is compiled into client-go:
import _ "k8s.io/client-go/plugin/pkg/client/auth/gcp"
But I notice that the codebase does not even import it, so maybe that is part of the problem. The following post seems to outline other things that may need to be done to get this working:
https://www.fullstory.com/blog/connect-to-google-kubernetes-with-gcp-credentials-and-pure-golang/
I have been using https://github.com/sl1pm4t/gcp-exec-creds with:
execProviderConfig:
  apiVersion: client.authentication.k8s.io/v1beta1
  command: gcp-exec-creds
Would love to see ArgoCD with native support for GCP's Service Accounts.
What is the preferred way to implement it? An external exec command somehow copied into the container, or built-in (Go)?
EDIT: After a very painful debugging session, I realized I never annotated the k8s service account with the required workload identity iam.gke.io/gcp-service-account key. It immediately sprang to life once I added it.
This was a sad day for my productivity.
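In case it saves someone else the debugging session, a minimal sketch of that annotation (namespace, service-account, and project names are placeholders):

# Annotate the controller's KSA for Workload Identity (placeholder names)
kubectl -n argocd annotate serviceaccount argocd-application-controller \
  iam.gke.io/gcp-service-account=<gcp-sa-name>@<project-id>.iam.gserviceaccount.com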
@piotrjanik Do you have a working example with gcp-exec-creds you could share? I can't for the life of me get the execProvider working 😅. I attempted to follow the steps in https://github.com/argoproj/argo-cd/discussions/6563 but I keep hitting the "the server has asked for the client to provide credentials" message.
My (jsonnet) configuration is as follows:
config: {
  execProviderConfig: {
    apiVersion: "client.authentication.k8s.io/v1beta1",
    command: "/kubecluster/goapps/bin/gcp-exec-creds",  # pre-baked in custom image
    args: ["|", "sed", "-e", "s/ExecCredential/kind/", "-"]
  },
  tlsClientConfig: {
    insecure: true,
  },
},
and my pre-baked Dockerfile:
FROM argoproj/argocd:v2.1.3

# Switch to root for the ability to perform install
USER root

RUN apt-get update && \
    apt-get install -y \
      curl \
      wget \
    && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN mkdir -p /kubecluster/goapps && \
    cd /kubecluster && wget https://dl.google.com/go/go1.17.3.linux-amd64.tar.gz && \
    tar -C /kubecluster -xzf go1.17.3.linux-amd64.tar.gz && \
    export GOPATH=/kubecluster/goapps && \
    /kubecluster/go/bin/go install github.com/sl1pm4t/gcp-exec-creds@latest && \
    chmod -R 777 /kubecluster && chown 999:999 /kubecluster

# Switch back to non-root user
USER 999
I have confirmed that the specified command + args in the pre-baked image produces:
{
  "apiVersion": "client.authentication.k8s.io/v1beta1",
  "kind": "ExecCredential",
  "status": {
    "token": "<redacted_token>"
  }
}
Yet, my cluster (in the Argo UI) remains in Failed status. Has anyone got the execProvider working on GKE?
Guys, I spent my entire day making this work. I hope this will help save time for others. First, the Dockerfile:
FROM argoproj/argocd:v2.1.7

# See https://argoproj.github.io/argo-cd/operator-manual/custom_tools/#byoi-build-your-own-image
# for instructions on how to extend the Argo CD image to provide custom tooling.

# Switch to root for the ability to perform install
USER root

ADD certs/*.crt /usr/local/share/ca-certificates
RUN ls -lrt /usr/local/share/ca-certificates
RUN update-ca-certificates

# Install gcloud cli
# See https://cloud.google.com/sdk/docs/downloads-interactive for details
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get install -y wget && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
RUN curl https://sdk.cloud.google.com > install.sh
RUN bash install.sh --disable-prompts --install-dir=/usr/local/
RUN mkdir -p /kubecluster/goapps && \
    cd /kubecluster && wget https://dl.google.com/go/go1.17.3.linux-amd64.tar.gz && \
    tar -C /kubecluster -xzf go1.17.3.linux-amd64.tar.gz && \
    export GOPATH=/kubecluster/goapps && \
    /kubecluster/go/bin/go install github.com/sl1pm4t/gcp-exec-creds@latest && \
    chmod -R 777 /kubecluster && chown 999:999 /kubecluster

# Switch back to non-root user
USER argocd
COPY get-gcloud-kube-config /usr/local/google-cloud-sdk/bin/get-gcloud-kube-config

# Add gcloud cli to path
ENV PATH $PATH:/usr/local/google-cloud-sdk/bin:/kubecluster/goapps/bin
RUN ls -lrt /usr/local/google-cloud-sdk/bin/get-gcloud-kube-config
RUN gcloud -v
The get-gcloud-kube-config wrapper script:

#!/bin/bash
while getopts p:c:r:z: flag
do
    case "${flag}" in
        p) project=${OPTARG};;
        c) cluster=${OPTARG};;
        r) region=${OPTARG};;
        z) zone=${OPTARG};;
    esac
done

gcloud config set project $project > /dev/null 2>&1
gcloud auth activate-service-account --key-file=${GOOGLE_APPLICATION_CREDENTIALS} > /dev/null 2>&1

if [ "${region}" != '' ]; then
    gcloud container clusters get-credentials $cluster --region $region > /dev/null 2>&1
else
    # Zone by default
    gcloud container clusters get-credentials $cluster --zone $zone > /dev/null 2>&1
fi

# Return ExecCredential to stdout - https://github.com/sl1pm4t/gcp-exec-creds
gcp-exec-creds
And the cluster secret:

apiVersion: v1
kind: Secret
metadata:
  name: gke-cluster1
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: gke-cluster1
  server: https://<change-this-to-the-cluster-ip>/
  config: |
    {
      "execProviderConfig": {
        "command": "get-gcloud-kube-config",
        "args": [
          "-p",
          "<change-this-to-your-gcp-project>",
          "-c",
          "<cluster-name-here>",
          "-z",
          "asia-southeast1-a"
        ],
        "apiVersion": "client.authentication.k8s.io/v1beta1",
        "installHint": "Could not find command 'get-gcloud-kube-config'"
      }
    }
Note: you will need to mount the GCP JSON key file into the pod's volume and populate the GOOGLE_APPLICATION_CREDENTIALS env variable with the path to that JSON key file.
Thanks
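A rough sketch of that mount (secret name, mount path, and the StatefulSet name are assumptions, not from the note above):

# Store the GCP JSON key in a secret (hypothetical names/paths)
kubectl -n argocd create secret generic gcp-sa-key --from-file=key.json=./key.json

# Mount it into the application controller; assumes the pod spec already has
# volumes/volumeMounts arrays (add them first if it does not)
kubectl -n argocd patch statefulset argocd-application-controller --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/volumes/-",
   "value": {"name": "gcp-sa-key", "secret": {"secretName": "gcp-sa-key"}}},
  {"op": "add", "path": "/spec/template/spec/containers/0/volumeMounts/-",
   "value": {"name": "gcp-sa-key", "mountPath": "/etc/gcp", "readOnly": true}}
]'

# Point GOOGLE_APPLICATION_CREDENTIALS at the mounted key
kubectl -n argocd set env statefulset/argocd-application-controller \
  GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/key.json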
I have handled the GKE authentication with a slightly different approach. Instead of modifying the argocd image, I have transformed the gcp-exec-creds binary into a small API server that I deploy as a sidecar to the application controller container.
The deployment of the sidecar is done by patching the application controller kustomization.
Because I run argocd in GKE, I have used workload identity to bind the Kubernetes service account associated with the application controller (argocd-application-controller) to an IAM service account in the GKE project.
In the projects hosting the clusters that I need to deploy to with argocd, I simply grant the roles/container.admin IAM role to that service account.
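A minimal sketch of those two bindings with gcloud (project, namespace, and service-account names are placeholders, not from the original post):

# 1) Allow the argocd-application-controller KSA to impersonate the IAM SA
gcloud iam service-accounts add-iam-policy-binding argocd@<argocd-project>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<argocd-project>.svc.id.goog[argocd/argocd-application-controller]"

# 2) Grant that IAM SA access to each project hosting target clusters
gcloud projects add-iam-policy-binding <target-project> \
  --member "serviceAccount:argocd@<argocd-project>.iam.gserviceaccount.com" \
  --role roles/container.admin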
Once this is configured, the application controller has full access to the target clusters. And finally, because the argocd image does not have curl, I use a small python one-liner to request the creds from the sidecar, such as:
apiVersion: v1
kind: Secret
metadata:
  name: gke-cluster1
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: gke-cluster1
  server: https://<gke endpoint>/
  config: |
    {
      "execProviderConfig": {
        "apiVersion": "client.authentication.k8s.io/v1beta1",
        "command": "python3",
        "args": [
          "-c",
          "import urllib.request; print(urllib.request.urlopen('http://localhost:8080/creds').read().decode('utf8'))"
        ]
      },
      "tlsClientConfig": {
        "caData": "XXXX",
        "insecure": false
      }
    }
I have just pushed the code and the kustomization examples of what I use in https://github.com/romachalm/gcp-exec-creds-sidecar
@romachalm Thank you so much for providing this solution. I've gotten it mostly working, but I've run into a small snag and I don't know exactly what is causing it, and I was hoping you might have some idea. At this point, I can do the following:
When I open Argo's GUI to the cluster screen and attempt to just edit/save the cluster information, it fails with this message in the GUI:
Unable to save changes: Get "https://[cluster IP redacted]/version?timeout=32s": getting credentials: exec: executable python3 failed with exit code 1
A similar message is shown if I attempt to start deploying things to it. I'm not seeing any logs in the gcp-exec-creds sidecar container that show it is ever being hit and/or returning a token, and it seems like Argo is trying to do some verification of the endpoint existing first (which I've proven should work from inside the container's shell). I feel like I've tried everything at this point and I'm unsure of what else to check.
Interestingly enough, if I leave everything exactly as it is and just use the bearerToken from my local kubeconfig, it works perfectly. It also works if I manually set the bearerToken to the one I retrieve from the container using your python command. So it seems like something is going wrong specifically with the execProviderConfig setup that causes it to never try to retrieve the token?
What is really strange to me is that changing the python line to simply print a hard-coded copy of the result your service provides works with no problem... I don't understand how the python command can fail only when the Argo controller is running it and attempting to open the localhost:8080/creds URL:
args: [
"-c",
"print('{\"apiVersion\":\"client.authentication.k8s.io/v1beta1\",\"kind\":\"ExecCredential\",\"status\":{\"token\":\"ya29.c.KtwCHw... etc\"}}')"
]
I ended up adding github.com/sl1pm4t/gcp-exec-creds to my argocd image.
apiVersion: v1
kind: Secret
metadata:
  name: my-gcp-cluster
  annotations:
    managed-by: argocd.argoproj.io
  labels:
    argocd.argoproj.io/secret-type: cluster
stringData:
  config: |
    {
      "tlsClientConfig": {
        "insecure": false,
        "caData": "[snip]"
      },
      "execProviderConfig": {
        "command": "gcp-exec-creds",
        "apiVersion": "client.authentication.k8s.io/v1beta1"
      }
    }
  name: [redacted]
  server: https://[redacted]
This seems to work well.
The argocd k8s service account is connected via workload identity to a GCloud service account that has access to the other cluster via a cluster-role-binding like this (note the extra User subject):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: argocd-manager-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: argocd-manager-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: argocd@[redacted].iam.gserviceaccount.com
- kind: ServiceAccount
  name: argocd-manager
  namespace: kube-system
I finally figured out what I was doing wrong. I'm using the helm charts provided here: https://github.com/argoproj/argo-helm/tree/master/charts/argo-cd
It splits the deployment into four parts: application-controller, repo-server, redis, and server.
In my case, I needed to add the sidecar container to both application-controller (needed at deployment time) and server (for the GUI to work?), which are deployed by default using two different service accounts (so both service accounts needed to be tied to the workload identity), and I had to set the sidecar to use a port that didn't conflict with either of them.
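As a sketch, annotating both service accounts could look like this (service-account and project names are placeholders, and chart releases may prefix the names differently):

# Tie both the controller and server KSAs to the same GCP SA via Workload Identity
for sa in argocd-application-controller argocd-server; do
  kubectl -n argocd annotate serviceaccount "$sa" \
    iam.gke.io/gcp-service-account=<gcp-sa-name>@<project-id>.iam.gserviceaccount.com
done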
For now, this will get me by. I was trying to avoid having to modify the argocd image, which is why I liked the sidecar approach, but I might end up switching eventually if I get around to automating the ArgoCD custom image build. Thanks!
I've also succeeded using GKE and Workload Identity. I've been using the helm chart, but this should also work for kustomize. For the helm config I have:
controller:
  serviceAccount:
    annotations:
      iam.gke.io/gcp-service-account: <gcp-sa-name>@<projectid>.iam.gserviceaccount.com
  volumes:
  - name: custom-tools
    emptyDir: {}
  initContainers:
  - name: gcp-exec-credentials-installer
    image: golang:1.17-buster
    env:
    - name: GOPATH
      value: /custom-tools/go
    command: [sh, -c]
    args:
    - go install github.com/sl1pm4t/gcp-exec-creds@44ac497
    volumeMounts:
    - mountPath: /custom-tools
      name: custom-tools
  volumeMounts:
  - mountPath: /custom-tools
    name: custom-tools
And then the secret is:
apiVersion: v1
stringData:
  name: <cluster name>
  server: <cluster ip>
  config: |
    {
      "execProviderConfig": {
        "command": "/custom-tools/go/bin/gcp-exec-creds",
        "apiVersion": "client.authentication.k8s.io/v1beta1",
        "installHint": "Could not find command 'gcp-exec-creds'"
      },
      "tlsClientConfig": {
        "insecure": false,
        "caData": "redacted"
      }
    }
kind: Secret
metadata:
  labels:
    argocd.argoproj.io/secret-type: cluster
  name: <secret-name>
  namespace: argocd
type: Opaque
The GCP SA used by Argo needs to have access to the remote cluster via GCP IAM roles.
@lusid that's strange. I only have the sidecar running on application-controller and not on server:
argocd-application-controller-0 2/2 Running 0 11h
argocd-server-585ddd6bb7-h4jgf 1/1 Running 0 11h
And I have no error shown in the GUI; the status is successful. I have tested on version v2.2.0+6da92a8. What's yours?
@lusid I detected the issue you mentioned. TIL argocd-server also uses the cluster secret, to fetch the logs from the pods. So indeed the server also needs to run with the sidecar.
To do so, I had to modify my sidecars to listen on port 7070 (simply adding args: ["-port", "7070"] to the sidecars), because the default conflicts with port 8080 used by argocd-server. My exec cmd became:
"import urllib.request; print(urllib.request.urlopen('http://localhost:7070/creds').read().decode('utf8'))"
I also needed to bind the argocd-server SA to workload identity. I now have access to the pod logs in the UI again.
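For completeness, a hedged sketch of adding such a sidecar to argocd-server (the image name is a placeholder; see the linked repo above for the real kustomization):

# Add the creds sidecar to argocd-server, listening on 7070 to avoid the 8080 conflict
kubectl -n argocd patch deployment argocd-server --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/-",
   "value": {"name": "gcp-exec-creds-sidecar",
             "image": "<your-sidecar-image>",
             "args": ["-port", "7070"]}}
]'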
I have applied the custom image and the gke binary and have it working... somewhat, with multi-cluster using a connect gateway to authenticate Argo to the cluster's API; however, I am having 2 problems:
1) If the connect gateway agent crashes, all resources of the associated remote cluster are gone and the application controller does not attempt to re-watch them. I have to manually run invalidate cache for the resources to pop up again. I understand why they disappear when the agent crashes, but the agent does come back up, and I would expect the application controller to retry the watch on all the resources.
2) Another interesting thing I am noticing: I am trying to deploy an app-of-apps pattern to the remote cluster, and it deploys the application to the cluster, but the application controller does nothing with it. It sees it there, but it will not deploy the resources associated with the application to the cluster.
Based on the nice work in #9190, here is an example with detailed instructions (based on Helm): https://github.com/zhang-xuebin/argo-helm/tree/host-on-gke
You can use ArgoCD to manage private GKE clusters. VPC peering, master authorized networks, etc., are NOT needed.
@zhang-xuebin Hello, I followed your guide and was able to connect to a private GKE cluster using connect gateway. However, in the last step, when creating the secret for the external cluster, I needed to change the server URL from server: https://connectgateway.googleapis.com/v1beta1/... to server: https://connectgateway.googleapis.com/v1/... in order to make it work.
@clive2000 thanks for sharing the feedback. That's interesting; would you mind sharing what the problem is when you use https://connectgateway.googleapis.com/v1beta1/...? v1beta1 should behave very similarly to v1.
Hello @zhang-xuebin, I checked the guide and repo, and it's great. I have a question: are the helm charts provided in the repo any different from the default helm charts in the official helm repos, or were they modified in any way to allow the private connection to work? Thanks in advance!
Hello @laloyalo, you can see the latest commit: https://github.com/argoproj/argo-helm/commit/458221674e44d080f6fabb3b1cfdadee4f28fe6f
Compared with the official helm charts, the only difference is in ./charts/argo-cd/values.yaml, which enables Workload Identity by adding an annotation.
A note of caution: the server format has changed. For instance:
https://connectgateway.googleapis.com/v1/projects/814233161045/locations/global/gkeMemberships/my-company-dev
Summary
Add support for GCP authentication for GKE instead of keeping a token in a secret, much like the existing AWS EKS support.
Motivation
Some GKE multi-cluster management scenarios rely on the Anthos Connect Gateway, as described here: https://cloud.google.com/anthos/multicluster-management/gateway/using. In such scenarios, the only authentication option is GCP authentication, sending an OAuth token instead of a Kubernetes token.
This will open endless possibilities, like having Argo CD in GCP orchestrate the deployment of many environments in multiple places (on-prem, GCP, other clouds, etc.) via the Anthos Connect Gateway.
Proposal
Leverage GCP authentication similar to what aws eks get-token does today. I believe the Kubernetes client-go already supports the auth plugin for GCP.
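For illustration only, a hedged sketch of the aws eks get-token analogy using plain gcloud, emitting the client-go ExecCredential shape (a hypothetical wrapper, not the proposed built-in support):

#!/bin/bash
# Hypothetical: build an ExecCredential from a gcloud OAuth access token,
# analogous to `aws eks get-token`. Assumes gcloud is installed and authenticated.
token="$(gcloud auth print-access-token)"
cat <<EOF
{
  "apiVersion": "client.authentication.k8s.io/v1beta1",
  "kind": "ExecCredential",
  "status": { "token": "${token}" }
}
EOF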