akuity / kargo

Application lifecycle orchestration
https://kargo.akuity.io/
Apache License 2.0

Multistage deployments with ArgoCD-Trigger / -Health on decentralised and air-gapped cluster installations #1384

Closed MarkusNeuron closed 1 month ago

MarkusNeuron commented 9 months ago


Dear community, for us Kargo is a perfect match: we implemented the concepts of staging & promotion ourselves via Argo Events / Workflows, but that ended up being very opaque and also not flexible enough for our developers.

As a highly regulated enterprise we run ephemeral and air-gapped clusters, where each cluster has its own Argo CD instance that just pulls the relevant manifests and stage configurations from GitHub Enterprise. Stages are therefore distributed between clusters that are not aware of each other. The whole orchestration is done via GitOps and the pipelines mentioned above.

Proposed Feature

With Kargo we would gain full transparency if we modeled promotion end-to-end, but here is the issue: a central Kargo instance cannot access the Argo CD health of other clusters and cannot trigger syncs.

Motivation

For design and compliance reasons we run all our clusters decentralised, fully via GitOps. To still get the (currently missing) full transparency of CD promotion, including the great Argo CD integration features, we would need access to the remote Argo CD app status.

Suggested Implementation

For this to work we would need a controller on each cluster that is able to communicate with the central Kargo installation. We would therefore need to be able to configure a new stage.promotionMechanism, e.g. argoCDRemoteAppUpdate, where we would configure the app config plus the remote controller endpoint.

krancour commented 9 months ago

The model you've described is almost exactly what we already follow -- controllers may be distributed, each communicating with a "nearby" Argo CD control plane and with a centralized Kargo control plane. It's all "phone home," and never the other way around.

There is, however, no need to configure stages to know where the relevant Argo CD is. Stages can be labeled as belonging to a "shard" and they will be reconciled only by the corresponding Kargo Controller, which already knows how to talk to its "nearby" Argo CD control plane.

In short, you'll have multiple Kargo controllers, each of which is in communication with the Kargo control plane and an Argo CD control plane.

MarkusNeuron commented 9 months ago

@krancour I am blown away to read that. After reading your comment I had another look at the Helm charts and found the api.argocd.urls and controller.shardName parameters. Is this what we need to configure? Which label do we need to put on the Stage to make this work? If this really is implemented already, I would like to contribute an "advanced deployment tutorial". Guys, you should show what this product is capable of! WOW! 💪

krancour commented 9 months ago

@MarkusNeuron thank you for the kind words. I missed one caveat... that's how we built it, but no one has tested this topology extensively yet.

Which label do we need to put on the Stage to make this work?

kargo.akuity.io/shard: <shard name>
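
For example, on a Stage the label sits under metadata.labels (a minimal sketch; the names are illustrative):

apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: uat
  namespace: kargo-demo
  labels:
    kargo.akuity.io/shard: distributed  # reconciled only by the controller installed with controller.shardName=distributed
spec:
  # ... promotion mechanisms etc. unchanged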

Also note:

MarkusNeuron commented 9 months ago

Will test this setup in the next few days and report back. Thx again!

WZHGAHO commented 7 months ago

Hi @krancour

I tested the sharded topology and wrote the following guide. Please review it and let me know if anything needs to be corrected because my understanding or assumptions were off. If you think I should move this guide to a GitHub Discussion, let me know; it might be useful for other users who want to experiment with the feature.

I aligned with @MarkusNeuron and we have the following follow-up questions:

  1. Is configuring k8s clients (kubeconfigs), and therefore exposing the kube-apiserver of the cluster where the central ("management") Kargo runs, the only way to make the sharded topology work?
  2. Instead of exposing the whole kube-apiserver of the central cluster, would it be an option to expose just the kargo-controller endpoint of the central ("management") installation, which would then query/create/update its local kube-apiserver and relay the Kargo-related custom resources to the distributed controllers? IMHO this would fit an air-gapped cluster concept better.

Kargo Sharded Topology Guide

Setup environments

Prerequisites: Docker, kind, kubectl, kubectx, and Helm installed.

Create two new kind clusters:

  1. central-mgmt (will be Kargo control plane)
  2. distributed (will "phone-home" to central-mgmt)
kind create cluster \
  --wait 120s \
  --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: central-mgmt
nodes:
- extraPortMappings:
  - containerPort: 31443 # Argo CD dashboard
    hostPort: 31443
  - containerPort: 31444 # Kargo dashboard
    hostPort: 31444
  - containerPort: 30081 # test application instance
    hostPort: 30081
  - containerPort: 30082 # UAT application instance
    hostPort: 30082
  - containerPort: 30083 # prod application instance
    hostPort: 30083

EOF
kind create cluster \
  --wait 120s \
  --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: distributed
nodes:
- extraPortMappings:
  - containerPort: 31445 # Argo CD dashboard
    hostPort: 31445
  - containerPort: 31446 # Kargo dashboard
    hostPort: 31446
  - containerPort: 30181 # test application instance
    hostPort: 30181
  - containerPort: 30182 # UAT application instance
    hostPort: 30182
  - containerPort: 30183 # prod application instance
    hostPort: 30183

EOF

Once the clusters are ready, you can switch the kubectl context between them using kubectx.
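For example (standard kubectx usage; kubectl config use-context works the same way):

kubectx                      # list available contexts
kubectx kind-central-mgmt    # switch to the central-mgmt cluster
kubectx kind-distributed     # switch to the distributed cluster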

Deploy Helm charts

cert-manager

Change context to central-mgmt cluster:

kubectx kind-central-mgmt
helm install cert-manager cert-manager --repo https://charts.jetstack.io --version 1.11.5 --namespace cert-manager --create-namespace --set installCRDs=true --set image.repository=docker.example.com/jetstack/cert-manager-controller --set cainjector.image.repository=docker.example.com/jetstack/cert-manager-cainjector --set webhook.image.repository=docker.example.com/jetstack/cert-manager-webhook --set startupapicheck.image.repository=docker.example.com/jetstack/cert-manager-ctl --wait

Change context to distributed cluster:

kubectx kind-distributed

Repeat the previous Helm command to install cert-manager on distributed cluster as well.

ArgoCD

Install the chart first on central-mgmt cluster, use NodePort=31443

Change context to central-mgmt cluster:

kubectx kind-central-mgmt
helm upgrade --install argocd argo-cd --repo https://argoproj.github.io/argo-helm --version 5.51.6 --namespace argocd --create-namespace --set 'configs.secret.argocdServerAdminPassword=$2a$10$5vm8wXaSdbuff0m9l21JdevzXBzJFPCi8sy6OOnpZMAG.fOXL7jvO' --set dex.enabled=false --set notifications.enabled=false --set server.service.type=NodePort --set server.service.nodePortHttp=31443 --set server.extensions.enabled=true --set 'server.extensions.contents[0].name=argo-rollouts' --set 'server.extensions.contents[0].url=https://github.com/argoproj-labs/rollout-extension/releases/download/v0.3.3/extension.tar' --set global.image.repository=docker.example.com/argoproj/argocd --set redis.image.repository=docker.example.com/docker/library/redis --set server.extensions.image.repository=docker.example.com/argoproj-labs/argocd-extensions --wait

Secondly, install the chart on distributed cluster, use NodePort=31445

Change context to distributed cluster:

kubectx kind-distributed
helm upgrade --install argocd argo-cd --repo https://argoproj.github.io/argo-helm --version 5.51.6 --namespace argocd --create-namespace --set 'configs.secret.argocdServerAdminPassword=$2a$10$5vm8wXaSdbuff0m9l21JdevzXBzJFPCi8sy6OOnpZMAG.fOXL7jvO' --set dex.enabled=false --set notifications.enabled=false --set server.service.type=NodePort --set server.service.nodePortHttp=31445 --set server.extensions.enabled=true --set 'server.extensions.contents[0].name=argo-rollouts' --set 'server.extensions.contents[0].url=https://github.com/argoproj-labs/rollout-extension/releases/download/v0.3.3/extension.tar' --set global.image.repository=docker.example.com/argoproj/argocd --set redis.image.repository=docker.example.com/docker/library/redis --set server.extensions.image.repository=docker.example.com/argoproj-labs/argocd-extensions --wait

Argo Rollouts

Install on both clusters Argo Rollouts.

Change context to central-mgmt cluster:

kubectx kind-central-mgmt
helm upgrade --install argo-rollouts argo-rollouts --repo https://argoproj.github.io/argo-helm --version 2.33.0 --create-namespace --namespace argo-rollouts --set controller.image.registry=docker.example.com --set controller.image.repository=argoproj/argo-rollouts --wait

Change context to distributed cluster:

kubectx kind-distributed

Repeat the previous Helm command to install Argo Rollouts on distributed cluster as well.

Kargo

central-mgmt

Change context to central-mgmt cluster:

kubectx kind-central-mgmt

Set api.service.nodePort=31444 and controller.shardName=central-mgmt:

helm upgrade --install kargo oci://ghcr.io/akuity/kargo-charts/kargo --namespace kargo --create-namespace --set api.service.type=NodePort --set api.service.nodePort=31444 --set api.adminAccount.password=admin --set api.adminAccount.tokenSigningKey=iwishtowashmyirishwristwatch --set image.repository=docker.example.com/akuity/kargo --set controller.shardName=central-mgmt --wait

distributed

Change context to distributed cluster:

kubectx kind-distributed

Set api.service.nodePort=31446, controller.shardName=distributed, and the api.argocd.urls mapping to point to https://argocd-server.argocd.svc - this is the Argo CD running next to Kargo on the distributed cluster.

Prepare the kubeconfig which Kargo will use to connect to central-mgmt cluster:

  1. Copy ~/.kube/config to ~/kubeconfig.yaml

    cp ~/.kube/config ~/kubeconfig.yaml
  2. Edit ~/kubeconfig.yaml and keep only the entries relevant to the central-mgmt cluster. Make sure current-context is set to kind-central-mgmt. It should look similar to this:

    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: ...
        server: https://127.0.0.1:53113
      name: kind-central-mgmt
    contexts:
    - context:
        cluster: kind-central-mgmt
        user: kind-central-mgmt
      name: kind-central-mgmt
    current-context: kind-central-mgmt
    kind: Config
    preferences: {}
    users:
    - name: kind-central-mgmt
      user:
        client-certificate-data: ...
        client-key-data: ...
    
  3. Get the IP address of the container that runs central-mgmt cluster with the following command:

    docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' central-mgmt-control-plane
  4. In ~/kubeconfig.yaml, set the IP address under the server key. This key is nested under cluster, which in turn sits under the - cluster: item in the clusters list. It very likely currently has the value https://127.0.0.1:<someport>; change it to https://<ip_address_from_step_3>:6443. (Steps 3 and 4 can also be scripted; see the sketch after this list.)

    For example: https://172.18.0.2:6443

  5. Create kargo namespace

    kubectl create namespace kargo
  6. Create a secret with the following command

    kubectl create secret generic central-mgmt-kubeconfig --from-file=kubeconfig.yaml -n kargo
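
Steps 3 and 4 can optionally be scripted. A sketch, assuming the mikefarah yq v4 CLI is available (not part of the original guide):

NODE_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' central-mgmt-control-plane)
# rewrite the API server address of the single cluster entry kept in step 2
yq -i ".clusters[0].cluster.server = \"https://${NODE_IP}:6443\"" ~/kubeconfig.yaml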

Once the secret is created, prepare values.yaml for the Helm chart installation with a file editor, e.g. with vim values.yaml

values.yaml should contain:

api:
  service:
    type: NodePort
    nodePort: 31446
  adminAccount:
    password: admin
    tokenSigningKey: iwishtowashmyirishwristwatch
  argocd:
    urls:
      "distributed": https://argocd-server.argocd.svc
image:
  repository: docker.example.com/akuity/kargo
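# central-mgmt-kubeconfig is the Secret created in the previous step; it provides
# this controller with a kubeconfig for reaching the Kargo control plane on the
# central-mgmt cluster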
kubeconfigSecrets:
  kargo: central-mgmt-kubeconfig
controller:
  shardName: distributed

Finally, deploy the Helm chart:

helm upgrade --install kargo oci://ghcr.io/akuity/kargo-charts/kargo --namespace kargo --create-namespace -f values.yaml --wait
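
A quick sanity check that the distributed Kargo controller came up (optional):

kubectl get pods -n kargo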

Create ArgoCD applications

central-mgmt

Change context to central-mgmt cluster:

kubectx kind-central-mgmt
cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: kargo-demo
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - stage: test
  template:
    metadata:
      name: kargo-demo-{{stage}}
      annotations:
        kargo.akuity.io/authorized-stage: kargo-demo:{{stage}}
    spec:
      project: default
      source:
        repoURL: ${GITOPS_REPO_URL}
        targetRevision: stage/{{stage}}
        path: stages/{{stage}}
      destination:
        server: https://kubernetes.default.svc
        namespace: kargo-demo-{{stage}}
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
EOF

distributed

Change context to distributed cluster:

kubectx kind-distributed
cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: kargo-demo
  namespace: argocd
spec:
  generators:
  - list:
      elements:
      - stage: uat
      - stage: prod
  template:
    metadata:
      name: kargo-demo-{{stage}}
      annotations:
        kargo.akuity.io/authorized-stage: kargo-demo:{{stage}}
    spec:
      project: default
      source:
        repoURL: ${GITOPS_REPO_URL}
        targetRevision: stage/{{stage}}
        path: stages/{{stage}}
      destination:
        server: https://kubernetes.default.svc
        namespace: kargo-demo-{{stage}}
      syncPolicy:
        syncOptions:
        - CreateNamespace=true
EOF

Deploy modified Kargo Quickstart resources

We are going to re-use the Kargo resources from the Kargo Quickstart guide; the only adaptations we need to make are the following:

We model the test stage on the central-mgmt cluster, and the uat and prod stages on the distributed cluster.

  1. Follow the Kargo Quickstart guide (only the Create a GitOps Repository section) and fork the kargo-demo repo. Also make sure that your GITOPS_REPO_URL variable is set.
  2. In your fork, edit base/deploy.yaml (line 17) and change the image value from nginx:placeholder to docker.example.com/nginx/nginx:placeholder (a scripted alternative is sketched after this list).
  3. Save your GitHub handle and your personal access token in environment variables:

    export GITHUB_USERNAME=<your github handle>
    export GITHUB_PAT=<your personal access token>
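
A hypothetical one-liner for step 2, assuming GNU sed and that you run it from the root of your fork (with the quickstart repo layout):

sed -i 's|nginx:placeholder|docker.example.com/nginx/nginx:placeholder|' base/deploy.yaml
git commit -am "Use mirrored nginx image" && git push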

Change context to central-mgmt cluster:

kubectx kind-central-mgmt

Run the following command:

cat <<EOF | kubectl apply -f -
apiVersion: kargo.akuity.io/v1alpha1
kind: Project
metadata:
  name: kargo-demo
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: kargo-demo-repo
  namespace: kargo-demo
  labels:
    kargo.akuity.io/secret-type: repository
stringData:
  type: git
  url: ${GITOPS_REPO_URL}
  username: ${GITHUB_USERNAME}
  password: ${GITHUB_PAT}
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: kargo-demo
  namespace: kargo-demo
  labels:
    kargo.akuity.io/shard: central-mgmt
spec:
  subscriptions:
  - image:
      repoURL: docker.example.com/nginx/nginx
      semverConstraint: ^1.25.0
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: test
  namespace: kargo-demo
  labels:
    kargo.akuity.io/shard: central-mgmt
spec:
  subscriptions:
    warehouse: kargo-demo
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: ${GITOPS_REPO_URL}
      writeBranch: stage/test
      kustomize:
        images:
        - image: docker.example.com/nginx/nginx
          path: stages/test
    argoCDAppUpdates:
    - appName: kargo-demo-test
      appNamespace: argocd
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: uat
  namespace: kargo-demo
  labels:
    kargo.akuity.io/shard: distributed
spec:
  subscriptions:
    upstreamStages:
    - name: test
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: ${GITOPS_REPO_URL}
      writeBranch: stage/uat
      kustomize:
        images:
        - image: docker.example.com/nginx/nginx
          path: stages/uat
    argoCDAppUpdates:
    - appName: kargo-demo-uat
      appNamespace: argocd
---
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: prod
  namespace: kargo-demo
  labels:
    kargo.akuity.io/shard: distributed
spec:
  subscriptions:
    upstreamStages:
    - name: uat
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: ${GITOPS_REPO_URL}
      writeBranch: stage/prod
      kustomize:
        images:
        - image: docker.example.com/nginx/nginx
          path: stages/prod
    argoCDAppUpdates:
    - appName: kargo-demo-prod
      appNamespace: argocd
EOF
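
Optionally, confirm that the resources exist on the control plane before promoting anything (a quick check):

kubectl get warehouses,stages -n kargo-demo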

Verify the hypothesis:

AnalysisTemplates

Change context to central-mgmt cluster:

kubectx kind-central-mgmt

Create AnalysisTemplate

cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: kargo-demo-analysistemplate-uat
  namespace: kargo-demo
spec:
  metrics:
  - name: fail-or-pass
    #count: 1
    #interval: 5s
    #failureLimit: 1
    provider:
      job:
        spec:
          template:
            spec:
              containers:
              - name: sleep
                image: docker.example.com/alpine:latest
                command: [sh, -c]
                args:
                - exit {{args.exit-code}}
              restartPolicy: Never
          backoffLimit: 1
EOF

Modify the uat stage, and add verification to the spec:

cat <<EOF | kubectl apply -f -
apiVersion: kargo.akuity.io/v1alpha1
kind: Stage
metadata:
  name: uat
  namespace: kargo-demo
  labels:
    kargo.akuity.io/shard: distributed
spec:
  subscriptions:
    upstreamStages:
    - name: test
  promotionMechanisms:
    gitRepoUpdates:
    - repoURL: ${GITOPS_REPO_URL}
      writeBranch: stage/uat
      kustomize:
        images:
        - image: docker.example.com/nginx/nginx
          path: stages/uat
    argoCDAppUpdates:
    - appName: kargo-demo-uat
      appNamespace: argocd
  verification:
    analysisTemplates:
    - name: kargo-demo-analysistemplate-uat
    analysisRunMetadata:
      labels:
        app: kargo-demo-analysistemplate-uat
      annotations:
        foo: bar
    args:
    - name: exit-code # no CamelCaseAllowed!
      value: "0"
EOF

Modify the Warehouse and add a new image subscription. In my example this is docker2.example.com/some/new/dummy/repo/image with semverConstraint ^2024.0.0:

cat <<EOF | kubectl apply -f -
apiVersion: kargo.akuity.io/v1alpha1
kind: Warehouse
metadata:
  name: kargo-demo
  namespace: kargo-demo
  labels:
    kargo.akuity.io/shard: central-mgmt
spec:
  subscriptions:
  - image:
      repoURL: docker.example.com/nginx/nginx
      semverConstraint: ^1.25.0
  - image:
      repoURL: docker2.example.com/some/new/dummy/repo/image
      semverConstraint: ^2024.0.0
EOF

Change context to distributed cluster:

kubectx kind-distributed

Create the kargo-demo namespace - this is needed because there is no kargo-demo namespace on the distributed cluster yet, and the AnalysisRun will be created in that namespace:

kubectl create namespace kargo-demo

Make sure that the new Freight has appeared, then promote it first to the test stage and after that to the uat stage. In the uat stage an AnalysisRun should be triggered.
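One way to check from the CLI (the Freight lives on the central control plane, while the AnalysisRun is created on the distributed cluster; the promotions themselves can be done from the Kargo UI):

kubectl --context kind-central-mgmt get freight -n kargo-demo
kubectl get analysisruns -n kargo-demo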

Change back to the central-mgmt cluster and check the uat Stage's status:

kubectx kind-central-mgmt
kubectl get stage uat -n kargo-demo -o yaml

It should show something similar to:

    verificationInfo:
      analysisRun:
        namespace: kargo-demo
        name: uat.01hrfdz695qqqecvrzh4csp7bm.2511465
        phase: Successful
      phase: Successful

The AnalysisRun resource was created and ran on the distributed cluster (i.e. on the cluster that the Stage's shard label points to).


github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it had no activity for 90 days. It will be closed if no activity occurs in the next 30 days but can be reopened if it becomes relevant again.

krancour commented 1 month ago

Closing this issue, but @WZHGAHO we will likely use elements of your guide in addressing #2447