allenporter / flux-local

flux-local is a set of tools and libraries for managing a local flux gitops repository focused on validation steps to help improve quality of commits, PRs, and general local testing.
https://allenporter.github.io/flux-local/
Apache License 2.0

Failing to find HelmRepository #492

Closed chrisbsmith closed 9 months ago

chrisbsmith commented 10 months ago

Potentially related to #225 && #483

The flux diff checks on my latest PR are failing to find the related HelmRepository.

Run docker://ghcr.io/allenporter/flux-local:main
/usr/bin/docker run --name ghcrioallenporterfluxlocalmain_ed81d7 --label 6ed130 --workdir /github/workspace --rm -e "INPUT_ARGS" -e "HOME" -e "GITHUB_JOB" -e "GITHUB_REF" -e "GITHUB_SHA" -e "GITHUB_REPOSITORY" -e "GITHUB_REPOSITORY_OWNER" -e "GITHUB_REPOSITORY_OWNER_ID" -e "GITHUB_RUN_ID" -e "GITHUB_RUN_NUMBER" -e "GITHUB_RETENTION_DAYS" -e "GITHUB_RUN_ATTEMPT" -e "GITHUB_REPOSITORY_ID" -e "GITHUB_ACTOR_ID" -e "GITHUB_ACTOR" -e "GITHUB_TRIGGERING_ACTOR" -e "GITHUB_WORKFLOW" -e "GITHUB_HEAD_REF" -e "GITHUB_BASE_REF" -e "GITHUB_EVENT_NAME" -e "GITHUB_SERVER_URL" -e "GITHUB_API_URL" -e "GITHUB_GRAPHQL_URL" -e "GITHUB_REF_NAME" -e "GITHUB_REF_PROTECTED" -e "GITHUB_REF_TYPE" -e "GITHUB_WORKFLOW_REF" -e "GITHUB_WORKFLOW_SHA" -e "GITHUB_WORKSPACE" -e "GITHUB_ACTION" -e "GITHUB_EVENT_PATH" -e "GITHUB_ACTION_REPOSITORY" -e "GITHUB_ACTION_REF" -e "GITHUB_PATH" -e "GITHUB_ENV" -e "GITHUB_STEP_SUMMARY" -e "GITHUB_STATE" -e "GITHUB_OUTPUT" -e "RUNNER_OS" -e "RUNNER_ARCH" -e "RUNNER_NAME" -e "RUNNER_ENVIRONMENT" -e "RUNNER_TOOL_CACHE" -e "RUNNER_TEMP" -e "RUNNER_WORKSPACE" -e "ACTIONS_RUNTIME_URL" -e "ACTIONS_RUNTIME_TOKEN" -e "ACTIONS_CACHE_URL" -e "ACTIONS_RESULTS_URL" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/homelab/homelab":"/github/workspace" ghcr.io/allenporter/flux-local:main diff helmrelease --unified 6 --path /github/workspace/pull/kubernetes/apps --path-orig /github/workspace/default/kubernetes/apps --strip-attrs "helm.sh/chart,checksum/config,app.kubernetes.io/version,chart" --limit-bytes 10000 --all-namespaces --sources "home-kubernetes" --output-file diff.patch
Unable to find Secret monitoring/snmp-secret referenced in HelmRelease monitoring/snmp-exporter
Unable to find Secret monitoring/snmp-secret referenced in HelmRelease monitoring/snmp-exporter
flux-local error:  Unable to find HelmRepository for flux-system-prometheus-community/kube-prometheus-stack for HelmRelease kube-prometheus-stack

I've run your local debugging steps and see results that I believe look correct:

/ # flux-local get cluster --path /homelab/kubernetes/
PATH          KUSTOMIZATIONS
kubernetes    38
/ # flux-local get helmreleases --path /homelab/kubernetes --all-namespaces
Unable to find Secret monitoring/snmp-secret referenced in HelmRelease monitoring/snmp-exporter
NAMESPACE       NAME                       REVISION    CHART                                                SOURCE
services        atuin                      2.4.0       services-app-template                                bjw-s
auth            authelia                   2.4.0       auth-app-template                                    bjw-s
cert-manager    cert-manager               v1.13.3     cert-manager-cert-manager                            jetstack
kube-system     cilium                     1.14.5      kube-system-cilium                                   cilium
networking      cloudflared                2.4.0       networking-app-template                              bjw-s
database        cloudnative-pg             0.20.0      database-cloudnative-pg                              cloudnative-pg
kube-system     coredns                    1.29.0      kube-system-coredns                                  coredns
networking      echo-server                2.4.0       networking-app-template                              bjw-s
networking      external-dns-cloudflare    1.13.1      networking-external-dns                              external-dns
networking      external-dns-pihole        1.13.1      networking-external-dns                              external-dns
kube-system     external-secrets           0.9.11      kube-system-external-secrets                         external-secrets
kube-system     onepassword-connect        2.4.0       kube-system-app-template                             bjw-s
monitoring      gatus                      2.4.0       monitoring-app-template                              bjw-s
monitoring      grafana                    7.0.19      monitoring-grafana                                   grafana
monitoring      kube-prometheus-stack      55.5.1      monitoring-kube-prometheus-stack                     prometheus-community
monitoring      kubernetes-dashboard       6.0.8       monitoring-kubernetes-dashboard                      kubernetes-dashboard
auth            lldap                      2.4.0       auth-app-template                                    bjw-s
kube-system     local-path-provisioner     None        kube-system-./deploy/chart/local-path-provisioner    local-path-provisioner
kube-system     metrics-server             3.11.0      kube-system-metrics-server                           metrics-server
storage         minio                      2.4.0       storage-app-template                                 bjw-s
networking      nginx-external             4.9.0       networking-ingress-nginx                             ingress-nginx
networking      nginx-internal             4.9.0       networking-ingress-nginx                             ingress-nginx
database        redis                      18.6.2      database-redis                                       bitnami
kube-system     reloader                   1.0.60      kube-system-reloader                                 stakater
default         smtp-relay                 2.4.0       default-app-template                                 bjw-s
kube-system     snapshot-controller        2.0.4       kube-system-snapshot-controller                      piraeus
monitoring      snmp-exporter              2.4.0       monitoring-app-template                              bjw-s
kube-system     synology-csi               0.9.7       kube-system-synology-csi                             synology-csi
monitoring      unpoller                   2.4.0       monitoring-app-template                              bjw-s
volsync         volsync                    0.8.0       volsync-volsync                                      backube
flux-system     weave-gitops               4.0.36      flux-system-weave-gitops                             weave-gitops

The flux-local get helmreleases even shows the kube-prometheus-stack helmrelease that the PR check is reporting missing.

FWIW, I pulled the latest main docker image and ran this test there.

What may be interesting is that the missing HelmRepository seems to relate to whichever package changes in the PR. In another one of my PRs it was a missing redis HelmRepository.

Any ideas?

Will add a new comment with output from a run with debug: true on the action.

chrisbsmith commented 10 months ago

I added debug: true to the action but it doesn't seem to have triggered the debug logs.

https://github.com/chrisbsmith/homelab/actions/runs/7427407785/job/20213017048#step:5:1

allenporter commented 10 months ago

Ah sorry, I see you aren't using the diff action but a custom diff CLI. You'll need to add --log-level=DEBUG to get debug logs like the action does https://github.com/allenporter/flux-local/blob/9b3225d9c902205e403173bc3069ef51ee421d56/action/diff/action.yml#L94

allenporter commented 10 months ago

How about flux-local get cluster -o yaml? That will show the detected HelmRepository for each kustomization

chrisbsmith commented 10 months ago

> Ah sorry, I see you aren't using the diff action but a custom diff CLI. You'll need to add --log-level=DEBUG to get debug logs like the action does https://github.com/allenporter/flux-local/blob/9b3225d9c902205e403173bc3069ef51ee421d56/action/diff/action.yml#L94

I should've caught this too. Added! Lots more data that I haven't quite figured out how to read yet.

> How about flux-local get cluster -o yaml? That will show the detected HelmRepository for each kustomization

I created a new step with this as well. Results are here. The helm_repos list is empty but helm_releases is populated:

  - name: kube-prometheus-stack
    namespace: flux-system
    path: kubernetes/apps/monitoring/kube-prometheus-stack/app
    helm_repos: []
    helm_releases:
    - name: kube-prometheus-stack
      namespace: monitoring
      chart:
        name: kube-prometheus-stack
        repo_name: prometheus-community
        repo_namespace: flux-system
    cluster_policies: []
    config_maps:
    - name: kube-prometheus-stack-values
      namespace: monitoring
    - name: alertmanager-config-tpl
      namespace: monitoring
    - name: kube-state-metrics-configmap
      namespace: monitoring

allenporter commented 10 months ago

That looks like partial output of get cluster -- I think in this case the HelmRepository should be coming from another kustomization. Which kustomization includes kubernetes/flux/repositories/? Maybe you can include just that one. I am curious whether it finds all of the needed HelmRepository objects or is just missing this one.

chrisbsmith commented 10 months ago

I'm also using onedr0p's repo layout, like in the back and forth around this comment, and have a kubernetes/flux/config dir that has multiple Kustomizations in it.

I did some additional local testing. I modified a helmrelease.yaml locally with some garbage image tags, then ran:

flux-local --log-level DEBUG diff helmrelease --path /homelab2/kubernetes/apps --path-orig /homelab/kubernetes/apps --all-namespaces --sources "home-kubernetes" --strip-attrs "helm.sh/chart,checksum/config,app.kubernetes.io/version,chart" --limit-bytes 10000 --unified 6

and it throws an error with the HelmRepository for the app I modified.

I'm not sure why this works in onedr0p's repo and not mine, but I'll try to spend some time this weekend investigating.

Thanks for taking the time to look into this.

allenporter commented 10 months ago

I think the problem may be that --path is pointing to a subdirectory apps that does not include the HelmRepository we're talking about.

Right now --path is meant to point at the "root" of the cluster. It doesn't know how to trace back dependencies to find the kustomization with the HelmRepository unless it can find it from the root. I know onedr0p set this up to try to do more efficient diffs for specific paths, but that won't work if it can't find all the dependencies.
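
To illustrate why a --path below the cluster root breaks the lookup, here is a minimal Python sketch. It is not flux-local's implementation. It takes kustomization entries shaped like the get cluster -o yaml output quoted earlier in this thread (already parsed into dicts) and reports HelmReleases whose chart repo is not among the discovered helm_repos. The name/namespace keys on helm_repos entries, the function name find_missing_repos, and the "repositories" kustomization are assumptions made for illustration.

```python
from typing import Any


def find_missing_repos(kustomizations: list[dict[str, Any]]) -> list[str]:
    """Return 'namespace/name' of HelmReleases whose HelmRepository was not discovered."""
    # Collect every HelmRepository found across all scanned kustomizations.
    # Assumption: each helm_repos entry carries 'name' and 'namespace' keys.
    known = {
        (repo["namespace"], repo["name"])
        for ks in kustomizations
        for repo in ks.get("helm_repos", [])
    }
    missing = []
    for ks in kustomizations:
        for release in ks.get("helm_releases", []):
            chart = release["chart"]
            if (chart["repo_namespace"], chart["repo_name"]) not in known:
                missing.append(f"{release['namespace']}/{release['name']}")
    return missing


# Only the apps subtree was scanned, so no HelmRepository objects were found.
apps_only = [
    {
        "name": "kube-prometheus-stack",
        "helm_repos": [],
        "helm_releases": [
            {
                "name": "kube-prometheus-stack",
                "namespace": "monitoring",
                "chart": {
                    "name": "kube-prometheus-stack",
                    "repo_name": "prometheus-community",
                    "repo_namespace": "flux-system",
                },
            }
        ],
    }
]

# Hypothetical kustomization covering kubernetes/flux/repositories; it is only
# visible when scanning starts from the cluster root.
repositories = {
    "name": "repositories",
    "helm_repos": [{"name": "prometheus-community", "namespace": "flux-system"}],
    "helm_releases": [],
}

print(find_missing_repos(apps_only))
print(find_missing_repos(apps_only + [repositories]))
```

With only the apps subtree, the kube-prometheus-stack release is reported as missing its repository; once the kustomization that defines the repository is part of the scan, nothing is reported.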

chrisbsmith commented 10 months ago

I haven't had as much time to look at this as I'd like this weekend, but I did notice a successful run from a new PR. The configurations are the same and I didn't make any updates to my repo. 🤷

I saw your k8s-gitops repo and I'll take a look to see how things are configured there and test whether that cleans things up on my end.

ahgraber commented 9 months ago

I believe I'm also experiencing this issue; I'm also using onedr0p's config.

Failing run here with debug logs enabled.
Kustomization diffs work just fine, but HelmReleases fail.

I've just updated to specify my --path at the cluster root; will report back

allenporter commented 9 months ago

In the previous example isn't the problem that the path isn't including the sources?

allenporter commented 9 months ago

E.g. you have it pointed at the apps subdirectory so it can't find the helm repositories defined outside.

Point --path at the root of the cluster, the same place you point the bootstrap. Calling it with some other separate subdirectory means helm template can't work.

allenporter commented 9 months ago

The reason it works in onedr0p's repo is that https://github.com/onedr0p/home-ops/blob/12ffe9a60896a5a9b6985756840c5cd8a4587640/.github/workflows/flux-diff.yaml#L40C11-L40C30 restricts the changed paths to a subdirectory of the multi-cluster setup, so --path ends up as /github/workspace/pull/kubernetes/main, which is the full cluster. If you copy the same config, you're setting the path to a subdirectory within the cluster.

chrisbsmith commented 9 months ago

🤦 that was it. Thanks for continuing to dig into this when it was clearly a user (copy paste without truly learning the code) problem.

@ahgraber The issue is that onedr0p is managing two clusters out of a single repo, so the changed-files job has to determine which cluster the change is in (kubernetes/main or kubernetes/storage). If you change this line to dir_names_max_depth: 1 so that the changed-files action only traverses a single folder, then it works. That kind of makes changed-files worthless in our single node clusters, which it seems you already figured out.
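
The effect of capping the directory depth can be illustrated with a small Python sketch. This is not the changed-files action's code: truncate_dirs is a hypothetical helper that assumes the setting keeps at most the first N components of each changed file's directory, and the file paths are made-up examples.

```python
from pathlib import PurePosixPath


def truncate_dirs(changed_files: list[str], max_depth: int) -> list[str]:
    """Return unique parent directories of changed files, cut to max_depth components."""
    dirs: list[str] = []
    for path in changed_files:
        # Keep only the leading components of the file's parent directory.
        parts = PurePosixPath(path).parent.parts[:max_depth]
        if not parts:
            continue  # file sits at the repository root
        truncated = "/".join(parts)
        if truncated not in dirs:
            dirs.append(truncated)
    return dirs


# Made-up changed files in a single-cluster repo layout.
changed = [
    "kubernetes/apps/monitoring/kube-prometheus-stack/app/helmrelease.yaml",
    "kubernetes/apps/database/redis/app/helmrelease.yaml",
]

print(truncate_dirs(changed, 2))
print(truncate_dirs(changed, 1))
```

Under these assumptions, a depth of 2 yields kubernetes/apps, a subdirectory of the cluster, while a depth of 1 yields kubernetes, the cluster root that --path needs.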