fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

The flux logs command does not respect FLUX_SYSTEM_NAMESPACE #3120

Open kingdonb opened 2 years ago

kingdonb commented 2 years ago

Describe the bug

When Flux is installed from the Operator Hub installation instructions, it lands in the namespace operators by default.

The flux logs command accepts --flux-namespace which works, but FLUX_SYSTEM_NAMESPACE being exported in the shell does not have the same effect.

Steps to reproduce

On a new cluster (kind is OK, I used k3s and vcluster) you can follow the 2-step directions to install OLM, and install a Flux operator "subscription" which is located in the operators namespace.

$ curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.22.0/install.sh | bash -s v0.22.0
$ kubectl create -f https://operatorhub.io/install/flux.yaml

...

$ export FLUX_SYSTEM_NAMESPACE=operators

$ flux logs
(no output)

(Note that I have done this after the "expected behavior" steps below, so I am certain there are already some logs to capture.)

Expected behavior

Calling flux logs with the flux-namespace option gets the desired results:

$ flux logs --flux-namespace operators
2022-09-19T22:27:52.830Z info Kustomization/flux-system.flux-system - Source 'GitRepository/flux-system' not found
2022-09-19T22:28:10.023Z info Kustomization/flux-system.flux-system - server-side apply completed
2022-09-19T22:28:25.033Z error Kustomization/flux-system.flux-system - unable to record event POST http://notification-controller.flux-system.svc.cluster.local./ giving up after 5 attempt(s): Post "http://notification-controller.flux-system.svc.cluster.local./": dial tcp: lookup notification-controller.flux-system.svc.cluster.local. on 10.107.253.154:53: no such host
2022-09-19T22:28:25.063Z info Kustomization/flux-system.flux-system - Reconciliation finished in 15.459341189s, next run in 5m0s
2022-09-19T22:28:09.556Z error GitRepository/flux-system.flux-system - unable to record event POST http://notification-controller.flux-system.svc.cluster.local./ giving up after 5 attempt(s): Post "http://notification-controller.flux-system.svc.cluster.local./": dial tcp: lookup notification-controller.flux-system.svc.cluster.local. on 10.107.253.154:53: no such host
2022-09-19T22:28:09.556Z info GitRepository/flux-system.flux-system - stored artifact for commit 'add kustomization.yaml'

The content of the logs above is not important; the expected behavior is that they are successfully retrieved.

Setting FLUX_SYSTEM_NAMESPACE=operators and exporting it in the shell environment should give the same results, but it does not.

There is no output, unless you call it with the --flux-namespace option as above.
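Until the variable is honored, one possible workaround is to forward it to the flag explicitly. The following is only a sketch: `flux_logs` is a hypothetical helper name (not part of the flux CLI), and the fallback to flux-system assumes the usual default install namespace.

```shell
# Workaround sketch: pass FLUX_SYSTEM_NAMESPACE to --flux-namespace explicitly.
# `flux_logs` is a hypothetical helper, not something the flux CLI provides.
flux_logs() {
  # Fall back to flux-system (assumed default) when the variable is unset or empty.
  flux logs --flux-namespace "${FLUX_SYSTEM_NAMESPACE:-flux-system}" "$@"
}
```

With FLUX_SYSTEM_NAMESPACE=operators exported, calling `flux_logs` should be equivalent to the `flux logs --flux-namespace operators` invocation shown above.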

Screenshots and recordings

No response

OS / Distro

MacOS 10.15.7

Flux version

v0.34.0

Flux check

$ flux check
► checking prerequisites
✔ Kubernetes 1.21.14+k3s1 >=1.20.6-0
► checking controllers
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.26.3
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.22.2
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.24.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.23.5
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.19.2
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.25.10
► checking crds
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ alerts.notification.toolkit.fluxcd.io/v1beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ receivers.notification.toolkit.fluxcd.io/v1beta1
✔ gitrepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta1
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta1
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1beta2
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ all checks passed

Git provider

N/A

Container Registry provider

N/A

Additional context

No response


somtochiama commented 2 years ago

Right now flux logs has two namespace flags: --flux-namespace (where the Flux components are installed) and --namespace (the namespace filter for the logs from objects).

When you configure FLUX_SYSTEM_NAMESPACE (which affects only --namespace), it serves as the default namespace filter, giving you logs only from objects in that namespace. If we were to default both flags to FLUX_SYSTEM_NAMESPACE, the command would look for the Flux components in that namespace AND still filter logs by that namespace.

When you do:

export FLUX_SYSTEM_NAMESPACE=operators
flux logs

You will still see no logs in the output if you don't have any Flux custom resources in the operators namespace. (In this case the command is getting the logs but filtering them all out. Maybe we should return an error if there are no pods.)

@kingdonb is this the intended behaviour? I wanted to call it out before making any changes.
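The behaviour described in this comment can be modelled roughly as follows. This is a toy model written for illustration, not Flux's actual source: `resolve_namespaces` is a hypothetical function, and it assumes --flux-namespace defaults to flux-system regardless of the environment, while --namespace takes its default from FLUX_SYSTEM_NAMESPACE.

```shell
# Toy model of the defaulting described above; NOT the real flux implementation.
# $1: value passed to --flux-namespace (empty string if the flag was omitted)
# $2: value passed to --namespace      (empty string if the flag was omitted)
resolve_namespaces() {
  # Assumption: --flux-namespace ignores the environment variable.
  controllers_ns="${1:-flux-system}"
  # Assumption: --namespace defaults to FLUX_SYSTEM_NAMESPACE, then flux-system.
  filter_ns="${2:-${FLUX_SYSTEM_NAMESPACE:-flux-system}}"
  printf 'controllers=%s filter=%s\n' "$controllers_ns" "$filter_ns"
}
```

Under this model, exporting FLUX_SYSTEM_NAMESPACE=operators and passing no flags looks for controllers in flux-system while filtering for objects in operators, which would explain the empty output reported in the issue.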

kingdonb commented 2 years ago

I think that FLUX_SYSTEM_NAMESPACE should only affect the namespace where flux is installed. In other words, from your description I think it was implemented backwards. (Either that, or it should affect both namespaces.)

I think there might be some workshopping to do in order to make sure this works well, based on your reply. I wouldn't expect anyone to put their Flux custom resources in the operators namespace, so the change I think I'm asking for will probably not fix everything without a bit more help.

Let's not change anything right away, as your reply helps clarify the issue for me 👍

kingdonb commented 2 years ago

I think it's OK to work towards a single variable that will work with flux bootstrap in a different namespace, rather than try to conform to how OperatorHub users might experience Flux if they use OLM and don't have OpenShift to manage the install so it winds up in the flux-system namespace.

Fundamentally that's probably 0.2% of users, if it even registers. I think that FLUX_SYSTEM_NAMESPACE should probably change both, so if someone's Flux installation was in the operators namespace and they put their flux-system there too, it would work for them. I'm not in favor of adding a second variable because it seems unnecessarily complicated. But if we think about how people will use this in multi-tenant Flux environments, there might be some cases where my gotk-components are in flux-system and my workloads are in tenant-ns, and you'd want to be able to set up some environment variables so that flux logs returned relevant information for you. I don't know how you can get there without making it two variables.
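To make the two-variable idea concrete, here is a rough shell sketch of what it could look like from the user's side. Both `flux_tenant_logs` and FLUX_LOGS_NAMESPACE are hypothetical names invented for this illustration; neither exists in Flux. Only the --flux-namespace and --namespace flags come from the discussion above.

```shell
# Sketch of the "two variables" idea; FLUX_LOGS_NAMESPACE is hypothetical.
flux_tenant_logs() {
  # Controllers: where the gotk-components run (for example flux-system).
  # Filter: where the workloads' Flux objects live (for example tenant-ns),
  # falling back to the controllers' namespace when unset.
  flux logs \
    --flux-namespace "${FLUX_SYSTEM_NAMESPACE:-flux-system}" \
    --namespace "${FLUX_LOGS_NAMESPACE:-${FLUX_SYSTEM_NAMESPACE:-flux-system}}" \
    "$@"
}
```

With FLUX_SYSTEM_NAMESPACE=flux-system and FLUX_LOGS_NAMESPACE=tenant-ns exported, the helper expands to `flux logs --flux-namespace flux-system --namespace tenant-ns`.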

kingdonb commented 2 years ago

Here's some better debugging information. If I set FLUX_SYSTEM_NAMESPACE=operators and run flux bootstrap, I should get reasonable behavior from flux logs after that.

It only works if I call it like: flux logs --flux-namespace operators

That seems broken. I also noticed something else interesting: setting FLUX_SYSTEM_NAMESPACE before bootstrap doesn't just change the namespace, it also changes the names of the GitRepository and Kustomization to match.

So I got a Kustomization called operators in my operators namespace, and a GitRepository called operators.

I would have expected those to remain called flux-system; the secret is still called flux-system.

$ export FLUX_SYSTEM_NAMESPACE=operators
$ flux bootstrap github \
    --interval 10s \
    --owner kingdonb --personal \
    --repository bootstrap-repo \
    --branch staging \
    --path=clusters/demo-cluster-1
Please enter your GitHub personal access token (PAT):
► connecting to github.com
► cloning branch "staging" from Git repository "https://github.com/kingdonb/bootstrap-repo.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ component manifests are up to date
✔ reconciled components
► determining if source secret "operators/flux-system" exists
✔ source secret up to date
► generating sync manifests
✔ generated sync manifests
✔ sync manifests are up to date
► applying sync manifests
✔ reconciled sync configuration
◎ waiting for Kustomization "operators/operators" to be reconciled
✔ Kustomization reconciled successfully
► confirming components are healthy
✔ helm-controller: deployment ready
✔ kustomize-controller: deployment ready
✔ notification-controller: deployment ready
✔ source-controller: deployment ready
✔ all components are healthy