fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0

`flux logs` not compatible with OLM `flux` operator #1673

Closed kingdonb closed 3 years ago

kingdonb commented 3 years ago

Describe the bug

The `flux logs` command does not work with the OperatorHub version of Flux, even when the operator is properly installed in the `flux-system` namespace. (I ran into other issues because `operators` is the namespace OLM uses by default, so my first guess was that the `--flux-namespace` option was needed, but that didn't help.)

I checked the code for logs to find out why this is happening, it starts by looking for deployments to pull logs from: https://github.com/fluxcd/flux2/blob/eba6706f158f01f5fbbd78c1f6003793cf14cfa9/cmd/flux/logs.go#L85

The OLM / OperatorHub version of the Flux operator does not include the `app.kubernetes.io/instance` label, which seems like a clear-cut explanation for why it doesn't work.

I am unsure how to suggest fixing this. Should the operator include this label? OLM can manage several instances of Flux (if I understand the point of OLM, they can be at different versions). Or should we adjust `flux logs` to search for deployments in a different way that is compatible with the OLM installation?

The labels on the deployment include these:

  labels:
    olm.deployment-spec-hash: 5c69bb97f5
    olm.owner: flux.v0.16.1
    olm.owner.kind: ClusterServiceVersion
    olm.owner.namespace: flux-system
    operators.coreos.com/flux.flux-system: ""

The pod spec labels section has this one only (the label varies by app):

      labels:
        app: kustomize-controller

As shown in the `flux check` output below, the same problem affects that command as well.
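To illustrate the mismatch, here is a minimal sketch (not Flux's actual selector code) of how an equality-based Kubernetes label selector matches deployment labels; against the OLM-applied labels shown above, a selector on `app.kubernetes.io/instance` matches nothing:

```python
def matches(selector: dict, labels: dict) -> bool:
    """True if every key/value pair in the selector is present in labels,
    mirroring an equality-based Kubernetes label selector."""
    return all(labels.get(k) == v for k, v in selector.items())

# Labels actually present on the OLM-managed deployment (copied from above).
olm_deployment_labels = {
    "olm.deployment-spec-hash": "5c69bb97f5",
    "olm.owner": "flux.v0.16.1",
    "olm.owner.kind": "ClusterServiceVersion",
    "olm.owner.namespace": "flux-system",
    "operators.coreos.com/flux.flux-system": "",
}

# The instance label that a normal `flux install` applies and that
# `flux logs` selects on (value assumed to be the install namespace).
flux_selector = {"app.kubernetes.io/instance": "flux-system"}

print(matches(flux_selector, olm_deployment_labels))  # False: no deployments found
```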

Steps to reproduce

If needed, install the current version of Operator Lifecycle Manager as documented at OperatorHub (click the Install button to find this instruction). This step is not needed for OpenShift users, but is needed on vanilla Kubernetes clusters:

curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.18.3/install.sh | bash -s v0.18.3

Install the krew `operator` plugin, or use another method (such as the OpenShift Console) to install Flux in the `flux-system` namespace:

kubectl krew install operator

Create the `flux-system` namespace if needed, and use `kubectl-operator` to install Flux. Installing into `flux-system` ensures the service accounts are properly bound through the ClusterRoleBinding; otherwise source-controller fails leader election:

kubectl create namespace flux-system
kubectl operator install -n flux-system flux --create-operator-group

Create some Flux resources (a GitRepository and Kustomization, for example) so that logs are generated, then try to review them with `flux logs`:

flux logs

(`flux logs` produces no output.)

Expected behavior

The command should print the controllers' logs; instead, no output is returned.

Screenshots and recordings

No response

OS / Distro

MacOS 10.15.7

Flux version

Flux (Operator) v0.16.1

Flux check

    $ flux check
    ► checking prerequisites
    ✔ kubectl 1.22.0-rc.0 >=1.18.0-0
    ✔ Kubernetes 1.22.0-rc.0 >=1.16.0-0
    ► checking controllers
    ✔ all checks passed

(the flux check output indicates the same problem exists there)

Git provider

N/A

Container Registry provider

N/A

Additional context

No response


kingdonb commented 3 years ago

cc: @chanwit FYI – there are probably at least 3 separate issues here, I may not have identified all the places in flux this affects.

We could add documentation to better support users who need to know about the ClusterRoleBinding problem when the `flux-system` namespace isn't used. I opened an issue at kubectl-operator related to that:

The front-page "Easy Install" instructions on OLM's OperatorHub page suggest neither `kubectl-operator` nor the OpenShift Console, so users who try installing Flux this way for the first time are likely to be stumped when source-controller fails leader election and the Flux operator fails to become ready.

This might affect only a small minority of users; as I think we've discussed before, most OLM users are on OpenShift. But the `flux logs` and `flux check` issues will definitely still affect them.

stefanprodan commented 3 years ago

@kingdonb I think this is an easy fix: add the `app.kubernetes.io/instance` label, with the value that OLM uses, to https://github.com/weaveworks/flux2-openshift/blob/main/kustomization.yaml#L9
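For reference, a kustomize stanza along those lines might look like the sketch below. Note this is hypothetical: the actual label value would need to match whatever OLM expects, and `flux-system` is assumed here.

```yaml
# Hypothetical sketch: adding the instance label that `flux logs` selects on.
# The value (flux-system) is an assumption; it must match the OLM install.
commonLabels:
  app.kubernetes.io/instance: flux-system
```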

kingdonb commented 3 years ago

That would be a great idea, except I don't think the label is missing yet at the point in time where that kustomize build is invoked. It extracts gotk-components.yaml from the Flux CLI, then some things happen in the background via operator-sdk, and I think whatever labels might have been present are automatically discarded and replaced.

I'll move this discussion over to flux2-openshift, we can discuss how to resolve it there.

chanwit commented 3 years ago

I tweaked the release steps to include the labels on the deployments in v0.16.2. @kingdonb, could you test v0.16.2 from OperatorHub please?

kingdonb commented 3 years ago

This issue is fixed in Flux Operator v0.16.2 🎉

Same for flux check, both commands work as expected. Thank you @chanwit