fluent / fluent-operator

Operate Fluent Bit and Fluentd in the Kubernetes way - Previously known as FluentBit Operator
Apache License 2.0

bug: Arg --disable-component-controllers still not working properly #1212

Open misoknr opened 1 week ago

misoknr commented 1 week ago

Describe the issue

When the Helm value "operator.disableComponentControllers" is set while installing fluent-operator, it is not passed correctly to the operator at runtime. The following error appears in the operator log:

2024-06-18T15:20:52Z ERROR setup {"error": "incorrect value for -disable-component-controllers and it will not be proceeded (possible values are: fluent-bit, fluentd)"}

To Reproduce

Install or upgrade fluent-operator with the following value in the values file, for example:

operator:
  disableComponentControllers: "fluentd"

Expected behavior

When a valid value is provided for operator.disableComponentControllers, it is propagated correctly to the operator.

Your Environment

- Fluent Operator version: 2.9.0
- Container Runtime: 
- Operating system: centos rhel fedora
- Kernel version: 5.10.214-202.855.amzn2.x86_64

How did you install fluent operator?

Via helm chart:

helm upgrade --install fluent-operator fluent/fluent-operator --version 2.9.0  --namespace fluent --create-namespace -f k8s_addons/fluent/values.yml  --set fluentbit.image.tag=v3.0.7

Values file contents:

containerRuntime: docker
Kubernetes: false
fluentd:
  crdsEnable: false
fluentbit:
  enable: true
  image:
    repository: ***
  imagePullSecrets:
  - name: artifactory-connection-docker-secret
operator:
  disableComponentControllers: "fluentd"
  container:
    repository: ***
  resources:
    requests:
      cpu: 100m
      memory: 450Mi
    limits:
      cpu: 100m
      memory: 450Mi
  imagePullSecrets:
  - name: artifactory-connection-docker-secret
  initcontainer:
    repository: ***

Additional context

No response

cw-Guo commented 1 week ago

I tried but can't reproduce it in my local environment. Some debug suggestions:

  1. Can you try to check the manifests generated by helm?
  2. Can you check the running fluent-operator pod's manifests?

The correct ones should have the following: ` args:

misoknr commented 1 week ago

This is in operator deployment manifest:

containers:
  - name: fluent-operator
    image: >-
      srsng-docker.artifactory.healthcare.siemens.com/kubesphere/fluent-operator:v2.9.0
    args:
      - '--disable-component-controllers="fluentd"'

benjaminhuo commented 1 week ago

      - '--disable-component-controllers="fluentd"'

Try - '--disable-component-controllers=fluentd'

misoknr commented 1 week ago

      - '--disable-component-controllers="fluentd"'

Try - '--disable-component-controllers=fluentd'

Thanks, but that's not the point. The Helm template itself is probably wrong, so the chart renders the argument with the wrong value.

No matter whether I set this

operator:
  disableComponentControllers: fluentd

or this

operator:
  disableComponentControllers: "fluentd"

in my values file, the result is the same.

SvenThies commented 1 week ago

Hey,

same issue here. The following appears in the deployment manifest:

containers:
      - args:
        - --disable-component-controllers="fluentd

cw-Guo commented 1 week ago

I did see a recent change to this feature; see templates/fluent-operator-deployment.yaml.

Can you please check whether your template is the same as the latest one?

SvenThies commented 6 days ago

Hey,

thanks for the swift reply. Judging by my Helm release version, the fix you mentioned should already be included.

helm-release version: 2.9.0

values.yaml:

disableComponentControllers: "fluentd"

Rendered manifest:

args:
    - --disable-component-controllers="fluentd"

Error:

2024-06-24T20:28:46Z    ERROR   setup       {"error": "incorrect value for `-disable-component-controllers` and it will not be proceeded (possible values are: fluent-bit, fluentd)"}

Using your suggestion, patching the deployment manifest with

args:
  - "--disable-component-controllers=fluentd"

works fine.

cw-Guo commented 5 days ago

~/playground/fluent-operator master* ❯ helm version
version.BuildInfo{Version:"v3.15.2", GitCommit:"1a500d5625419a524fdae4b33de351cc4f58ec35", GitTreeState:"clean", GoVersion:"go1.22.4"}
~/playground/fluent-operator master* ❯ helm template fluent-operator charts/fluent-operator --version 2.9.0 --namespace fluent -f charts/fluent-operator/values.yaml | grep args -A 5 -B 5
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
        args:
          - "--disable-component-controllers=fluentd"
        volumeMounts:
        - name: env
          mountPath: /fluent-operator
      serviceAccountName: fluent-operator

I just tried again with the latest version of Helm, and the generated manifest is correct.

SvenThies commented 5 days ago

Hmm, that's weird.

Especially because the mentioned fix was released with 2.9.0 but is not in the deployment template of the 2.9.0 chart on Artifact Hub, which still shows:

        args:
          - --disable-component-controllers={{ .Values.operator.disableComponentControllers | quote }}

Any idea where this comes from?
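
For reference, the difference between the two template forms can be reproduced outside Helm. The sketch below uses Go's text/template with a stand-in `quote` function (Helm supplies its own; this one only mimics the wrap-in-double-quotes behaviour). The `published` string is the line from the chart shown above; the `repo` string is an assumption reconstructed from the manifest rendered from the repository, not copied from the actual template.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// quote is a stand-in for the template function of the same name that Helm
// provides; it wraps the value in double quotes using Go's %q formatting.
func quote(v any) string {
	return fmt.Sprintf("%q", fmt.Sprint(v))
}

// render executes a one-line template against a minimal, hypothetical values tree.
func render(tmpl, value string) (string, error) {
	t, err := template.New("arg").Funcs(template.FuncMap{"quote": quote}).Parse(tmpl)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"Values": map[string]any{
			"operator": map[string]any{"disableComponentControllers": value},
		},
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

const (
	// Form shown in the published chart: quote is applied after the "=".
	published = `- --disable-component-controllers={{ .Values.operator.disableComponentControllers | quote }}`
	// Assumed form behind the manifest rendered from the repository:
	// the whole argument is wrapped in quotes instead.
	repo = `- "--disable-component-controllers={{ .Values.operator.disableComponentControllers }}"`
)

func main() {
	for _, tmpl := range []string{published, repo} {
		out, err := render(tmpl, "fluentd")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(out)
	}
}
```

In the first form the quotes sit in the middle of a plain YAML scalar, so YAML keeps them as literal characters of the argument. In the second form the quotes delimit the whole scalar and are removed by the YAML parser.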

mritunjaysharma394 commented 3 days ago

I am facing the same issue, @SvenThies, but in my case the quotes are present and the value between them is empty:

kubectl get deployment.apps/fluent-operator -n fluent -o yaml | grep args -A 5 -B 5                        
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/name: fluent-operator
    spec:
      containers:
      - args:
        - --disable-component-controllers=""
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:

Is this expected? Should it not be fluentd, @cw-Guo?

Apart from this error, the rest of my operator logs look fine:

kubectl logs -n fluent pod/fluent-operator-58ff575d4c-twg7w
Defaulted container "fluent-operator" out of: fluent-operator, setenv (init)
2024-06-28T13:03:00Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2024-06-28T13:03:00Z    ERROR   setup           {"error": "incorrect value for `-disable-component-controllers` and it will not be proceeded (possible values are: fluent-bit, fluentd)"}
main.main
        /workspace/main.go:121
runtime.main
        /usr/local/go/src/runtime/proc.go:267
2024-06-28T13:03:00Z    INFO    setup   starting manager
2024-06-28T13:03:00Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2024-06-28T13:03:00Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBit"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBit"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.Secret"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.ServiceAccount"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.ClusterFluentBitConfig"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.DaemonSet"}
2024-06-28T13:03:00Z    INFO    Starting Controller     {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBitConfig"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.ClusterInput"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.ClusterFilter"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.ClusterOutput"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.ClusterParser"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.ClusterMultilineParser"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.Filter"}
2024-06-28T13:03:00Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.Output"}

mritunjaysharma394 commented 3 days ago

An update: if I run

helm upgrade fluent-operator fluent/fluent-operator --version 2.9.0 --namespace fluent --set operator.disableComponentControllers="fluent-bit"

I do get the updated deployment:

kubectl get deployment.apps/fluent-operator -n fluent -o yaml | grep args -A 5 -B 5
      labels:
        app.kubernetes.io/component: operator
        app.kubernetes.io/name: fluent-operator
    spec:
      containers:
      - args:
        - --disable-component-controllers="fluent-bit"
        env:
        - name: NAMESPACE
          valueFrom:
            fieldRef:

However, the error still surfaces in logs:

kubectl logs -n fluent pod/fluent-operator-85f648d4cb-j7hp8      
Defaulted container "fluent-operator" out of: fluent-operator, setenv (init)
2024-06-28T14:28:50Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2024-06-28T14:28:50Z    ERROR   setup           {"error": "incorrect value for `-disable-component-controllers` and it will not be proceeded (possible values are: fluent-bit, fluentd)"}
main.main
        /workspace/main.go:121
runtime.main
        /usr/local/go/src/runtime/proc.go:267
2024-06-28T14:28:50Z    INFO    setup   starting manager
2024-06-28T14:28:50Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2024-06-28T14:28:50Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBit"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.ServiceAccount"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.DaemonSet"}
2024-06-28T14:28:50Z    INFO    Starting Controller     {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentd", "controllerGroup": "fluentd.fluent.io", "controllerKind": "Fluentd", "source": "kind source: *v1alpha1.Fluentd"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentd", "controllerGroup": "fluentd.fluent.io", "controllerKind": "Fluentd", "source": "kind source: *v1.ServiceAccount"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentd", "controllerGroup": "fluentd.fluent.io", "controllerKind": "Fluentd", "source": "kind source: *v1.DaemonSet"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBit"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.Secret"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.ClusterFluentBitConfig"}
2024-06-28T14:28:50Z    INFO    Starting EventSource    {"controller": "fluen

SvenThies commented 3 days ago

I think there is a problem with how the latest release (v2.9.0) was published to the Helm registry. As mentioned by @cw-Guo, the argument should look like this in the deployment template:

  - '--disable-component-controllers=fluentd'

As far as I saw, there were also some problems with the release of v2.8.0.

mritunjaysharma394 commented 3 days ago

I think I have identified the problem and am working on a fix. The value from Helm seems to be parsed incorrectly. I added a small logging change to the code and built a custom image to test with the chart. The manager binary worked fine when run without the chart, but with the value passed through the chart I got this logged:

2024-06-28T15:10:37Z    INFO    setup   Value of disabledControllers    {"value": "\"fluentd\""}
2024-06-28T15:10:37Z    ERROR   setup           {"error": "incorrect value for `-disable-component-controllers` and it will not be proceeded (possible values are: fluent-bit, fluentd)"}
main.main
        /workspace/main.go:122
runtime.main

That is why the error is reported: the operator reads the value as "\"fluentd\"", with the quote characters included.
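
The behaviour can be reproduced in isolation. The sketch below is a minimal illustration, assuming the flag is read with Go's standard flag package; the `validate` and `normalize` helpers are hypothetical, modelled only on the error message in the logs, and are not taken from the operator source or from any proposed fix. Container args reach the process without a shell, so nothing strips the quote characters before the flag is parsed.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// parseDisabled parses args with Go's standard flag package and returns the
// raw flag value. No shell is involved, so quote characters are not stripped.
func parseDisabled(args []string) (string, error) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the example output
	value := fs.String("disable-component-controllers", "", "component controllers to disable")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *value, nil
}

// validate is a hypothetical check modelled on the logged error message:
// an empty value disables nothing, otherwise only the two listed names pass.
func validate(v string) error {
	switch v {
	case "", "fluent-bit", "fluentd":
		return nil
	}
	return fmt.Errorf("incorrect value %q (possible values are: fluent-bit, fluentd)", v)
}

// normalize shows one tolerant option: strip surrounding quote characters.
func normalize(v string) string {
	return strings.Trim(v, `"'`)
}

func main() {
	for _, arg := range []string{
		`--disable-component-controllers="fluentd"`, // as rendered by the published chart
		`--disable-component-controllers=""`,        // an empty value that went through quote
		`--disable-component-controllers=fluentd`,   // after patching the deployment
	} {
		raw, err := parseDisabled([]string{arg})
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("raw=%s strict=%v normalized=%v\n", raw, validate(raw), validate(normalize(raw)))
	}
}
```

Under these assumptions the first two arguments fail the strict check because the value still carries its quote characters, which would also explain why an apparently empty `""` triggers the same error. Stripping quotes in the binary is only one option; rendering the argument without embedded quotes in the chart avoids the problem at its source.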

mritunjaysharma394 commented 3 days ago

Created a fix, https://github.com/fluent/fluent-operator/pull/1222, and it works fine with helm install locally now:

kubectl logs -n fluent pod/fluent-operator-7df5b4d96b-scphc
Defaulted container "fluent-operator" out of: fluent-operator, setenv (init)
2024-06-28T15:45:05Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": ":8080"}
2024-06-28T15:45:05Z    INFO    setup   starting manager
2024-06-28T15:45:05Z    INFO    Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}
2024-06-28T15:45:05Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-06-28T15:45:05Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBit"}
2024-06-28T15:45:05Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.ServiceAccount"}
2024-06-28T15:45:05Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1.DaemonSet"}
2024-06-28T15:45:05Z    INFO    Starting Controller     {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit"}
2024-06-28T15:45:05Z    INFO    Starting EventSource    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "source": "kind source: *v1alpha2.FluentBit"}
2024-06-28T15:45:05Z    INFO    Starting EventSource    {"controller": "fluentd", "controllerGroup": "fluentd.fluent.io", "controllerKind": "Fluentd", "source": "kind source: *v1alpha1.Fluentd"}

SvenThies commented 2 days ago

IMHO fixing this in the code base doesn't make any sense. The chart from the repo (main and tag v2.9.0) works just fine. We need to fix the release.

cw-Guo commented 2 days ago

I do think this is a release issue, but I am not familiar with the release process. @benjaminhuo, can you please take a look at the Helm release v2.9? Thanks!