fluent / fluent-operator

Operate Fluent Bit and Fluentd in the Kubernetes way - Previously known as FluentBit Operator
Apache License 2.0

bug: operator pod restarting due to controller-runtime.source errors #1188

Open jamonation opened 1 month ago

jamonation commented 1 month ago

Describe the issue

I'm running into an issue when building a fluent-operator image, either with the Makefile or by running the go build ... invocation and packaging the resulting manager binary into an image. Both approaches result in pod restarts when the operator is deployed.

The errors look like the following:

kubectl logs -n fluent fluent-operator-55dd7bc945-kthgm

Example MultilineParser error using docker build image

2024-05-28T13:21:44Z    ERROR   controller-runtime.source   if kind is a CRD, it should be installed before calling Start   {"kind": "MultilineParser.fluentbit.fluent.io", "error": "no matches for kind \"MultilineParser\" in version \"fluentbit.fluent.io/v1alpha2\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:143
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.27.4/pkg/util/wait/wait.go:154
k8s.io/apimachinery/pkg/util/wait.waitForWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.27.4/pkg/util/wait/wait.go:207
k8s.io/apimachinery/pkg/util/wait.poll
    /go/pkg/mod/k8s.io/apimachinery@v0.27.4/pkg/util/wait/poll.go:260
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
    /go/pkg/mod/k8s.io/apimachinery@v0.27.4/pkg/util/wait/poll.go:200
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
    /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:136

Example ClusterMultilineParser from go build... image

2024-05-28T14:22:17Z    INFO    Starting workers    {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "worker count": 1}
2024-05-28T14:22:17Z    ERROR   controller-runtime.source   if kind is a CRD, it should be installed before calling Start   {"kind": "ClusterMultilineParser.fluentbit.fluent.io", "error": "no matches for kind \"ClusterMultilineParser\" in version \"fluentbit.fluent.io/v1alpha2\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
    /var/cache/melange/gomodcache/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:143
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
    /var/cache/melange/gomodcache/k8s.io/apimachinery@v0.27.4/pkg/util/wait/wait.go:154
k8s.io/apimachinery/pkg/util/wait.poll
    /var/cache/melange/gomodcache/k8s.io/apimachinery@v0.27.4/pkg/util/wait/poll.go:245
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
    /var/cache/melange/gomodcache/k8s.io/apimachinery@v0.27.4/pkg/util/wait/poll.go:200
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
    /var/cache/melange/gomodcache/sigs.k8s.io/controller-runtime@v0.14.6/pkg/source/source.go:136
2024-05-28T14:22:18Z    INFO    Starting workers    {"controller": "fluentd", "controllerGroup": "fluentd.fluent.io", "controllerKind": "Fluentd", "worker count": 1}
2024-05-28T14:22:18Z    INFO    Starting workers    {"controller": "collector", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "Collector", "worker count": 1}
2024-05-28T14:22:18Z    INFO    Starting workers    {"controller": "fluentd", "controllerGroup": "fluentd.fluent.io", "controllerKind": "Fluentd", "worker count": 1}

Both build methods produce both the MultilineParser and the ClusterMultilineParser errors, and the pod restarts. The operator does, however, create fluent-bit and fluentd pods when given the appropriate manifest.
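As a rough aid for reading logs like the ones above, the following shell sketch lists each distinct kind that controller-runtime reports as missing. The function name missing_kinds is made up for illustration, and the sketch assumes the error lines keep the {"kind": "..."} layout shown in this issue.

```shell
# missing_kinds: read operator logs on stdin and print each distinct kind
# named in a controller-runtime.source error line.
# (Hypothetical helper; assumes the log layout shown in this issue.)
missing_kinds() {
  grep 'controller-runtime.source' \
    | sed -n 's/.*{"kind": "\([^"]*\)".*/\1/p' \
    | sort -u
}

# Possible usage, with the pod name from this report:
#   kubectl logs -n fluent fluent-operator-55dd7bc945-kthgm | missing_kinds
```

Each kind it prints can then be compared by hand with the output of kubectl api-resources --api-group=fluentbit.fluent.io to see whether the API server actually serves it.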

To Reproduce

build the image

cat VERSION
v2.8.0

docker build --platform linux/arm64 -f cmd/fluent-manager/Dockerfile . -t k3d-k3d.localhost:5005/fluent-operator:v2.8.0

docker push k3d-k3d.localhost:5005/fluent-operator:v2.8.0

helm install

helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-operator --create-namespace -n fluent fluent/fluent-operator  --set operator.container.repository=k3d-k3d.localhost:5005/fluent-operator --set operator.container.tag=v2.8.0

Expected behavior

The pod should not be restarting and throwing errors about CRDs.

Your Environment

How did you install fluent operator?

Using helm:

helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-operator --create-namespace -n fluent fluent/fluent-operator  --set operator.container.repository=k3d-k3d.localhost:5005/fluent-operator --set operator.container.tag=v2.8.0

Additional context

Running with the kubesphere/fluent-operator:v2.8.0 image explicitly, or by not setting the operator.container.repository value in the helm chart results in no errors.

benjaminhuo commented 1 month ago

@jamonation have you tried the latest image, which is built on every PR?

gp185175 commented 1 month ago

@jamonation @benjaminhuo I'm also getting the same error. Is the workaround of moving container: repository: "kubesphere/fluent-operator" from tag: v2.1.0 to tag: v2.8.0 not working?

jamonation commented 1 month ago

I'll try latest and report back @benjaminhuo, thanks for the help!

elsnepal commented 2 weeks ago

This is an issue for us too; we cannot seem to use 2.8 or 2.9.

I don't see a way to use disableComponentControllers: "fluentd" and still make use of fluent-bit configurations. This results in:

2024-05-28T13:21:44Z ERROR controller-runtime.source if kind is a CRD, it should be installed before calling Start ......

This also relates to: https://github.com/fluent/fluent-operator/blob/v2.9.0/charts/fluent-operator/templates/fluent-operator-deployment.yaml#L101-L107

The logic has changed completely since v2.7.0:

https://github.com/fluent/fluent-operator/blob/v2.7.0/charts/fluent-operator/templates/fluent-operator-deployment.yaml#L102

2.9.0 -> the value is put in the same arg as the flag
2.7.0 -> the value is passed as a separate value for the arg
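To make the two argument shapes concrete, here is a minimal shell sketch that reads a flag's value from either form. The flag name --disable-component-controllers and the helper flag_value are assumptions for illustration only; the thread names just the chart value disableComponentControllers, and it does not establish whether the operator binary treats the two shapes differently.

```shell
# Hypothetical flag name, used for illustration only.
FLAG='--disable-component-controllers'

# flag_value: print the value given to $FLAG, accepting both shapes:
#   one argument:  --flag=value
#   two arguments: --flag value
flag_value() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      "$FLAG"=*)
        # Value is attached to the flag; strip everything up to the first "=".
        printf '%s\n' "${1#*=}"
        return 0
        ;;
      "$FLAG")
        # Value is the next argument, if there is one.
        [ "$#" -ge 2 ] || return 1
        printf '%s\n' "$2"
        return 0
        ;;
    esac
    shift
  done
  return 1
}

flag_value "$FLAG=fluentd"    # flag and value in the same arg
flag_value "$FLAG" fluentd    # value passed as a separate arg
```

Both calls print fluentd here, because this sketch accepts either shape. A parser that accepts only one of them would read the other differently, which is one thing to check when comparing the two chart templates linked above.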