fluent / fluent-operator

Operate Fluent Bit and Fluentd in the Kubernetes way - Previously known as FluentBit Operator
Apache License 2.0

Can FluentBit CRDs be namespaced? #516

Open alternaivan opened 1 year ago

alternaivan commented 1 year ago

Describe the issue

Hello,

I'm trying to deploy ClusterOutput and ClusterParser into the specific namespace where the FluentBit Operator is deployed. The resources are created, but at the cluster level and without a namespace.

Is it possible to deploy namespaced CRDs or not? The documentation says that the CRDs are cluster level, which I guess means no; however, the graph below the documentation shows a namespace value. If we cannot deploy namespaced CRDs, the graph could be a bit misleading.

The documentation I'm referring to is this.

Thanks in advance!

How did you install fluent operator?

Installation was done via Helm.

Additional context

No response

benjaminhuo commented 1 year ago

@alternaivan FluentBit is a namespaced CRD that controls the namespaced Fluent Bit DaemonSet. ClusterInput, ClusterParser, ClusterFilter, and ClusterOutput are all cluster-wide CRDs. This is because Fluent Bit acts as a global agent that collects logs on each K8s node, which requires cluster-wide privileges.

We'll remove the namespace in the following graph: https://github.com/fluent/fluent-operator/blob/master/docs/images/fluent-bit-operator-workflow.svg

adiforluls commented 1 year ago

We now have namespaced FluentBit CRDs in the operator starting v2.2.0.

alternaivan commented 1 year ago

Hi @adiforluls,

Thanks for the update. I've tested it on my local cluster, and although CRDs such as Output are now namespaced, it seems that kind: Output is not working. I've tested kind: ClusterOutput and kind: Output in parallel; the former works, but the latter doesn't. Am I missing something?

Below are both the Output and ClusterOutput definitions.

apiVersion: fluentbit.fluent.io/v1alpha2
kind: Output
metadata:
  labels:
    fluentbit.fluent.io/enabled: "true"
  name: es
  namespace: fluent
spec:
  es:
    bufferSize: 25M
    generateID: true
    host: "elasticsearch"
    index: fluent-bit-index
    logstashFormat: false
    port: 9200
    replaceDots: true
    timeKey: '@timestamp'
  matchRegex: (?:service)\.(.*)

...

apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  labels:
    fluentbit.fluent.io/enabled: "true"
  name: es
spec:
  es:
    bufferSize: 25M
    generateID: true
    host: "elasticsearch"
    index: fluent-bit-index
    logstashFormat: false
    port: 9200
    replaceDots: true
    timeKey: '@timestamp'
  matchRegex: (?:service)\.(.*)

Thanks, Marjan

alternaivan commented 1 year ago

Hi @adiforluls,

There was an error in my configuration. I was missing a namespace-level FluentBitConfig that matches the labels defined in the Output.

Sorry for the misunderstanding.

Thanks, Marjan
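For readers hitting the same problem, the missing piece might look roughly like the sketch below: a FluentBitConfig in the same namespace as the Output, selecting it by the label the Output already carries. The resource name `fluent-bit-config` and the `outputSelector` field are assumptions (the field mirrors the selector naming used by ClusterFluentBitConfig), so check them against the CRD installed in your cluster.

```yaml
# Hypothetical sketch: a namespaced FluentBitConfig that picks up the Output above.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBitConfig
metadata:
  name: fluent-bit-config        # assumed name, not from the thread
  namespace: fluent              # same namespace as the Output
spec:
  outputSelector:                # assumed field name
    matchLabels:
      fluentbit.fluent.io/enabled: "true"   # label set on the Output above
```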

nemcikjan commented 1 year ago

@adiforluls how do the namespaced FluentBit resources actually work? If I create a FluentBit resource in a namespace, shouldn't Fluent Bit pods be deployed? Currently I'm facing an issue where the pods get created only in the same namespace where the operator is running. If some configuration is required to run the daemonset in another namespace, please let me know.

adiforluls commented 1 year ago

The FluentBit resource is a namespaced resource that dictates various configurations of the fluent-bit daemonset. The daemonset instances on every node of the cluster will be created in the same namespace as the FluentBit custom resource. This resource doesn't offer any namespace level log isolation/treatment.

There is a FluentBitConfig resource where you can specify label selector values for the namespaced Filter/Parser/Output resources in the same namespace as the FluentBitConfig resource.

The FluentBit resource has a label selector field called namespaceFluentBitCfgSelector to match FluentBitConfig resources that carry the respective label, from various namespaces in the cluster.
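The selection chain described above could be sketched as follows. This is an illustration under assumptions, not a verified manifest: the namespaces `fluent` and `team-a`, the resource names, and the label `fluentbit.fluent.io/enabled` on the FluentBitConfig are hypothetical; only `namespaceFluentBitCfgSelector` and `fluentBitConfigName` are field names that appear in this thread.

```yaml
# Hypothetical sketch: the FluentBit resource selects FluentBitConfig
# resources across namespaces by label.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluent-bit
  namespace: fluent                      # the daemonset is created in this namespace
spec:
  fluentBitConfigName: fluentbit-config  # cluster-level config, as in the thread
  namespaceFluentBitCfgSelector:
    matchLabels:
      fluentbit.fluent.io/enabled: "true"   # assumed label
---
# A FluentBitConfig in another namespace that carries the selected label.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBitConfig
metadata:
  name: team-a-config                    # hypothetical name
  namespace: team-a                      # hypothetical namespace
  labels:
    fluentbit.fluent.io/enabled: "true"
```

Note that, per the comment above, this does not move the daemonset: the pods still run in the namespace of the FluentBit resource, and the FluentBitConfig only contributes namespaced Filter/Parser/Output configuration.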

mkocaks commented 1 year ago

Hi, apologies in advance if I missed something obvious. We have been running the Fluent Bit operator and Fluent Bit via helm 2.0.0 (fluent 1.8.3) fine (collecting specific entries from var/messages). However, I have stepped through the upgrade to find out whether the namespaced CRDs have introduced breaking changes, and no prerequisites for upgrading to 2.2 and above are mentioned anywhere...

[image]

This CRD is new in 2.2: [image]

wenchajun commented 1 year ago

> The operator is complaining that it cannot find the CRD FluentBitConfig? We have only ever used ClusterFluentBitConfig... Please can anyone advise what needs to be implemented or configured prior to upgrading to 2.2.0 to ensure a working Fluent Bit operator...

You can open a new issue to discuss this. You can also get the CRDs on the server and check whether that CRD exists.

adiforluls commented 1 year ago

Could you also share your values.yaml file?

mkocaks commented 1 year ago

Thank you, here are our values. They are simple: we deploy the operator via helm terraform operator, and the others (input, output, parser, config for Fluent Bit) are done via the terraform kubectl manifest provider...

Omitted some entries with xxxxx

containerRuntime: containerd
Kubernetes: false

operator:
  resources:
    limits:
      cpu: 200m
      memory: 200Mi
    requests:
      cpu: 100m
      memory: 60Mi

Below is the yaml for fluentbit:

apiVersion: fluentbit.fluent.io/v1alpha2
kind: FluentBit
metadata:
  name: fluent-bit
  labels:
    app.kubernetes.io/name: fluent-bit
spec:
  labels:
    app.kubernetes.io/name: fluent-bit
  image: kubesphere/fluent-bit:v${version}
  positionDB:
    hostPath:
      path: /var/lib/fluent-bit/
  resources:
    requests:
      cpu: 10m
      memory: 25Mi
    limits:
    #  cpu: 500m
    #  memory: 200Mi
  fluentBitConfigName: fluentbit-config
  tolerations:

mkocaks commented 1 year ago

The above was working fine up until upgrading to 2.2 as mentioned above

mkocaks commented 1 year ago

Hi, any other advice here? There are breaking changes in 2.2.0 and above, which end up in the operator throwing this error. It looks like a bug in the CRD:

if kind is a CRD, it should be installed before calling Start {"kind": "FluentBitConfig.fluentbit.fluent.io", "error": "no matches for kind \"FluentBitConfig\" in version \"fluentbit.fluent.io/v1alpha2\""} sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1

Could not wait for Cache to sync {"controller": "fluentbit", "controllerGroup": "fluentbit.fluent.io", "controllerKind": "FluentBit", "error": "failed to wait for fluentbit caches to sync: timed out waiting for cache to be synced"

wenchajun commented 1 year ago

> Hi, apologies in advance if I missed something obvious. We have been running the Fluent Bit operator and Fluent Bit via helm 2.0.0 (fluent 1.8.3) fine (collecting specific entries from var/messages). However, I have stepped through the upgrade to find out whether the namespaced CRDs have introduced breaking changes, and no prerequisites for upgrading to 2.2 and above are mentioned anywhere...

It seems to me that it may be because of the addition of this feature in version 2.2. https://github.com/fluent/fluent-operator/pull/621

mkocaks commented 1 year ago

It is! We only use the cluster Fluent Bit config, so I will try to update the CRDs according to this manual: https://github.com/fluent/fluent-operator#deploy-fluent-operator-with-helm. I will report back...

mkocaks commented 1 year ago

Thank you, that was the issue... I had to replace all the CRDs!

adiforluls commented 1 year ago

@wenchajun @benjaminhuo it looks like the enable crds feature has broken upgrades of fluent-operator. I experienced this too: a simple helm upgrade does not update the CRDs anymore, although helm install has no problems (i.e. the CRDs are applied).

benjaminhuo commented 1 year ago

> @wenchajun @benjaminhuo it looks like the enable crds feature has broken upgrades of fluent-operator. I experienced this too: a simple helm upgrade does not update the CRDs anymore, although helm install has no problems (i.e. the CRDs are applied).

@Kristian-ZH added this great enhancement in https://github.com/fluent/fluent-operator/pull/621. It looks like we need some adjustments here.

mkocaks commented 1 year ago

The workaround at the moment is just to manually update the CRDs, I guess (that worked for me).