fluent / fluent-operator

Operate Fluent Bit and Fluentd in the Kubernetes way - Previously known as FluentBit Operator
Apache License 2.0

help request: option to ignore not valid output cr instance #943

Open rmechi opened 1 year ago

rmechi commented 1 year ago

Describe the issue

apiVersion: fluentd.fluent.io/v1alpha1
kind: Output
metadata:
  namespace: namespace-a
  labels:
    xyz.developergitops.com/instance: namespace-a-logging
    output.fluentd.fluent.io/enabled: 'true'
    output.fluentd.fluent.io/scope: namespace
spec:
  outputs:
    - cloudWatch:
        webIdentityTokenFile: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
        logGroupName: namespace-a-group
        roleArn: arn:aws:iam::123456789:role/hub-role
        awsUseSts: true
        includeTimeKey: true
        autoCreateStream: true
        roleSessionName: namespace-a_session
        logStreamName: namespace-a-stream
        awsStsRoleArn: 'arn:aws:iam::987654321:role/spoke-role'
        region: us-east-1
        maxEventsPerBatch: '10000'
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: Output
metadata:
  namespace: namespace-f
  labels:
    xyz.developergitops.com/instance: namespace-f-logging
    output.fluentd.fluent.io/enabled: 'true'
    output.fluentd.fluent.io/scope: namespace
spec:
  outputs:
    - cloudWatch:
        webIdentityTokenFile: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
        logGroupName: namespace-f-group
        roleArn: arn:aws:iam::757575757:role/hub-role
        awsUseSts: true
        includeTimeKey: true
        autoCreateStream: true
        roleSessionName: namespace-f_session
        logStreamName: namespace-f-stream
        awsStsRoleArn: 'arn:aws:iam::74658383:role/spoke-role'
        region: us-east-1
        maxEventsPerBatch: '10000'

The role referenced by awsStsRoleArn in the namespace-f Output does not exist. When Fluentd reloads, Fluent Bit reports TCP connection failed: fluentd.fluent.svc.cluster.local:24224 (Connection refused), and Fluentd logs an error like _class=Aws::STS::Errors::AccessDenied error="User: arn:aws:sts::757575757:assumed-role/hub-role/NjVmNTJkNjUtYTFkMC00MzY1LWJhNjctN2M1MWZmMmY1Mjhl is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::74658383::role/spoke-role"

After I created the Output for namespace-f with the non-existent role, Fluentd started refusing connections. This affects the whole cluster, including every other namespace whose Output object has a valid configuration.

Is there any way to ignore, or skip processing of, Output instances whose configuration does not work (for example, the role does not exist or the target is not listening)? As far as I can see, a single non-working Output causes the Fluentd pod to refuse connections, which disrupts logging for thousands of other namespaces that have valid Output configurations.
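Until something like that exists in the operator, one partial safeguard is to lint the role ARNs before an Output is applied. The snippet below is only a sketch of that idea and is not part of fluent-operator: the function name malformed_role_arns, the flattened dict layout, and the 12-digit account ID used for namespace-a are illustrative assumptions. It checks ARN syntax only, so it would flag a malformed value such as the doubled colon in the error message above, but it cannot tell whether a well-formed role actually exists or can be assumed.

```python
import re

# Shape of an IAM role ARN: arn:<partition>:iam::<12-digit account id>:role/<path/name>
ROLE_ARN_RE = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+$"
)


def malformed_role_arns(outputs):
    """Return (namespace, arn) pairs whose awsStsRoleArn is not a well-formed role ARN."""
    bad = []
    for out in outputs:
        arn = out.get("awsStsRoleArn", "")
        if not ROLE_ARN_RE.match(arn):
            bad.append((out["namespace"], arn))
    return bad


# Hypothetical, flattened view of two Output specs.
# The namespace-a account ID is a made-up 12-digit placeholder.
outputs = [
    {"namespace": "namespace-a",
     "awsStsRoleArn": "arn:aws:iam::123456789012:role/spoke-role"},
    {"namespace": "namespace-f",
     "awsStsRoleArn": "arn:aws:iam::74658383::role/spoke-role"},
]

for namespace, arn in malformed_role_arns(outputs):
    print(f"{namespace}: suspicious role ARN {arn!r}")
```

A check like this could run in a CI step or admission hook before the Output reaches the cluster; verifying that the role can really be assumed would still need a call to STS.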

How did you install fluent operator?

Using the Helm chart. Operator version 1.7.0.

Additional context

No response

wenchajun commented 1 year ago

This one requires you to provide the contents of the secret configuration file.

rmechi commented 1 year ago

Thanks for the reply.

Here is the decoded data of the secret named fluentd-config:

app.conf:

<source>
  @type  forward
  bind  0.0.0.0
  port  24224
</source>
<match **>
  @id  main
  @type  label_router
  <route>
    @label  @a2170d34e9940ec56d328100e375c43e
    <match>
      namespaces  default,kube-system
    </match>
  </route>
  <route>
    @label  @4d51318d244a44490830fbdca9c13259
    <match>
      namespaces  paas-validation-d
    </match>
  </route>
  <route>
    @label  @b5c39d30d6c3efe8807681bdb8aa659f
    <match>
      namespaces  validate-spoke-logging
    </match>
  </route>
  <route>
    @label  @f5acde559c2527b8b6e5a0ed870c7b72
    <match>
      namespaces  loft-d
    </match>
  </route>
</match>
<label @a2170d34e9940ec56d328100e375c43e>
  <match **>
    @id  ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::cloudwatch-hub-0
    @type  cloudwatch_logs
    auto_create_stream  true
    log_group_name  fluentd-log-group-hub
    log_stream_name  fluentd-log-stream-hub
    region  us-east-1
    <web_identity_credentials>
      role_arn  arn:aws:iam::1111111111111111:role/delegate-admin-fluent-operator-hub
      role_session_name  fluentdToCloudwatchHub
      web_identity_token_file  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    </web_identity_credentials>
  </match>
</label>
<label @4d51318d244a44490830fbdca9c13259>
  <match **>
    @id  FluentdConfig-paas-validation-d-default-config::paas-validation-d::output::default-output-0
    @type  cloudwatch_logs
    auto_create_stream  true
    aws_sts_role_arn  arn:aws:iam::123456789101:role/delegate-admin-fluent-spoke
    aws_use_sts  true
    include_time_key  true
    log_group_name  paas-validation-d-group
    log_stream_name  paas-validation-d-stream
    max_events_per_batch  10000
    region  us-east-1
    <web_identity_credentials>
      role_arn  arn:aws:iam::1111111111111111:role/delegate-admin-fluent-operator-hub
      role_session_name  paas-validation-d_session
      web_identity_token_file  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    </web_identity_credentials>
  </match>
</label>
<label @b5c39d30d6c3efe8807681bdb8aa659f>
  <match **>
    @id  FluentdConfig-validate-spoke-logging-default-config::validate-spoke-logging::output::default-output-0
    @type  cloudwatch_logs
    auto_create_stream  true
    aws_sts_role_arn  arn:aws:iam::8588858789101:role/delegate-admin-fluent-spoke
    aws_use_sts  true
    include_time_key  true
    log_group_name  validate-spoke-logging-group
    log_stream_name  validate-spoke-logging-stream
    max_events_per_batch  10000
    region  us-east-1
    <web_identity_credentials>
      role_arn  arn:aws:iam::1111111111111111:role/delegate-admin-fluent-operator-hub
      role_session_name  validate-spoke-logging_session
      web_identity_token_file  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    </web_identity_credentials>
  </match>
</label>
<label @f5acde559c2527b8b6e5a0ed870c7b72>
  <match **>
    @id  FluentdConfig-loft-d-default-config::loft-d::output::default-output-0
    @type  cloudwatch_logs
    auto_create_stream  true
    aws_sts_role_arn  arn:aws:iam::987654321010:role/delegate-admin-fluent-spoke
    aws_use_sts  true
    include_time_key  true
    log_group_name  loft-d-group
    log_stream_name  loft-d-stream
    max_events_per_batch  10000
    region  us-east-1
    <web_identity_credentials>
      role_arn  arn:aws:iam::1111111111111111:role/delegate-admin-fluent-operator-hub
      role_session_name  loft-d_session
      web_identity_token_file  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    </web_identity_credentials>
  </match>
</label>

fluent.conf:

# includes all files
@include /fluentd/etc/system.conf
@include /fluentd/etc/app.conf
@include /fluentd/etc/log.conf

log.conf:

# Do not collect fluentd's own logs to avoid infinite loops.
<match **>
    @type null
    @id main-no-output
</match>
<label @FLUENT_LOG>
    <match fluent.*>
        @type null
        @id main-fluentd-log
    </match>
</label>

system.conf:

# Enable RPC endpoint
<system>
    rpc_endpoint 127.0.0.1:24444
    log_level info
    workers 1
</system>

rmechi commented 1 year ago

If any one of the targets below does not work, Fluentd starts refusing connections.

4d51318d244a44490830fbdca9c13259
a2170d34e9940ec56d328100e375c43e
b5c39d30d6c3efe8807681bdb8aa659f
f5acde559c2527b8b6e5a0ed870c7b72
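
To trace one of these label hashes back to the namespace and Output that produced it, the decoded app.conf can be parsed with a short script. This is a rough sketch rather than anything shipped with fluent-operator: the function name map_labels is made up for illustration, SAMPLE_CONF is a trimmed copy of the config posted above, and the regular expressions assume the exact layout the operator generated here.

```python
import re

# Trimmed copy of the app.conf posted above (two of the four routes).
SAMPLE_CONF = """
<match **>
  @id  main
  @type  label_router
  <route>
    @label  @a2170d34e9940ec56d328100e375c43e
    <match>
      namespaces  default,kube-system
    </match>
  </route>
  <route>
    @label  @f5acde559c2527b8b6e5a0ed870c7b72
    <match>
      namespaces  loft-d
    </match>
  </route>
</match>
<label @a2170d34e9940ec56d328100e375c43e>
  <match **>
    @id  ClusterFluentdConfig-cluster-fluentd-config::cluster::clusteroutput::cloudwatch-hub-0
    @type  cloudwatch_logs
  </match>
</label>
<label @f5acde559c2527b8b6e5a0ed870c7b72>
  <match **>
    @id  FluentdConfig-loft-d-default-config::loft-d::output::default-output-0
    @type  cloudwatch_logs
    aws_sts_role_arn  arn:aws:iam::987654321010:role/delegate-admin-fluent-spoke
  </match>
</label>
"""

ROUTE_RE = re.compile(
    r"<route>\s*@label\s+@(\w+)\s*<match>\s*namespaces\s+(\S+)\s*</match>\s*</route>"
)
LABEL_RE = re.compile(r"<label @(\w+)>(.*?)</label>", re.S)
ID_RE = re.compile(r"^[ \t]*@id[ \t]+(\S+)", re.M)
STS_RE = re.compile(r"^[ \t]*aws_sts_role_arn[ \t]+(\S+)", re.M)


def _empty():
    return {"namespaces": [], "output_ids": [], "sts_role_arns": []}


def map_labels(conf):
    """Map each label hash to its routed namespaces, output @ids and STS role ARNs."""
    result = {}
    # Each <route> pairs a label hash with a comma-separated namespace list.
    for label, namespaces in ROUTE_RE.findall(conf):
        result.setdefault(label, _empty())["namespaces"] = namespaces.split(",")
    # Each <label> section holds the outputs that the route feeds.
    for label, body in LABEL_RE.findall(conf):
        entry = result.setdefault(label, _empty())
        entry["output_ids"] = ID_RE.findall(body)
        entry["sts_role_arns"] = STS_RE.findall(body)
    return result


for label, info in map_labels(SAMPLE_CONF).items():
    print(label, info["namespaces"], info["output_ids"], info["sts_role_arns"])
```

Matching the role ARN from an AccessDenied log line against the sts_role_arns column then points at the namespace whose Output needs fixing.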