fluent / fluent-bit

Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX and Windows
https://fluentbit.io
Apache License 2.0

Send kubernetes logs only for a specified namespace to Azure Log Analytics workspace #8586

Open murech opened 3 months ago

murech commented 3 months ago

I'm able to parse k8s logs for all namespaces and send them to an Azure Log Analytics Workspace with the ConfigMap below. However, I was not able to send only the logs for one specific namespace (for example: podinfo). Can you please tell me how I can filter namespaces?

ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-azure.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Buffer_Size         0

  output-azure.conf: |
    [OUTPUT]
        Name        azure
        Match       *
        Customer_ID ${FLUENT_AZURE_WORKSPACE_ID_PODINFO}
        Shared_Key  ${FLUENT_AZURE_WORKSPACE_KEY_PODINFO}

  parsers.conf: |
    [PARSER]
        Name   apache
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache2
        Format regex
        Regex  ^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   apache_error
        Format regex
        Regex  ^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])?( \[client (?<client>[^\]]*)\])? (?<message>.*)$

    [PARSER]
        Name   nginx
        Format regex
        Regex ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S

Log file for namespace podinfo:

/var/log/containers/podinfo-846d65b775-qb6nj_podinfo_podinfo-8f5b52ad7f87d7c574d0cde76055ca7294a23f8623a59f7db34ae7375e5cd099.log

kforeverisback commented 3 months ago

There are multiple ways to achieve this:

Your pod name and namespace are embedded in the log file names. Since the tail input plugin supports tag expansion, you get the full log path (with pod name and namespace) in your tag.

See this document for more info and a concrete example.
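
For illustration, a minimal sketch of that approach (adapted from the custom-tag pattern in the tail/kubernetes filter documentation; the Tag_Regex assumes the standard /var/log/containers/<pod>_<namespace>_<container>-<id>.log naming, and the environment variable names follow the ConfigMap above). The tail input captures the namespace into the tag, and the azure output then matches only the podinfo namespace:

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        # capture pod name, namespace and container from the log file name
        Tag_Regex         (?<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*)_(?<namespace_name>[^_]+)_(?<container_name>.+)-
        Tag               kube.<namespace_name>.<pod_name>.<container_name>
        Path              /var/log/containers/*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  output-azure.conf: |
    [OUTPUT]
        Name        azure
        # only records tagged with the podinfo namespace reach this output
        Match       kube.podinfo.*
        Customer_ID ${FLUENT_AZURE_WORKSPACE_ID_PODINFO}
        Shared_Key  ${FLUENT_AZURE_WORKSPACE_KEY_PODINFO}

Note that with a custom tag like this, the kubernetes filter's tag-based metadata lookup may need adjusting (for example via its Regex_Parser/Kube_Tag_Prefix options), since the tag no longer contains the full file path.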

murech commented 3 months ago

Many thanks for your input, @kforeverisback. I was able to route logs by namespace with the following configuration:

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Tag               kube.vault.*
        Path              /var/log/containers/*vault*.log
        Parser            docker
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
        Refresh_Interval  10

  output-azure.conf: |
    [OUTPUT]
        Name        azure
        Match       kube.vault.*
        Customer_ID ${FLUENT_AZURE_WORKSPACE_ID_VAULT}
        Shared_Key  ${FLUENT_AZURE_WORKSPACE_KEY_VAULT}

I now have another problem: when I try to write to a Log Analytics Workspace that is in a different subscription than the AKS cluster, I get the following error message:

[ warn] [output:azure:azure.0] http_status=403
2024-04-05T16:07:31.358868992Z [2024/04/05 16:07:31] [error] [engine] chunk '1-1712333240.111923211.flb' cannot be retried: task_id=0, input=tail.0 > output=azure.0

403 indicates a permissions error. We are using managed identities for our AKS cluster. Do we have to give the managed identity access to the Log Analytics Workspace?

kforeverisback commented 3 months ago

@murech the Match configuration should not cause any issues with Azure access.

How were you sending the data before? Based on the config, it looks like you were using the Shared Key and Workspace ID.

Currently, Fluent Bit's azure output plugin does not yet support Managed Identity authentication via Microsoft Entra ID (formerly Azure AD).

It uses the Shared Key and Workspace ID, so it shouldn't depend on tenants or subscriptions. See this Log Analytics REST API document.
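
As an illustration (reusing the placeholder variable names from the config above), pointing the output at a workspace in a different subscription only requires that workspace's ID and primary key:

  output-azure.conf: |
    [OUTPUT]
        Name        azure
        Match       kube.vault.*
        # Workspace ID (Customer ID) and primary key of the target workspace;
        # the Log Analytics HTTP Data Collector API authenticates with these
        # two values alone, regardless of subscription or tenant
        Customer_ID ${FLUENT_AZURE_WORKSPACE_ID_VAULT}
        Shared_Key  ${FLUENT_AZURE_WORKSPACE_KEY_VAULT}

A 403 from that API usually means the workspace ID or shared key does not match the target workspace, so a good first step is to double-check those values for the cross-subscription workspace.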