SumoLogic / sumologic-otel-collector

Sumo Logic Distribution for OpenTelemetry Collector
Apache License 2.0

SumoLogic OTEL collector not honoring _sourceName #1238

Status: Closed (rishiraj-rana closed this issue 1 year ago)

rishiraj-rana commented 1 year ago

Expected Behavior

The metadata field called Source Name (_sourceName) contains the file path entered when you created your Source. If your Source points to more than one file path, then messages from each file path are tagged with the specific path from which they were collected.

Ref: https://help.sumologic.com/docs/send-data/reference-information/metadata-naming-conventions/#source-name

Observed Behavior

Logs are streamed to Sumo Logic with _sourceName overwritten as _sourceName="OTC Log Input", so the file path is not preserved.

Steps to Reproduce

Install otelcol-sumo v0.83.0-sumo-0 and save the config below as config.yaml. Once the collector is running, check the _sourceName of the ingested logs in the Sumo Logic UI.

Environment to reproduce issue

# uname -a
Linux ansible-test-client.simonsfoundation.org 5.15.0-1036-aws #40~20.04.1-Ubuntu SMP Mon Apr 24 00:21:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

# cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian

otelcol-sumo config.yaml

exporters:
  sumologic:
    auth:
      authenticator: sumologic
extensions:
  file_storage:
    directory: /etc/sumologic/storage
  sumologic:
    clobber: true
    collector_name: testing.example.org
    force_registration: true
    install_token: <token>
    time_zone: America/New_York
processors:
  resource/apt:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/apt
  resource/audit:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/audit
  resource/cloud-init:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/cloud-init
  resource/hostmetrics:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/metrics/ansible/testing
  resource/syslog:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/syslog
receivers:
  filelog/apt:
    include:
    - /var/log/apt/history.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['log.file.path_resolved']
      type: move
    start_at: beginning
  filelog/audit:
    include:
    - /var/log/audit/audit.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['log.file.path_resolved']
      type: move
    start_at: beginning
  filelog/cloud-init:
    include:
    - /var/log/cloud-init-output.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['log.file.path_resolved']
      type: move
    start_at: beginning
  filelog/syslog:
    include:
    - /var/log/messages
    - /var/log/cron
    - /var/log/maillog
    - /var/log/secure
    - /var/log/boot.log
    - /var/log/syslog
    - /var/log/dpkg.log
    - /var/log/daemon.log
    - /var/log/kern.log
    - /var/log/mail.log
    - /var/log/auth.log
    - /var/log/cron.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['log.file.path_resolved']
      type: move
    start_at: beginning
  hostmetrics:
    collection_interval: 5m
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      load:
        cpu_average: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      network: null
      processes: null
  hostmetrics/disk:
    collection_interval: 5m
    scrapers:
      disk: null
      filesystem: null
service:
  extensions:
  - file_storage
  - sumologic
  pipelines:
    logs/apt:
      exporters:
      - sumologic
      processors:
      - resource/apt
      receivers:
      - filelog/apt
    logs/audit:
      exporters:
      - sumologic
      processors:
      - resource/audit
      receivers:
      - filelog/audit
    logs/cloud-init:
      exporters:
      - sumologic
      processors:
      - resource/cloud-init
      receivers:
      - filelog/cloud-init
    logs/syslog:
      exporters:
      - sumologic
      processors:
      - resource/syslog
      receivers:
      - filelog/syslog
    metrics:
      exporters:
      - sumologic
      processors:
      - resource/hostmetrics
      receivers:
      - hostmetrics
      - hostmetrics/disk
  telemetry:
    metrics:
      address: :61088

andrzej-stencel commented 1 year ago

Hi @rishiraj-rana, thanks for reporting your issue and presenting the full config. I think there's a misunderstanding here.

In the config you provided, the _sourceName field is not set anywhere. What the config specifies is moving the log.file.path_resolved attribute from record level to resource level. I assume you want the _sourceName field to contain the value of the log.file.path_resolved field. This doesn't happen automatically unless you use the Sumo Logic Schema processor, which does quite a lot more than just that. Here's a config that should work for you. It moves the record-level attribute log.file.path_resolved to the resource-level attribute _sourceName:

exporters:
  sumologic:
    auth:
      authenticator: sumologic
extensions:
  file_storage:
    directory: /etc/sumologic/storage
  sumologic:
    clobber: true
    collector_name: testing.example.org
    force_registration: true
    install_token: <token>
    time_zone: America/New_York
processors:
  resource/apt:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/apt
  resource/audit:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/audit
  resource/cloud-init:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/cloud-init
  resource/hostmetrics:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/metrics/ansible/testing
  resource/syslog:
    attributes:
    - action: insert
      key: _sourceCategory
      value: aws/linux/ansible/testing/syslog
receivers:
  filelog/apt:
    include:
    - /var/log/apt/history.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['_sourceName']
      type: move
    start_at: beginning
  filelog/audit:
    include:
    - /var/log/audit/audit.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['_sourceName']
      type: move
    start_at: beginning
  filelog/cloud-init:
    include:
    - /var/log/cloud-init-output.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['_sourceName']
      type: move
    start_at: beginning
  filelog/syslog:
    include:
    - /var/log/messages
    - /var/log/cron
    - /var/log/maillog
    - /var/log/secure
    - /var/log/boot.log
    - /var/log/syslog
    - /var/log/dpkg.log
    - /var/log/daemon.log
    - /var/log/kern.log
    - /var/log/mail.log
    - /var/log/auth.log
    - /var/log/cron.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    - from: attributes['log.file.path_resolved']
      to: resource['_sourceName']
      type: move
    start_at: beginning
  hostmetrics:
    collection_interval: 5m
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      load:
        cpu_average: true
      memory:
        metrics:
          system.memory.utilization:
            enabled: true
      network: null
      processes: null
  hostmetrics/disk:
    collection_interval: 5m
    scrapers:
      disk: null
      filesystem: null
service:
  extensions:
  - file_storage
  - sumologic
  pipelines:
    logs/apt:
      exporters:
      - sumologic
      processors:
      - resource/apt
      receivers:
      - filelog/apt
    logs/audit:
      exporters:
      - sumologic
      processors:
      - resource/audit
      receivers:
      - filelog/audit
    logs/cloud-init:
      exporters:
      - sumologic
      processors:
      - resource/cloud-init
      receivers:
      - filelog/cloud-init
    logs/syslog:
      exporters:
      - sumologic
      processors:
      - resource/syslog
      receivers:
      - filelog/syslog
    metrics:
      exporters:
      - sumologic
      processors:
      - resource/hostmetrics
      receivers:
      - hostmetrics
      - hostmetrics/disk
  telemetry:
    metrics:
      address: :61088
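
For comparison, the Sumo Logic Schema processor route mentioned above might look roughly like the fragment below. This is only a sketch under assumptions, not a tested config: it assumes the processor is registered as sumologic_schema in this release, that it has a translate_attributes option, and that its translation table maps the resource attribute log.file.path_resolved to _sourceName. Check the processor's README for your collector version before relying on any of this. Only the apt pipeline is shown; resource/apt, the exporter, and the extensions are taken from the config above.

```yaml
processors:
  # Assumption: processor name and option as described in the lead-in.
  # The processor applies other translations as well, so review its
  # defaults before enabling it.
  sumologic_schema:
    translate_attributes: true

receivers:
  filelog/apt:
    include:
    - /var/log/apt/history.log
    include_file_name: false
    include_file_path_resolved: true
    operators:
    # Keep the original move: the path stays under its OpenTelemetry name
    # at resource level, and the processor is expected to rename it.
    - from: attributes['log.file.path_resolved']
      to: resource['log.file.path_resolved']
      type: move
    start_at: beginning

service:
  pipelines:
    logs/apt:
      exporters:
      - sumologic
      processors:
      - resource/apt
      - sumologic_schema
      receivers:
      - filelog/apt
```

The explicit move to resource['_sourceName'] in the config above is the narrower change, since it touches nothing but the one attribute.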

rishiraj-rana commented 1 year ago

Thank you @astencel-sumo! I really appreciate the help.