DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

High volume JSON-formatted logs collected by agent are truncated in DataDog #5764

Open ian-axelrod opened 4 years ago

ian-axelrod commented 4 years ago

Output of the info page (if this is a bug)


===============
Agent (v6.17.1)
===============

  Status date: 2020-06-18 19:20:18.561964 UTC
  Agent start: 2020-06-12 15:31:00.513581 UTC
  Pid: 350
  Go Version: go1.12.9
  Python Version: 2.7.17
  Build arch: amd64
  Check Runners: 6
  Log Level: INFO

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System UTC time: 2020-06-18 19:20:18.561964 UTC

  Host Info
  =========
    bootTime: 2020-02-10 23:02:23.000000 UTC
    kernelVersion: 4.14.158-129.185.amzn2.x86_64
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: bullseye/sid
    procs: 70
    uptime: 2944h28m39s

  Hostnames
  =========
    ec2-hostname: **REDACTED**
    hostname: **REDACTED**
    instance-id: **REDACTED**
    socket-fqdn: **REDACTED**
    socket-hostname: **REDACTED**
    host tags:
      env:stg
    hostname provider: configuration

  Metadata
  ========
    cloud_provider: AWS
    hostname_source: configuration

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 6, Total: 212,856
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    disk (2.6.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 196, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 106ms

    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/docker.yaml
      Total Runs: 35,478
      Metric Samples: Last Run: 165, Total: 1 M
      Events: Last Run: 0, Total: 1,067
      Service Checks: Last Run: 1, Total: 36,091
      Average Execution Time : 67ms

    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 5, Total: 177,385
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 78, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 6, Total: 212,862
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 17, Total: 603,109
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    network (1.14.0)
    ----------------
      Instance ID: network:e0204ad63d43c949 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 37, Total: 1 M
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1ms

    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 592
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 592
      Average Execution Time : 20.014s

    tm_docker_thinpool (1.0.0)
    --------------------------
      Instance ID: tm_docker_thinpool:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_docker_thinpool.yaml
      Total Runs: 35,477
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    tm_kafka-connect (1.0.2)
    ------------------------
      Instance ID: tm_kafka-connect:e69530c8133401db [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_kafka-connect.yaml
      Total Runs: 35,437
      Metric Samples: Last Run: 102, Total: 1 M
      Events: Last Run: 0, Total: 139
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 642ms

    tm_kafka_consumer_lag (1.0.0)
    -----------------------------
      Instance ID: tm_kafka_consumer_lag:e69530c8133401db [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_kafka_consumer_lag.yaml
      Total Runs: 35,477
      Metric Samples: Last Run: 138, Total: 1 M
      Events: Last Run: 0, Total: 193
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 554ms

    tm_rqworkers (1.0.2)
    --------------------
      Instance ID: tm_rqworkers:dccdd5fa6ffb8a61 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/tm_rqworkers.yaml
      Total Runs: 35,477
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 35,477
      Metric Samples: Last Run: 1, Total: 35,477
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s

    zk (2.4.0)
    ----------
      Instance ID: zk:1f25e659fc9a1d9 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/zk.d/auto_conf.yaml
      Total Runs: 4,243
      Metric Samples: Last Run: 30, Total: 127,236
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 4,243
      Average Execution Time : 7ms
      metadata:
        version.major: 3
        version.minor: 4
        version.patch: 14
        version.raw: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf
        version.release: 4
        version.scheme: semver

  Python 3 Linter Warnings
  =======================
      tm_kafka-connect
      ----------------
        tm_kafka-connect.py:116:8 : print statement used (print-statement, E1601)
        tm_kafka-connect.py:119:12 : print statement used (print-statement, E1601)
        tm_kafka-connect.py:120:8 : print statement used (print-statement, E1601)

      tm_kafka_consumer_lag
      ---------------------
        tm_kafka_consumer_lag.py:127:8 : print statement used (print-statement, E1601)
        tm_kafka_consumer_lag.py:130:12 : print statement used (print-statement, E1601)
        tm_kafka_consumer_lag.py:131:8 : print statement used (print-statement, E1601)

      tm_pagespeed
      ------------
        tm_pagespeed.py:80:8 : print statement used (print-statement, E1601)
        tm_pagespeed.py:83:12 : print statement used (print-statement, E1601)
        tm_pagespeed.py:84:8 : print statement used (print-statement, E1601)

      tm_rqworkers
      ------------
        tm_rqworkers.py:59:8 : print statement used (print-statement, E1601)
        tm_rqworkers.py:62:12 : print statement used (print-statement, E1601)
        tm_rqworkers.py:63:8 : print statement used (print-statement, E1601)

  Loading Errors
  ==============
    tm_craft
    --------
      Core Check Loader:
        Check tm_craft not found in Catalog

      JMX Check Loader:
        check is not a jmx check, or unable to determine if it's so

      Python Check Loader:
        unable to import module 'tm_craft': No module named tm_craft

    tm_data_etl_pipeline
    --------------------
      Core Check Loader:
        Check tm_data_etl_pipeline not found in Catalog

      JMX Check Loader:
        check is not a jmx check, or unable to determine if it's so

      Python Check Loader:
        unable to import module 'tm_data_etl_pipeline': invalid syntax (tm_data_etl_pipeline.py, line 52)

    tm_zk
    -----
      Core Check Loader:
        Check tm_zk not found in Catalog

      JMX Check Loader:
        check is not a jmx check, or unable to determine if it's so

      Python Check Loader:
        unable to import module 'tm_zk': No module named tm_zk

========
JMXFetch
========

  Initialized checks
  ==================
    kafka
      instance_name : **REDACTED**
      message : 
      metric_count : 110
      service_check_count : 0
      status : OK
      instance_name : **REDACTED**
      message : 
      metric_count : 260
      service_check_count : 0
      status : OK
  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 35,477
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 4,988
    Metadata: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    SketchSeries: 0
    Success: 75,942
    TimeseriesV1: 35,477

  API Keys status
  ===============
    API key ending with **REDACTED**: API Key valid

==========
Endpoints
==========
  https://app.datadoghq.com - API Key ending with:
      - **REDACTED**

==========
Logs Agent
==========
    Sending uncompressed logs in SSL encrypted TCP to agent-intake.logs.datadoghq.com on port 10516
    BytesSent: 4.6498322631e+10
    EncodedBytesSent: 4.6498322631e+10
    LogsProcessed: 4.3108137e+07
    LogsSent: 4.3108137e+07

  docker
  ------
    Type: docker
    Status: OK
    Inputs: b69e4e8c2066af5bc1e778a931ec2f7b8b94775fead5b066ef611e3e2a220713 
    Type: docker
    Status: OK
    Inputs: 735042e629cdc59ff49156cb8c6039bb416dae71cebdd3a7fa611e001f2e7ee6 
    Type: docker
    Status: OK
    Inputs: 3998c92f3f32a3536a8d8d143e9e81b8d1febd3b5f41f28c14c178b876e69785 
    Type: docker
    Status: OK
    Inputs: 490cb6968cb9b573085822bbff9d2e1c39e0027f24d8bcfc305ee7fb39d14b26 

  container_collect_all
  ---------------------
    Type: docker
    Status: OK
    Inputs: 1a9fe7ac6d8a0e7e4ad5bdb8b9d79bc2e239c256d432c2240f5f635ae3e1336e f1c98a998a2325766f95a785d6ae382159882d6d02bfe4bfb003233b77d4891e 

  tm_zk
  -----
    Type: file
    Path: /etc/zookeeper/logs/zookeeper.log
    Status: OK
    Inputs: /etc/zookeeper/logs/zookeeper.log 

=========
Aggregator
=========
  Checks Metric Sample: 29.6 M
  Dogstatsd Metric Sample: 9.3 M
  Event: 1,400
  Events Flushed: 1,400
  Number Of Flushes: 35,477
  Series Flushed: 31.3 M
  Service Check: 630,639
  Service Checks Flushed: 666,112

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 9.3 M
  Metric Parse Errors: 0
  Service Check Packets: 60,254
  Service Check Parse Errors: 0
  Udp Bytes: 4.2 G
  Udp Packet Reading Errors: 0
  Udp Packets: 4.5 M
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0

Describe what happened: I recently converted The Muse's Kafka Connect service log output from standard log4j console output to JSON-formatted output using logback. I confirmed this approach works for other services, such as Kafka brokers. With Kafka Connect it works fine for INFO-level logs and above, but a majority of DEBUG logs end up with the message 'TRUNCATED' preceding the log data, and the log data itself is a concatenation of dozens of unconverted JSON log records. The more logs generated in a short period of time, the more truncation I see. See the screenshots below.

[Screenshots attached: Screen Shot 2020-06-18 at 1.58.55 PM, Screen Shot 2020-06-18 at 2.02.43 PM]

Describe what you expected: One deserialized log record per row in the DataDog Log UI.

Steps to reproduce the issue:

  1. Start up Agent 6.17.1 container configured to collect all container logs.
  2. On the same host, run another container that generates a massive amount of JSON-encoded logs (a minimal generator sketch is included at the end of this report). Allow this to run for at least 2 minutes.
  3. Observe the truncated logs in your DataDog UI.

NOTE: I can create a sample Kafka Connect setup for you to test, if that will be helpful. I would just need to modify my existing setup to remove any employer-specific information.

Additional environment details (Operating System, Cloud provider, etc):
  Cloud provider: AWS
  Cloud service: ECS
  Agent version: 6.17.1
  Agent containerized? Yes
  Log output driver: json-file [non-blocking, 4m intermediate ring, 5-file rotation, 500kb per file]
  Log collection method: Agent [SSL-encrypted TCP, uncompressed]
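For step 2 of the reproduction, here is a minimal generator sketch that floods stdout with single-line JSON records, assuming any sufficiently chatty JSON logger will trigger the behavior; the field names, record size, and logger name are illustrative and not taken from the actual Kafka Connect setup.

```go
// Hypothetical load generator for reproducing the symptom: it writes a high
// volume of single-line JSON log records to stdout so the json-file driver and
// the Agent's docker tailer see sustained bursts of JSON lines. Runs until
// stopped (Ctrl-C or container stop).
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
	"time"
)

type record struct {
	Timestamp string `json:"timestamp"`
	Level     string `json:"level"`
	Logger    string `json:"logger"`
	Message   string `json:"message"`
}

func main() {
	enc := json.NewEncoder(os.Stdout)
	payload := strings.Repeat("x", 2048) // roughly 2 KB of body per record, purely illustrative
	for i := 0; ; i++ {
		r := record{
			Timestamp: time.Now().UTC().Format(time.RFC3339Nano),
			Level:     "DEBUG",
			Logger:    "repro.generator",
			Message:   fmt.Sprintf("record %d %s", i, payload),
		}
		// Encode writes exactly one JSON object per line, matching the
		// one-record-per-line output the issue describes.
		if err := enc.Encode(&r); err != nil {
			fmt.Fprintln(os.Stderr, "encode error:", err)
			os.Exit(1)
		}
	}
}
```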

dmitrykarachentsev commented 3 years ago

I faced the same issue. Is there any update?

alexandru-iacob commented 2 years ago

This issue is still present.

sgt10 commented 2 years ago

We are having the same issue with .NET ($ dotnet --version reports 6.0.100) and Serilog. Can we get an update regarding this issue?

olivielpeau commented 2 years ago

Hi all, please make sure that:

If you're still having issues, please contact our support team and send them an Agent flare along with the raw logs that are affected (after removing any sensitive information from them).

benknight commented 2 years ago

I'm having this issue as well.

MattMyers3 commented 1 year ago

Also having this issue. Any updates?

ssrahul96 commented 1 year ago

Facing the same issue, where a single JSON log line is larger than 400 kB.

This limit needs to be exposed as a config option: https://github.com/DataDog/datadog-agent/blob/main/pkg/logs/internal/decoder/decoder.go#L23

kostakostyukov1 commented 1 year ago

I'm having the same issue. is there a solution ?

sameeraksc commented 1 year ago

This is still happening. Three years, and still no solution.

souenzzo commented 1 year ago

Having the same issue.

All my JSON logs have fewer than 400 characters.

Benbentwo commented 1 year ago

This limit needs to be exposed as a config option: https://github.com/DataDog/datadog-agent/blob/main/pkg/logs/internal/decoder/decoder.go#L23

@DataDog team, why was this magic number chosen? Shouldn't this be a very simple PR to fix (which I'd happily open)? What is the hold-up on this?
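A minimal sketch of what such a change could look like, on the assumption that an operator-facing override at startup is acceptable. The DD_LOGS_CONFIG_MAX_CONTENT_LENGTH variable and the 256 * 1000 default shown here are assumptions for illustration only; the authoritative constant is at the decoder.go line linked above, and a real change would presumably read the value through the Agent's own config package rather than an environment variable.

```go
// Illustrative sketch only, not the Agent's actual configuration API: it shows
// the shape of the suggestion, i.e. replacing a hard-coded content length
// limit with one that can be overridden when the Agent starts.
package decoder

import (
	"os"
	"strconv"
)

// defaultContentLenLimit stands in for the hard-coded "magic number" referred
// to above; the authoritative value lives in decoder.go.
const defaultContentLenLimit = 256 * 1000

// contentLenLimit returns the truncation threshold, allowing an operator to
// raise it via a hypothetical DD_LOGS_CONFIG_MAX_CONTENT_LENGTH variable.
func contentLenLimit() int {
	if raw := os.Getenv("DD_LOGS_CONFIG_MAX_CONTENT_LENGTH"); raw != "" {
		if v, err := strconv.Atoi(raw); err == nil && v > 0 {
			return v
		}
	}
	return defaultContentLenLimit
}
```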

sameeraksc commented 1 year ago

@DataDog Any update on this? :-(

koosha-t commented 11 months ago

same issue! Seems nobody cares!

annieTechno commented 9 months ago

Experiencing the same. How to fix it?

i524 commented 1 month ago

Quoting https://github.com/DataDog/datadog-agent/issues/5764#issuecomment-1595275364:

    This limit needs to be exposed as a config option: main/pkg/logs/internal/decoder/decoder.go#L23
    @DataDog team, why was this magic number chosen? Shouldn't this be a very simple PR to fix (which I'd happily open)? What is the hold-up on this?

It seems it's been fixed here: https://github.com/DataDog/datadog-agent/pull/19055