DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0
2.87k stars 1.21k forks source link

Segfault on ARM when using log collection #3667

Open cgorski-idexx opened 5 years ago

cgorski-idexx commented 5 years ago
ssm-user@raspberrypi:~/datadog-agent/bin $ sudo ./agent/agent run
2019-06-07 14:24:48 BST | CORE | INFO | (pkg/util/log/log.go:473 in func1) | config.Load()
2019-06-07 14:24:48 BST | CORE | INFO | (pkg/util/log/log.go:473 in func1) | config.load succeeded
2019-06-07 14:24:48 BST | CORE | INFO | (cmd/agent/app/start.go:123 in StartAgent) | Starting Datadog Agent v6.11.3+git.2.fa37351
2019-06-07 14:24:49 BST | CORE | INFO | (cmd/agent/app/start.go:151 in StartAgent) | Hostname is: raspberrypi
2019-06-07 14:24:54 BST | CORE | INFO | (cmd/agent/app/start.go:170 in StartAgent) | GUI server port -1 specified: not starting the GUI.
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/forwarder/forwarder.go:154 in Start) | Forwarder started, sending to 1 endpoint(s) with 1 worker(s) each: "https://6-11-3-app.agent.datadoghq.com" (1 api key(s))
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/dogstatsd/listeners/udp.go:77 in Listen) | dogstatsd-udp: starting to listen on 127.0.0.1:8125
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/logs/input/container/launcher.go:27 in NewLauncher) | Trying to initialize kubernetes launcher
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/logs/input/container/launcher.go:30 in NewLauncher) | Kubernetes launcher initialized
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/logs/logs.go:72 in Start) | Starting logs-agent...
2019-06-07 14:24:54 BST | CORE | ERROR | (pkg/logs/auditor/auditor.go:159 in recoverRegistry) | open /opt/datadog-agent/run/registry.json: no such file or directory
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/logs/logs.go:75 in Start) | logs-agent started
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/tagger/tagger.go:82 in Init) | starting the tagging system
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/runner/runner.go:95 in NewRunner) | Runner started with 4 workers.
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/autodiscovery/providers/file.go:70 in Collect) | file: searching for configuration files at: /etc/datadog-agent/conf.d
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/autodiscovery/providers/file.go:70 in Collect) | file: searching for configuration files at: /home/ssm-user/datadog-agent/bin/agent/dist/conf.d
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/autodiscovery/providers/file.go:70 in Collect) | file: searching for configuration files at:
2019-06-07 14:24:54 BST | CORE | WARN | (pkg/autodiscovery/providers/file.go:74 in Collect) | Skipping, open : no such file or directory
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/logs/scheduler/scheduler.go:52 in Schedule) | Received a new logs config: idexx-pi
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/logs/input/file/tailer.go:120 in setup) | Opening  /var/log/syslog
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check cpu
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check cpu with an interval of 15s
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check disk
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check disk with an interval of 15s
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check io
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check io with an interval of 15s
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check load
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check load with an interval of 15s
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check memory
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check memory with an interval of 15s
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check network
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check network with an interval of 15s
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check ntp
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check ntp with an interval of 15s
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler.go:63 in Schedule) | Scheduling check uptime
2019-06-07 14:24:54 BST | CORE | INFO | (pkg/collector/scheduler/scheduler.go:76 in Enter) | Scheduling check uptime with an interval of 15s
2019-06-07 14:24:55 BST | CORE | WARN | (pkg/logs/auditor/auditor.go:145 in func2) | open /opt/datadog-agent/run/registry.json: no such file or directory
2019-06-07 14:24:55 BST | CORE | INFO | (pkg/collector/runner/runner.go:264 in work) | Running check cpu
2019-06-07 14:24:55 BST | CORE | INFO | (pkg/collector/runner/runner.go:330 in work) | Done running check cpu
2019-06-07 14:24:56 BST | CORE | INFO | (pkg/serializer/serializer.go:274 in SendMetadata) | Sent host metadata payload, size (raw/compressed): 3411/1585 bytes.
2019-06-07 14:24:56 BST | CORE | INFO | (pkg/forwarder/transaction.go:193 in Process) | Successfully posted payload to "https://6-11-3-app.agent.datadoghq.com/intake/?api_key=*************************77357", the agent will only log transaction success every 20 transactions
2019-06-07 14:24:59 BST | CORE | INFO | (pkg/collector/runner/runner.go:264 in work) | Running check uptime
2019-06-07 14:24:59 BST | CORE | INFO | (pkg/collector/runner/runner.go:330 in work) | Done running check uptime
2019-06-07 14:25:00 BST | CORE | INFO | (pkg/collector/runner/runner.go:264 in work) | Running check network
2019-06-07 14:25:00 BST | CORE | INFO | (pkg/collector/runner/runner.go:330 in work) | Done running check network
2019-06-07 14:25:01 BST | CORE | INFO | (pkg/collector/runner/runner.go:264 in work) | Running check load
2019-06-07 14:25:01 BST | CORE | INFO | (pkg/collector/runner/runner.go:330 in work) | Done running check load
2019-06-07 14:25:02 BST | CORE | INFO | (pkg/logs/client/connection_manager.go:55 in func1) | Connecting to the backend: agent-intake.logs.datadoghq.com:10516, with SSL: true
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11a50]

goroutine 61 [running]:
runtime/internal/atomic.goXadd64(0x18224f4, 0x69, 0x0, 0x57a1e8, 0x1d1e501)
        /usr/local/go/src/runtime/internal/atomic/atomic_arm.go:96 +0x1c
github.com/DataDog/datadog-agent/pkg/logs/input/file.(*Tailer).incrementReadOffset(0x18224d0, 0x69)
        /home/ubuntu/golang/src/github.com/DataDog/datadog-agent/pkg/logs/input/file/tailer.go:236 +0x34
github.com/DataDog/datadog-agent/pkg/logs/input/file.(*Tailer).readForever(0x18224d0)
        /home/ubuntu/golang/src/github.com/DataDog/datadog-agent/pkg/logs/input/file/tailer.go:168 +0x17c
created by github.com/DataDog/datadog-agent/pkg/logs/input/file.(*Tailer).Start
        /home/ubuntu/golang/src/github.com/DataDog/datadog-agent/pkg/logs/input/file/tailer.go:105 +0x110

Attempting to run agent on raspberry pi with log collection fails.

uname -a from my environment: Linux raspberrypi 4.19.42-v7+ #1219 SMP Tue May 14 21:20:58 BST 2019 armv7l GNU/Linux

Log collection configuration file. Agent runs with no problems if this is not here:

logs:
  - type: file
    path: /var/log/syslog
    service: idexx-pi-syslog
    source: custom
cgorski-idexx commented 5 years ago

Follow up: I think arm7 is 32-bit.

readOffset in https://github.com/DataDog/datadog-agent/blob/master/pkg/logs/input/file/tailer.go is an int64. Could the atomic increment operation be causing the seg fault?

cgorski-idexx commented 5 years ago

May be applicable:

From:

https://godoc.org/sync/atomic

Bugs: On x86-32, the 64-bit functions use instructions unavailable before the Pentium MMX. On non-Linux ARM, the 64-bit functions use instructions unavailable before the ARMv6k core. On ARM, x86-32, and 32-bit MIPS, it is the caller's responsibility to arrange for 64-bit alignment of 64-bit words accessed atomically. The first word in a variable or in an allocated struct, array, or slice can be relied upon to be 64-bit aligned.

cgorski-idexx commented 5 years ago

Could be this: https://github.com/golang/go/issues/23345

cgorski-idexx commented 5 years ago

Fix here:

https://github.com/DataDog/datadog-agent/pull/3672

NBParis commented 5 years ago

Hello @cgorski-idexx ,

Thanks a lot for submitting this and doing the PR as well. We will definitely review it and make sure it is merged as soon as possible.

Thanks again for your contribution.