Closed: mblasi closed this issue 6 years ago.
Here is the agent installation log:
Step 18/28 : RUN sh -c "echo 'deb https://apt.datadoghq.com/ stable 6' > /etc/apt/sources.list.d/datadog.list"
---> Running in 939e75ec4a5f
Removing intermediate container 939e75ec4a5f
---> c671baafc875
Step 19/28 : RUN apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 382E94DE
---> Running in 3723eaf21433
Warning: apt-key output should not be parsed (stdout is not a terminal)
Executing: /tmp/apt-key-gpghome.uFtL7Vggj3/gpg.1.sh --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 382E94DE
gpg: key D3A80E30382E94DE: public key "Datadog, Inc <package@datadoghq.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
Removing intermediate container 3723eaf21433
---> 151fd230c251
Step 20/28 : ADD datadog.yaml /etc/datadog-agent/datadog.yaml
---> 5a932e2f4ace
Step 21/28 : RUN apt-get update
---> Running in ed731c33645f
Hit:1 http://security.debian.org stretch/updates InRelease
Ign:2 http://cdn-fastly.deb.debian.org/debian stretch InRelease
Ign:3 https://apt.datadoghq.com stable InRelease
Get:4 https://apt.datadoghq.com stable Release [4525 B]
Get:5 https://apt.datadoghq.com stable Release.gpg [819 B]
Hit:6 http://cdn-fastly.deb.debian.org/debian stretch-updates InRelease
Get:7 https://apt.datadoghq.com stable/6 amd64 Packages [4015 B]
Hit:8 http://cdn-fastly.deb.debian.org/debian stretch Release
Fetched 9359 B in 0s (16.0 kB/s)
Reading package lists...
Removing intermediate container ed731c33645f
---> cb99c33b7e51
Step 22/28 : RUN apt-get install datadog-agent
---> Running in 57bc77d041f5
Reading package lists...
Building dependency tree...
Reading state information...
The following NEW packages will be installed:
datadog-agent
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 99.7 MB of archives.
After this operation, 341 MB of additional disk space will be used.
Get:1 https://apt.datadoghq.com stable/6 amd64 datadog-agent amd64 1:6.4.2-1 [99.7 MB]
debconf: delaying package configuration, since apt-utils is not installed
Fetched 99.7 MB in 1s (63.0 MB/s)
Selecting previously unselected package datadog-agent.
(Reading database ... 7128 files and directories currently installed.)
Preparing to unpack .../datadog-agent_1%3a6.4.2-1_amd64.deb ...
Unpacking datadog-agent (1:6.4.2-1) ...
Setting up datadog-agent (1:6.4.2-1) ...
Creating dd-agent group
Creating dd-agent user
Enabling service datadog-agent
(Re)starting datadog-agent now...
Removing intermediate container 57bc77d041f5
---> 2ae3a59e8736
Hi @mblasi
Do you start the agent when your container starts? Try running the same status command after starting the agent with datadog-agent run; it should work.
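The sketch below shows one way to start the agent when the container starts. It is written in Go (the reporter's language) and assumes two things: the container's entrypoint points at a small wrapper like this one (hypothetical; a shell script would do the same job), and datadog-agent is on the PATH. The wrapper starts datadog-agent run in the background, runs the application in the foreground, and exits with the application's exit code. It does not start the trace agent or the process agent, which are separate binaries.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// agentCmd returns the command that starts the core Datadog Agent,
// as suggested in this thread ("datadog-agent run").
func agentCmd() *exec.Cmd {
	cmd := exec.Command("datadog-agent", "run")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: entrypoint <app> [args...]")
		os.Exit(2)
	}

	// Start the agent without waiting for it, so it keeps running
	// in the background while the application runs.
	agent := agentCmd()
	if err := agent.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "could not start datadog-agent:", err)
		os.Exit(1)
	}

	// Run the application in the foreground.
	app := exec.Command(os.Args[1], os.Args[2:]...)
	app.Stdin, app.Stdout, app.Stderr = os.Stdin, os.Stdout, os.Stderr
	err := app.Run()

	// Stop the agent once the application has exited.
	_ = agent.Process.Kill()
	_ = agent.Wait()

	// Propagate the application's exit code.
	if exitErr, ok := err.(*exec.ExitError); ok {
		os.Exit(exitErr.ExitCode())
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "app failed:", err)
		os.Exit(1)
	}
}
```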
Hi @hkaj,
I have tried many approaches, each with a different issue.
Now I'm building the agent from source. System metrics (mem, cpu, net) are reported, but traces are still not handled (same error as reported here). It looks as if apm.enabled were not true, but it is!
I think the most helpful thing you could tell me is which approach you recommend for my scenario, the GAE Flex environment. (Datadog support told me it is not supported yet, but I think I should be able to start the agent with a custom runtime.)
The 4 approaches I'm trying are:
1 - The official one-step installation (https://app.datadoghq.com/account/settings#agent/debian). It looks like it needs systemd, and the GAE Debian Docker image doesn't have it.
2 - The official manual installation (https://app.datadoghq.com/account/settings#agent/debian). The service starts and metrics are reported, but no traces are handled.
3 - Building the agent from source (following https://github.com/DataDog/datadog-agent). The agent looks like it started, but nothing is reported.
4 - Running the agent in a separate Docker container (https://docs.datadoghq.com/tracing/setup/docker/). I think I would need to create a new GAE service, configure the Docker network in both services, and set the trace hostname in my app service. I haven't tried this yet.
The approach that works best so far is the 2nd, but traces are not handled:
2018/08/19 00:01:25 errors.go:72: Datadog Exporter error: Post http://localhost:8126/v0.3/traces: dial tcp [::1]:8126: connect: connection refused (x2)
My datadog.yaml has apm.enabled=true, so I don't know why the agent is not handling the /traces resource. Could it be related to Docker's internal networking?
Before I lose more time switching between these approaches, please let me know which one fits my scenario.
Regards, Matías.
@mblasi support is right, GAE Flex is not supported as of today.
If you want to give it a shot, the solution you started with in this issue (#2, I believe?) is probably your best option. Solution 4 would also work if you set up networking between the app containers and the agent container, but traces, logs and metrics would have the wrong hostname attached (probably the datadog-agent container ID? I'm not familiar enough with GAE to tell for sure).
I think what was missing in your first message is that the agent was not running. Try running the command I sent earlier (datadog-agent run); that should start the infra agent. Then running /opt/datadog-agent/embedded/bin/trace-agent -config /etc/datadog-agent/datadog.yaml should start the trace agent.
Great. I'll try.
Just to clarify: isn't the trace agent embedded in datadog-agent? I read that in the datadog-trace-agent project. As far as I understand, if my datadog.yaml has apm.enabled = true, it should "start" the trace agent, shouldn't it?
Hi, here is an update:
After running /opt/datadog-agent/embedded/bin/trace-agent -config /etc/datadog-agent/datadog.yaml the traces are now handled! :)
Just two questions:
1 - Should I start it by hand? Isn't it embedded in datadog-agent? (https://github.com/DataDog/datadog-trace-agent says it is included in datadog-agent.)
2 - These two views still show no data: https://app.datadoghq.com/process?columns=host,process,user,cpu,memory,start&options=normalizeCPU,showArguments&sort=memory,DESC# and https://app.datadoghq.com/containers?columns=container_name,container_cpu,container_memory,container_net_sent_bps,container_net_rcvd_bps,container_status,container_started&options=normalizeCPU&sort=container_memory,DESC# What am I missing?
Here is the datadog-agent status output:
matias@aef-default-20180820t161712-jcgz:~$ docker exec gaeapp datadog-agent status
Getting the status from the agent.
==============
Agent (v6.4.2)
==============
Status date: 2018-08-20 19:53:21.197366 UTC
Pid: 7
Python Version:
Logs:
Check Runners: 1
Log Level: info
Paths
=====
Config File: /etc/datadog-agent/datadog.yaml
conf.d: /etc/datadog-agent/conf.d
checks.d: /etc/datadog-agent/checks.d
Clocks
======
NTP offset: -5.1117e-05 s
System UTC time: 2018-08-20 19:53:21.197366 UTC
Host Info
=========
bootTime: 2018-08-20 19:25:24.000000 UTC
kernelVersion: 4.9.0-7-amd64
os: linux
platform: debian
platformFamily: debian
platformVersion: 9.5
procs: 60
uptime: 48
virtualizationRole: guest
virtualizationSystem: docker
Hostnames
=========
host_aliases: [aef-default-20180820t161712-jcgz.weshipit-today]
hostname: aef-default-20180820t161712-jcgz.c.weshipit-today.internal
socket-fqdn: a7f5c51f9c88
socket-hostname: a7f5c51f9c88
=========
Collector
=========
Running Checks
==============
cpu
---
Total Runs: 108
Metric Samples: 6, Total: 642
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0ms
disk (1.2.0)
------------
Total Runs: 108
Metric Samples: 98, Total: 10584
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 31ms
file_handle
-----------
Total Runs: 108
Metric Samples: 1, Total: 108
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0ms
io
--
Total Runs: 108
Metric Samples: 26, Total: 2790
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0ms
load
----
Total Runs: 108
Metric Samples: 6, Total: 648
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0ms
memory
------
Total Runs: 108
Metric Samples: 17, Total: 1836
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0ms
network (1.6.0)
---------------
Total Runs: 108
Metric Samples: 20, Total: 2160
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0ms
ntp
---
Total Runs: 108
Metric Samples: 1, Total: 108
Events: 0, Total: 0
Service Checks: 1, Total: 108
Average Execution Time : 0ms
uptime
------
Total Runs: 108
Metric Samples: 1, Total: 108
Events: 0, Total: 0
Service Checks: 0, Total: 0
Average Execution Time : 0ms
========
JMXFetch
========
Initialized checks
==================
no checks
Failed checks
=============
no checks
=========
Forwarder
=========
CheckRunsV1: 108
Dropped: 0
DroppedOnInput: 0
Errors: 0
Events: 0
HostMetadata: 0
IntakeV1: 9
Metadata: 0
Requeued: 0
Retried: 0
RetryQueueSize: 0
Series: 0
ServiceChecks: 0
SketchSeries: 0
Success: 225
TimeseriesV1: 108
API Keys status
===============
https://6-4-2-app.agent.datadoghq.com,*************************4b2f4: API Key valid
==========
Logs Agent
==========
Logs Agent is not running
=========
DogStatsD
=========
Checks Metric Sample: 20715
Event: 1
Events Flushed: 1
Number Of Flushes: 108
Series Flushed: 12665
Service Check: 974
Dogstatsd Metric Sample: 793
OK, these views are working now; I also had to start /opt/datadog-agent/embedded/bin/process-agent by hand.
Are there any other processes to start besides datadog-agent, trace-agent and process-agent? Is there a documented list of these 'agents'? I didn't find one.
Regards, Matías.
1 - Should I start it by hand? Isn't it embedded in datadog-agent? (https://github.com/DataDog/datadog-trace-agent says it is included in datadog-agent.)
The documentation is not explicit about this, but the trace agent is included in the standard packaging of the Datadog Agent. The image we provide runs it, as well as the process agent, if you enable them in the config file. Since you're installing the agent manually in your container, you have to start them yourself in this case. Note that the logs agent is included in the datadog-agent binary itself, so it is started along with the main agent. There is no other agent you need to start.
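For a manual install, that answer adds up to three processes that need to be running: the core agent (which also hosts the logs agent), the trace agent and the process agent. The Go sketch below keeps them as data and starts each one in the background. The datadog-agent and trace-agent commands come from this thread. The thread gives no flags for the process agent, so it is started without any; add a config flag if your version needs one.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// component describes one Datadog Agent process that a manual
// install has to start itself.
type component struct {
	name string
	path string
	args []string
}

// components lists the processes discussed in this thread. The logs agent
// is not listed because it runs inside the main datadog-agent process.
var components = []component{
	{"core agent", "datadog-agent", []string{"run"}},
	{"trace agent (APM)", "/opt/datadog-agent/embedded/bin/trace-agent",
		[]string{"-config", "/etc/datadog-agent/datadog.yaml"}},
	// Started without flags, as in the thread; adjust if needed.
	{"process agent", "/opt/datadog-agent/embedded/bin/process-agent", nil},
}

// startAll launches every component in the background and returns
// the commands that were started successfully.
func startAll(cs []component) ([]*exec.Cmd, error) {
	var started []*exec.Cmd
	for _, c := range cs {
		cmd := exec.Command(c.path, c.args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Start(); err != nil {
			return started, fmt.Errorf("starting %s: %w", c.name, err)
		}
		started = append(started, cmd)
	}
	return started, nil
}

func main() {
	cmds, err := startAll(components)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Block until every agent process has exited.
	for _, cmd := range cmds {
		_ = cmd.Wait()
	}
}
```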
Seems like your issue is solved, closing it but feel free to comment or reach out to support if you need more help.
Thanks
Thank you @hkaj
I've just hit this myself.
So the "fix" is to manually start parts of the agent?
I've run into the same issue. I install datadog-agent manually in my Dockerfile, and when I SSH into the running container and check the agent status, I get this error message.
Here is the datadog-agent status output:
==========
Logs Agent
==========
Logs Agent is not running
=========
APM Agent
=========
Status: Not running or unreachable on localhost:8126.
Error: Get "http://localhost:8126/debug/vars": dial tcp 127.0.0.1:8126: connect: connection refused
After running this command, the APM agent starts and runs:
/opt/datadog-agent/embedded/bin/trace-agent -config /etc/datadog-agent/datadog.yaml >/dev/null 2>&1 &
==========
Logs Agent
==========
Logs Agent is not running
=========
APM Agent
=========
Status: Running
Pid: 1748
Uptime: 3 seconds
Mem alloc: 7,667,200 bytes
Hostname: 8667609f1847
Receiver: localhost:8126
Endpoints:
https://trace.agent.datadoghq.com
Receiver (previous minute)
===================================
Aggregator
=========
Checks Metric Sample: 12,570
Dogstatsd Metric Sample: 839
Event: 1
Events Flushed: 1
Number Of Flushes: 32
Series Flushed: 10,110
Service Check: 325
Service Checks Flushed: 348
=========
DogStatsD
=========
Event Packets: 0
Event Parse Errors: 0
Metric Packets: 838
Metric Parse Errors: 0
Service Check Packets: 62
Service Check Parse Errors: 0
Udp Bytes: 157,432
Udp Packet Reading Errors: 0
Udp Packets: 338
Uds Bytes: 0
Uds Origin Detection Errors: 0
Uds Packet Reading Errors: 0
Uds Packets: 0
Unterminated Metric Errors: 0
You need to start these services manually.
Describe what happened:
I'm trying to make agent traces work on the App Engine flexible environment. My instances run Debian 9. I installed the agent using the manual steps:
The agent starts, but it looks like no traces are handled; I see this in my application log:
2018/08/19 00:01:25 errors.go:72: Datadog Exporter error: Post http://localhost:8126/v0.3/traces: dial tcp [::1]:8126: connect: connection refused (x2)
I tried to check the agent status by adding a final line to the installation steps in my Dockerfile:
But it fails:
Error: Get https://localhost:5001/agent/status: dial tcp 127.0.0.1:5001: connect: connection refused
Describe what you expected:
I expect to have the agent and traces running on my app instances.
Steps to reproduce the issue:
Additional environment details (Operating System, Cloud provider, etc):
OS: Debian 9
Cloud: Google Cloud App Engine flexible
Runtime: custom
Language: golang
My datadog.yaml enables the traces:
My Dockerfile exposes every port I think it should: