Open samholder opened 2 years ago
After a bit more investigation: if I enable debug logging, then I get this in the logs:
2022-04-14 12:19:37 BST | TRACE | DEBUG | (pkg/trace/api/otlp.go:97 in Start) | OpenTelemetry gRPC receiver running on localhost:5003 (internal use only)
and if I configure things to point at this endpoint then I get some traces being logged, but they appear to get tagged as internal, so I assume this is not the right thing to be doing...
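For reference, the stable (non-experimental) configuration shape I believe the docs intend looks roughly like this; a sketch assuming Agent 7.35+, with key names from the otlp_config/apm_config sections and illustrative values:
# datadog.yaml (sketch, not a verified config)
otlp_config:
  receiver:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # external OTLP gRPC port (not the internal 5003 one above)
      http:
        endpoint: 0.0.0.0:4318   # external OTLP HTTP port
apm_config:
  enabled: true                  # traces are handed off internally to the trace-agent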
In the same boat here. Can't get this to work. Seems like I can't access these ports (4318, 4317). However, I can access 8126 just fine.
Same here.
I can see that my traces are correctly exported from my app to the agent:
I have a "debug trace" sent every 2s.
Because the OpenTelemetry OtlpGrpcSpanExporter class I'm using uses OkHttp underneath, and because DD APM automatically instruments OkHttp, I can see the traces of the exports in the Datadog UI. These traces cover the HTTP calls made internally by OtlpGrpcSpanExporter, and I can see that the responses to these calls are 200.
But then, I cannot see anything in the Agent logs nor in the Datadog UI.
Datadog Agent v7.35.0
Datadog APM v0.99.0
OpenTelemetry Java lib (io.opentelemetry.opentelemetry-exporter-otlp) v1.13.0
Here are the logs of my agent regarding OTLP:
"2022-04-14T11:39:06.000Z","2022-04-14 11:39:06 UTC | CORE | INFO | (pkg/util/log/log.go:572 in func1) | runtime: final GOMAXPROCS value is: 4"
"2022-04-14T11:39:06.000Z","2022-04-14 11:39:06 UTC | CORE | WARN | (pkg/util/log/log.go:587 in func1) | OTLP ingest configuration is now stable and has been moved out of the ""experimental"" section. This section will be removed in the 7.37 Datadog Agent release. Please use the ""otlp_config"" section instead.The DD_OTLP_GRPC_PORT and DD_OTLP_HTTP_PORT environment variables will also be removed in 7.37; set the full endpoint instead."
"2022-04-14T11:39:06.000Z","2022-04-14 11:39:06 UTC | CORE | WARN | (pkg/util/log/log.go:592 in func1) | failed to get configuration value for key ""experimental.otlp"": unable to cast <nil> of type <nil> to map[string]interface{}"
"2022-04-14T11:39:06.000Z","2022-04-14 11:39:06 UTC | CORE | INFO | (pkg/util/log/log.go:572 in func1) | Features detected from environment: containerd,kubernetes,docker,cri"
"2022-04-14T11:39:06.000Z","2022-04-14 11:39:06 UTC | CORE | INFO | (cmd/agent/app/run.go:249 in StartAgent) | Starting Datadog Agent v7.35.0"
...
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/service/internal/builder/exporters_builder.go:255 in buildExporter) | kind:exporter,name:otlp | Exporter was built."
...
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/service/internal/builder/receivers_builder.go:226 in attachReceiverToPipelines) | kind:receiver,name:otlp,datatype:traces | Receiver was built."
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/service/internal/builder/receivers_builder.go:226 in attachReceiverToPipelines) | kind:receiver,name:otlp,datatype:metrics | Receiver was built."
...
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/service/internal/builder/exporters_builder.go:48 in Start) | kind:exporter,name:otlp | Exporter started."
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/service/internal/builder/exporters_builder.go:40 in Start) | kind:exporter,name:otlp | Exporter is starting..."
...
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/service/internal/builder/receivers_builder.go:73 in StartAll) | kind:receiver,name:otlp | Receiver started."
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/receiver/otlpreceiver/otlp.go:87 in startHTTPServer) | kind:receiver,name:otlp | Starting HTTP server on endpoint 0.0.0.0:55681"
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/receiver/otlpreceiver/otlp.go:147 in startProtocolServers) | kind:receiver,name:otlp | Setting up a second HTTP listener on legacy endpoint 0.0.0.0:55681"
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/receiver/otlpreceiver/otlp.go:87 in startHTTPServer) | kind:receiver,name:otlp | Starting HTTP server on endpoint 0.0.0.0:4318"
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/receiver/otlpreceiver/otlp.go:69 in startGRPCServer) | kind:receiver,name:otlp | Starting GRPC server on endpoint 0.0.0.0:4317"
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | WARN | ([zap@v1.20.0](mailto:zap@v1.20.0)/sugar.go:107 in Warn) | grpc_log:true | grpc: addrConn.createTransport failed to connect to {localhost:5003 <nil> 0 <nil>}. Err: connection error: desc = ""transport: Error while dialing dial tcp 127.0.0.1:5003: connect: connection refused"". Reconnecting..."
"2022-04-14T11:39:07.000Z","2022-04-14 11:39:07 UTC | CORE | INFO | ([collector@v0.44.0](mailto:collector@v0.44.0)/service/internal/builder/receivers_builder.go:68 in StartAll) | kind:receiver,name:otlp | Receiver is starting..."
...
Hi, I am a Product Manager at Datadog. We recently declared our OTLP Ingest in the Datadog Agent (sending telemetry data from OTel SDKs to the DD Agent) stable/GA, and it is available with Agent version 7.35. Clear and descriptive documentation for this feature will be available in our public docs by the middle of next week. This should solve the setup problems you are facing, but if further support is needed, I will be happy to connect you with our engineering team. Thank you for choosing Datadog as your APM vendor!
For anyone reading this issue, I figured my issue out.
The traces were correctly sent to Datadog. My configuration (explained here: https://github.com/DataDog/helm-charts/issues/529#issuecomment-1099421478) works perfectly well.
I wasn't finding my traces because I was looking for them under env: staging in the APM UI of Datadog (where I can see all the other traces of my app), while they were actually reported under env: none.
To fix this env issue, I configured ResourceAttributes.DEPLOYMENT_ENVIRONMENT in my SdkTracerProvider instance:
val serviceName = "my-service"
val env = "staging"
val resource =
Resource
.builder()
.put(ResourceAttributes.SERVICE_NAME, serviceName)
.put(ResourceAttributes.DEPLOYMENT_ENVIRONMENT, environment)
.build()
...
val tracerProvider =
SdkTracerProvider
.builder()
.addSpanProcessor(spanProcessor)
.setResource(resource)
.build()
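For completeness: the same attributes can also be supplied via the standard OTel environment variables instead of code, assuming the SDK autoconfigure path is used to pick them up. A rough sketch of the app container environment, with illustrative hostnames and values:
# app container environment (sketch; the "datadog-agent" hostname and values are assumptions)
environment:
  OTEL_EXPORTER_OTLP_ENDPOINT: "http://datadog-agent:4317"
  OTEL_RESOURCE_ATTRIBUTES: "service.name=my-service,deployment.environment=staging,service.version=1.0.0"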
What is the difference between Datadog's OTel agent and the standard OTel Collector? Currently we are using the otel-collector with the exporters set to Datadog. Will the release influence our current use?
hi folks, I've spent a good amount of time working on this topic recently and I'd like to share what I've found out here.
TL;DR version of the issue: the agent listens on 4317/4318 (otlp receiver) correctly, but the traces were not sent to the trace-agent (on internal port 5003).
my setup:
- datadog agent 7.35.0. otlp is enabled via environment variables (tested with datadog.yaml as well, same result):
  DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317
  DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_TRANSPORT: tcp
- verified the agent is listening on 4317 with netstat -apln | grep 4317
- my application uses opentelemetry-python auto-instrumentation. before attempting to use OTLP ingestion on datadog-agent, my application was auto-instrumented with opentelemetry-exporter-datadog (being deprecated): I was able to see APM traces on my datadog account
- when I replaced opentelemetry-exporter-datadog with opentelemetry-exporter-otlp (pointing to the 4317 port), I lost APM traces on my datadog account. the same API key was used as before and the rest of the application code remains the same
some details for the opentelemetry-exporter-otlp implementation:
- OTEL_RESOURCE_ATTRIBUTES is set for my application:
  OTEL_RESOURCE_ATTRIBUTES="service.name=my_app,deployment.environment=my_env,service.version=my_version"
- these match the values previously used with opentelemetry-exporter-datadog:
  - service.name matches the value for DD_SERVICE
  - deployment.environment matches the value for DD_ENV
  - service.version matches the value for DD_VERSION
- the exporter: otlp_exporter = OTLPSpanExporter(endpoint="http://datadog:4317", insecure=True) (datadog is the hostname in the docker network)
- the span processor is wired up the same way as before (with opentelemetry-exporter-datadog):
  trace.get_tracer_provider().add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
what I observed:
- tcpdump inside the datadog agent container (tcpdump -nnA -i any "port 4317" -s 0): I was able to see the correct data in plain text coming to the 4317 port
- agent is connected to trace-agent on port 5003 (the internal grpc server for agent to export traces to the trace-agent receiver)
- tcpdump inside the datadog agent container (tcpdump -nnA -i any "port 5003" -s 0): I don't see the telemetry data
- since the agent ingests otlp and exports to the trace-agent in otlp, I tried pointing my application's otlp exporter to port 5003 instead, and this failed too
- I've set DD_LOG_LEVEL to trace and inspected /var/log/datadog/agent.log and /var/log/datadog/trace-agent.log but nothing looks suspicious. the only related thing is a "connection refused" error message when the agent is trying to connect to the trace-agent before the trace-agent starts listening on the internal grpc port.
would be great to have a working example published from DataDog to help us understand what's missing for traces
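For anyone who wants a concrete starting point, here is a rough docker-compose sketch of the agent side of such a setup. This is illustrative only (the service name, image tag, and env var names are assumptions based on the thread and the Datadog docs), not the verified fix from the linked repo; DD_APM_ENABLED is set explicitly because OTLP traces are forwarded internally to the trace-agent:
# docker-compose.yml (sketch)
version: "3.8"
services:
  datadog:                                    # hostname the app's OTLP exporter points at
    image: gcr.io/datadoghq/agent:7
    environment:
      - DD_API_KEY=${DD_API_KEY}
      - DD_APM_ENABLED=true                   # make sure the trace-agent is running
      - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_GRPC_ENDPOINT=0.0.0.0:4317
      - DD_OTLP_CONFIG_RECEIVER_PROTOCOLS_HTTP_ENDPOINT=0.0.0.0:4318
    ports:
      - "4317:4317"                           # OTLP gRPC
      - "4318:4318"                           # OTLP HTTP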
@iamharvey Today there are two methods a customer can use to send their telemetry data to Datadog.
Method 1: OTLP Ingest in Datadog Agent - A way to send telemetry data from OTel SDKs directly to the Datadog Agent.
Method 2: OTel Collector Datadog Exporter - A way to send telemetry data from OTel SDKs to OTel Collector, which exports the data to Datadog Backend via a Datadog Exporter.
If you are using OTel Collector Datadog Exporter method, the release (GAing OTLP Ingest) will not influence your use.
NOTE: I am happy to announce that OTLP Ingest in Datadog Agent is now GA/Stable with Datadog Agent version 7.35
I am happy to announce that OTLP Ingest in Datadog Agent is now GA/Stable with Datadog Agent version 7.35
Not officially available yet in the Helm Chart ;)
@pj-datadog I love the idea of method 1 to make it easier to adopt. I've tested this setup but ran into an issue with traces (reported in #11737) and it seems like I'm not alone.
I have a gut feeling that this is caused by misconfiguration instead of a bug in datadog-agent. It would be great to have more examples / documentation to refer to.
@duxing Can you please reach out to Datadog Support and open a Zendesk ticket there? I asked my engineering team to look at your use case; we feel we will need a debug flare.
would be great to have more examples / documentation to refer to
What kind of documentation do you have in mind? I can work with you to have that in our public documentation if you think that will be helpful for the larger community. Thank You!
@pj-datadog a support ticket was opened a few days ago: https://help.datadoghq.com/hc/en-us/requests/789265?page=1
Details are cross-referenced.
we feel we will need a debug flare
I don't think you need that from me. See the README from the repo I linked in the referenced issue #11737; once you git clone the repo, you should be able to get what you need.
What kind of documentation do you have in mind?
Mainly examples, I guess. The only example I found was the gist that @gbbr provides (for golang). More examples supporting more languages would be really nice to have.
@guizmaii currently you can set this manually via the environment variables in helm (https://docs.datadoghq.com/tracing/setup_overview/open_standards/otlp_ingest_in_the_agent/?tab=kuberneteshelm), but we are actively working on a dedicated configuration section to make this easier
As I said over at #11737, thank you @duxing for the detailed comments and repro, it was really helpful. @duxing's example with the patch from duxing/datadog-otlp#1 is a working example of how to use OTLP ingest with traces on a containerized setting.
I tried to use method 2, following the 'Alongside Datadog Agent' section, and it didn't work. I tried to deploy OpenTelemetry as a DaemonSet alongside the Datadog agent DaemonSet but wasn't able to succeed, since both DaemonSets try to listen on the same host ports (4317, 4318). I also tried to deploy OpenTelemetry as a Deployment and to use the otlp exporter:
exporters:
  otlp:
    endpoint: "${HOST_IP}:4317"
But I got TLS errors. Was anyone able to configure it and send traces?
Note that you don't need to deploy the OpenTelemetry Collector to use OpenTelemetry. You can just use the Datadog Agent if you want and point your application to send telemetry data to the Agent.
Also tried to deploy opentelemtry as deployment and to use the otlp exporter: But got tls errors.
If you still want to use the Collector and the Agent, you can disable TLS by doing this:
exporters:
otlp:
endpoint: "${HOST_IP}:4317"
tls:
insecure: true
This should be safe to do if communication happens locally.
tls.insecure: true in the opentelemetry helm chart? Also, if I don't want to disable TLS, what should I do?
Adding the error we get: }. Err: connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake" {"grpc_log": true}
Can the agent also ingest simple metrics (i.e., counters)?
@duxing did you deploy the DD agent via helm?
I am currently facing the same issue, but only after changing over to the Datadog Operator. My previous setup was the DD agent via helm, which received OTLP from the OTel collector. All I did was change the DD agent to be deployed via the DatadogAgent resource from the operator, and now the agent no longer forwards traces that it receives from the collector.
I've "fixed" the issue by just exporting to DD directly by using the DD exporter inside the OTEL collector.
@TimoSchmechel not for my setup. I'm running my setup locally with docker-compose; deploying the same change through helm to the staging/production environment would have been the next step.
I think the issue is in the agent image/binary itself, not the chart. However, there's not a good way to reproduce this issue: the test project (set up with docker) I put together consistently surfaces this issue on my end, but did not yield any error on the datadog side.
In my case the issue was that the trace-agent (the component that listens on 5003/tcp) is disabled by default when using the Datadog operator. Traces were making it to the external otlp endpoints (4317/4318) on the agent, then getting blackholed at the trace-agent port (5003).
Setting features.apm.enabled to true in my DatadogAgent manifest turned on the trace-agent container and fixed my issue (a manifest sketch follows below).
@pj-datadog this may be helpful to document here for those of us that didn't already have APM enabled: https://docs.datadoghq.com/opentelemetry/otlp_ingest_in_the_agent/?tab=host
IMO it doesn't really make sense that you can turn on the otlp receiver without also turning on the trace-agent.
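For reference, a minimal sketch of such a DatadogAgent manifest (assuming the v2alpha1 API; field names are taken from the operator spec and the values are illustrative, not a verified manifest):
apiVersion: datadoghq.com/v2alpha1
kind: DatadogAgent
metadata:
  name: datadog
spec:
  global:
    credentials:
      apiSecret:
        secretName: datadog-secret   # placeholder secret holding the API key
        keyName: api-key
  features:
    apm:
      enabled: true                  # turns on the trace-agent container
    otlp:
      receiver:
        protocols:
          grpc:
            enabled: true
            endpoint: 0.0.0.0:4317
          http:
            enabled: true
            endpoint: 0.0.0.0:4318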
Has this issue been resolved? I followed your example for the helm chart, but it doesn't work:
datadog:
otlp:
receiver:
protocols:
http:
enabled: true
endpoint: "0.0.0.0:4318"
useHostPort: true
I can only see this log:
UTC | TRACE | DEBUG | (pkg/trace/api/otlp.go:91 in Start) | Listening to core Agent for OTLP traces on internal gRPC port (http://0.0.0.0:5003, internal use only). Check core Agent logs for information on the OTLP ingest status.
Looking at the pod manifest, it looks like you have the port specification and variables set up well.
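If traces still don't show up, it is worth double-checking that APM (the trace-agent) is enabled in the chart values as well, per the trace-agent discussion above. A rough sketch, assuming the chart's datadog.apm.portEnabled key and using illustrative values:
datadog:
  apm:
    portEnabled: true          # keeps the trace-agent reachable so OTLP traces are forwarded
  otlp:
    receiver:
      protocols:
        grpc:
          enabled: true
          endpoint: 0.0.0.0:4317
        http:
          enabled: true
          endpoint: 0.0.0.0:4318
          useHostPort: true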
Output of the info page (if this is a bug)
Describe what happened:
This is running on a Windows machine with the agent installed (7.32.4.1), which reports in the logs,
and our application is sending traces via APM currently.
I am trying to add support for open telemetry alongside the current APM stuff.
I followed the instructions here: https://docs.datadoghq.com/tracing/setup_overview/open_standards/#otlp-ingest-in-datadog-agent and added this config to the datadog.yml:
When I restarted the agent I saw this in the logs:
I searched a little and found this issue: https://github.com/DataDog/helm-charts/issues/529
which seems to imply that this feature is no longer experimental; however, I was not able to get this to function. Things I tried:
using this config in the agent datadog.yml:
but same error as above basically.
Setting the following environment variables (based on another issue):
- OTEL_EXPORTER_OTLP_ENDPOINT to http://localhost:4317
- DD_OTLP_HTTP_PORT to 4317
- DD_OTLP_GRPC_PORT to 4318
- OTLP_COLLECTOR to http://localhost:4317
but after restarting the agent I see no other messages apart from the:
which implies this did not work. I also tried the app just in case.
The app is a .NET 6 web app with this configuration:
Describe what you expected:
That following the instructions for enabling OTLP ingest would work correctly, or that alternative instructions would be available if the feature is no longer experimental.
Steps to reproduce the issue: As above
Additional environment details (Operating System, Cloud provider, etc):
locally hosted Windows Server 2019 VM
Datadog Agent 7.32.4.1
Datadog .NET tracer 64 bit 1.27.1