Open pawandhiman10 opened 1 week ago
@pawandhiman10 can you attach you config file for the OTEL plugin? @samugi can you help to have a look at this issue?
hello @pawandhiman10
errors in traces are captured and reported by the OpenTelemetry plugin as span events using the exception
event key and the exception.message
attribute as defined in the OpenTelemetry specification.
I believe Datadog might be expecting slightly different fields based on this, are you using the OTLP ingestion with the datadog agent? In that case, I would expect the ingestion process of OTLP data to take care of parsing any errors and translating any fields as needed.
I can confirm that errors are displayed correctly by other tools such as Jaeger and Grafana.
Also: could you share an example of an error that is being reported by your system and you are expecting to be visible in the UI?
@samugi Errors mean 5xx status code errors here. As per the screenshot earlier, it only displays the 2xx requests and not the 5xx. These are not actual errors but response is 5xx. This is missing. Below screenshot gives an idea that 5xx is occuring regularly and also increasing sometimes, we are not able to see these numbers in datadog while the same is visible in the load balancer logs.
Yes, currently the ingestion is based on OLTP ingestion with Datadog agent. Could you point me to some reference docs here on how to get the errors (5xx) parsed?
@pawandhiman10 in that case, it might be expected. 500 response status codes are not always reported as errors. Today errors are reported for failures (exceptions) that occur during the execution of plugin phase handlers, errors returned by the internal http client and dns failures.
Could you test with a 500 status code that originates from an exception in a plugin? This can be tested with a plugin that throws an explicit error like: error("fail")
from its access phase. The expected behavior is for this to generate a span with name kong.access.plugin.yourplugin
with error state and an event that contains the exception message. If Datadog is parsing things correctly, this should be displayed in the UI as well.
@samugi Seems like the error tag is missing with the 5xx status codes. Is there any way to add this tag with >=500 status code from the requests, span.error: true
?
Not sure on this, but is there a relation to any of these headers: w3c
, b3
, jaeger
, ot
, aws
, datadog
, gcp
?
Can see the error code there but coming under ok
status.
@pawandhiman10 what you describe is currently expected. Status codes are reported (whether they 2xx, 4xx, 5xx, etc), but spans are not being set the error state when this happens. Errors are currently reserved for actual errors occurring during the execution of plugins code (exceptions) and a few other scenarios that don't include specific response codes.
Would you be interested in setting an error state to your root span in case of 5xx status code from your upstream? If that is the case, such a change should probably be configurable, for example some might expect 4xx response codes from time to time, while others would consider them error conditions.
We are always interested in improving/updating our tracing instrumentation, but I cannot guarantee this change in particular will be made. I would recommend, to achieve this behavior, to write your own code (in a custom or serverless plugin) that applies this simple logic for your trace. You can use the tracing and the response PDK modules to achieve this.
@pawandhiman10 if you have access to the collector configuration, you can use the transformprocessor
to do that:
processors:
transform:
error_mode: ignore
trace_statements:
- context: span
statements:
- set (status.code, 2) where attributes["http.status_code"] >= 500
Here are the enums available, and the explanation why I've suggested set (status.code, 2)
:
https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl/contexts/ottlspan#enums
@julianocosta89 This is managed by Datadog agent currently without any collector config.
@samugi Could you also help me with setting up OTEL_TRACES_SAMPLER=always_on
with this plugin?
Setting it as environment variable did not work as expected. Opentelemetry
As of the error(5xx) reporting in Datadog, will experiment with the code. Thanks for that.
Is there an existing issue for this?
Kong version (
$ kong version
)3.7.1
Current Behavior
We have installed this plugin for traces in Kong. Plugin link
We are sending traces to Datadog with this. But the traces are only for requests and errors are not captured at all.
Expected Behavior
Errors should be captured and reported in the UI.
Steps To Reproduce
Install this plugin using Helm. We have attached this plugin to a service with the kong annotations:
konghq.com/plugins: opentelemetry
Configuration reference for the kongplugin on the plugin definition page. This is connected to otel configuration exposed by Datadog agent on http port 4318.Anything else?
No response