Open axw opened 2 years ago
Is this still the case? I tried this with a small snippet on v8.5.1 and all traces are recorded as transactions, not traces.
Hi @ymotongpoo! Yes, this is still the case.
I tried this with a small snippet on v8.5.1 and all traces are recorded as transactions, not traces.
Can you share the snippet? I can't tell from the screenshot which spans should or should not be considered transactions.
The code relevant to this bug is: https://github.com/elastic/apm-server/blob/efc8be5b0b07b17cfc7dcfdee44dd248f86e01e4/internal/processor/otel/traces.go#L196-L225
If/when the linked OTEP is accepted and implemented, then instead of checking if root || otelSpan.Kind() == ptrace.SpanKindServer || otelSpan.Kind() == ptrace.SpanKindConsumer {
we could check if root || otelSpan.HasRemoteParent() {
. Only in that case would we translate to an Elastic APM transaction.
Hi @axw, thanks for the response! Here's the code snippet I used for the demo.
func handler(w http.ResponseWriter, r *http.Request) {
q := r.URL.Query()
c := q.Get("c")
n, err := strconv.Atoi(c)
if err != nil {
n = 2
}
if n > 8 {
n = 8
}
ctx, span := otel.Tracer("handler").Start(r.Context(), "sample.handler.call.foo")
foo(ctx, n)
span.End()
w.WriteHeader(http.StatusOK)
fmt.Fprintf(w, "requested: %v, executed: %v", c, n)
}
func foo(ctx context.Context, n int) {
if n == 0 {
return
}
ctx, span := otel.Tracer("handler").Start(ctx, "sample.foo")
span.SetAttributes(attribute.Key("iteration").Int(n))
time.Sleep(interval)
span.AddEvent(fmt.Sprintf("mid-interval-%v", n))
time.Sleep(interval)
foo(ctx, n-1)
span.End()
}
I didn't used OpenTelemetry Collector and directly sent the traces with OTLP exporter from the app.
// OTLP exporter config for Elastic Observability
otlpexp, err := otlptracegrpc.New(
context.Background(),
otlptracegrpc.WithEndpoint(OTLPEndpoint),
otlptracegrpc.WithHeaders(OTLPHeaders),
)
if err != nil {
return nil, err
}
Because this is a standalone demo running on Cloud Run and doesn't have other depending services, so the traces should be either of:
I see, thanks. This is not exactly the same issue as in the description, but would also be solved by the change I described.
For this case you have a workaround: set the sample.handler.call.foo
span kind to SpanKindServer. IIANM, it will otherwise be SpanKindInternal.
Thank you for the workaround, @axw. However, neither of SpanKind worked and still all traces are recognized as transactions.
Just in case, I changed the SpanKind of sample.foo
as well, but it didn't work. Here's the current code that is instrumented with the workaround.
func handler(w http.ResponseWriter, r *http.Request) {
q := r.URL.Query()
c := q.Get("c")
n, err := strconv.Atoi(c)
if err != nil {
n = 2
}
if n > 8 {
n = 8
}
ctx, span := otel.Tracer("handler").Start(
r.Context(),
"sample.handler.call.foo",
trace.WithSpanKind(trace.SpanKindInternal),
)
foo(ctx, n)
span.End()
w.WriteHeader(http.StatusOK)
fmt.Fprintf(w, "requested: %v, executed: %v", c, n)
}
func foo(ctx context.Context, n int) {
if n == 0 {
return
}
ctx, span := otel.Tracer("handler").Start(
ctx,
"sample.foo",
trace.WithSpanKind(trace.SpanKindInternal),
)
span.SetAttributes(attribute.Key("iteration").Int(n))
time.Sleep(interval)
span.AddEvent(fmt.Sprintf("mid-interval-%v", n))
time.Sleep(interval)
foo(ctx, n-1)
span.End()
}
@ymotongpoo sorry, I missed something in your original comment and went off track a bit. I think what you're seeing here is related to Cloud Run injecting a traceparent
header. First of all I'll try to clarify some things, and correct some comments I made above.
Transactions are a special kind of span that have additional attributes associated with them. They describe an event captured by an Elastic APM agent instrumenting a service. You can think of transactions as the highest level of work you’re measuring within a service. As an example, a transaction might be a:
- Request to your server
- ...
I guess you already know this, but just so we're on the same page:
https://opentelemetry.io/docs/reference/specification/trace/api/#spankind says:
SERVER Indicates that the span covers server-side handling of a synchronous RPC or other remote request. This span is often the child of a remote CLIENT span that was expected to wait for a response. ... INTERNAL Default value. Indicates that the span represents an internal operation within an application, as opposed to an operations with remote parents or children.
I think I misunderstood your trace before, and thought that sample.handler.call.foo
was meant to represent the incoming HTTP request handler. After looking at your screenshots again, I suppose that you are also using otlphttp
to wrap your handler, and that is the sample.handler
transaction? In that case, sample.handler.call.foo
and sample.foo
should have the internal span kind, and you don't need to explicitly use WithSpanKind
-- sorry about that.
Going back to your initial comment, I overlooked something:
I tried this with a small snippet on v8.5.1 and all traces are recorded as transactions, not traces.
The Traces page is simply listing root transactions, by looking for any transactions without a parent ID. In your example we should expect that sample.handler
is listed on the Traces page. Given that it doesn't show up, that implies sample.handler
does have a parent ID. You could check that by clicking the sample.handler
row and looking for a parent.id
field under the metadata.
Assuming that's the case, I think you're hitting something similar to this issue: https://github.com/elastic/apm/issues/609. i.e. I expect Cloud Run is injecting a traceparent
header, and your instrumentation is continuing the trace with that trace context. It therefore doesn't think the server transaction is the root, and doesn't show up under Traces. You could confirm this by passing trace.WithNewRoot
to the otlphttp instrumentation.
@axw Thank you for the detailed answer.
i.e. I expect Cloud Run is injecting a traceparent header, and your instrumentation is continuing the trace with that trace context.
I realized I missed the fact that Cloud Run injects the traceparent
automatically.
You could confirm this by passing trace.WithNewRoot to the otlphttp instrumentation.
As you suggested, I tried creating span with trace.WithNewRoot()
and now I see sample.handler
in the list. (I also tried using trace.WithNewRoot()
from sample.handler.call.foo
and it was recognized in the trace list as well)
The snippet was:
otelHandler := otelhttp.NewHandler(
http.HandlerFunc(handler),
"sample.handler",
otelhttp.WithSpanOptions(trace.WithNewRoot()),
)
Given above, as you pointed out, my case seems about https://github.com/elastic/apm/issues/609.
APM Server converts OpenTelemetry spans to either transactions or spans, depending on several criteria:
The issue with converting all consumer OTel spans to Elastic APM transactions is that not all consumer spans represent service entrypoints. For example, a message may be actively consumed (e.g. with active polling) within another OTel span. In this case, the outer span should become an Elastic APM transaction, and the message consumption span should become an Elastic APM span.
We require additional information to be recorded in OTLP for this: see https://github.com/open-telemetry/oteps/pull/182