Issue opened by @moderation 4 years ago (status: Open)
Probably also depends on the C++ otel library so it can be used in envoy. https://github.com/open-telemetry/opentelemetry-cpp is still in active development and has no releases yet.
Getting closer - RC1 - https://opensource.googleblog.com/2020/10/opentelemetrys-first-release-candidates.html
Announced at re:invent December 2020 - AWS Distro for OpenTelemetry - https://aws-otel.github.io/
As a member of the OpenTelemetry community, I thought I would share that the OpenTelemetry Tracing specification 1.0.1 has been released.
Long term support includes:
Announcement blog: https://medium.com/opentelemetry/opentelemetry-specification-v1-0-0-tracing-edition-72dd08936978
Link to specifications: https://github.com/open-telemetry/opentelemetry-specification
@mattklein123 - Thoughts on next steps for this? I'm happy to coordinate with members of the OpenTelemetry community to help with the implementation.
@gramidt in general the tracing work tends to be driven by vendors and those with a vested interest. We are more than happy to see this work done so please go for it once resources are found! Thank you.
Sounds like a plan, @mattklein123! Thank you for the prompt response.
Is there any progress on this issue?
Hi @CoderPoet -
I'm not aware of any progress here. Is this an immediate priority for you/your team?
@gramidt Did you kick off any conversations to coordinate any work for this? We are interested to see OTLP support from Envoy at least for tracing data so we can position the Otel collector in the export path.
One more thing...
There are two main areas when it comes to OpenTelemetry support:
We can tackle (1) and (2) independently. (1) is possible without having to reinstrument Envoy by linking the OpenTelemetry <-> OpenTracing bridge. Eventually, we can get to (2) to remove the OpenTracing dependency. I think Envoy users can benefit from (1) immediately, and we should prioritize it.
@rakyll - I have had some conversations both internally and externally in the community, but no progress has been made that I am aware of. A recent message from an employee of Adobe says that they may have someone who would be interested in making the contribution for exporting OTLP spans from Envoy.
Has it been considered to break up (2) above into an OpenTracing shim vs. a full native OpenTelemetry implementation? One aspect that would solve is context propagation in environments that currently only use TraceContext (or composite propagators that don't include B3), without having to wait for full OpenTelemetry instrumentation. One could export spans via OTLP (1), but if context propagation still depends on B3 headers, which are becoming less common in new environments, the full value of tracing will not be achieved.
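For concreteness, the two propagation formats look like this on the wire (example values follow the W3C Trace Context spec and are illustrative only). W3C trace context uses a single traceparent header:
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
versus the B3 multi-header form:
X-B3-TraceId: 0af7651916cd43dd8448eb211c80319c
X-B3-SpanId: b7ad6b7169203331
X-B3-Sampled: 1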
Hi @gramidt,
Has there been any update on this in regards to resourcing the work? We also have a similar interest in the topic, for the same reason as @rakyll, and were wondering if there is any planned roadmap / expected timeline at this stage.
Thanks!
@Tenaria - Sadly, I have not heard of any progress on this.
@gramidt sorry for a naive question: do we intend to use https://github.com/open-telemetry/opentelemetry-cpp? Last time I checked, it took a similar approach to the current OC impl. By exporting OTLP spans only, is it converting OC spans to OTLP ones?
Hi @gramidt, I'd be interested in taking on this work!
I've been discussing the potential approach with Harvey (@htuch), and it looks like it may make sense to add a Tracer that uses the OpenTelemetry protos and C++ API. The only potential issue is the stability of the C++ SDK, and depending on how stable it is (I see there was a 1.0.0 release recently), it may make sense to use Envoy's gRPC service instead of the SDK for exporting (similar to what @itamarkam did for the OpenTelemetry logger extension in https://github.com/envoyproxy/envoy/pull/15105).
Either way, still interested in tackling this :)
Hi all! I guess there hasn't been any movement around adding OpenTelemetry support. For some reason I thought it could be enabled as a dynamic_ot tracer, similar to https://github.com/jaegertracing/jaeger-client-cpp, but I don't think anybody has built one.
Is it safe to say that there is currently no way to send traces to an OpenTelemetry collector?
@inquire I don't have the documentation at my fingertips, but OTel receivers can be configured to receive traces/metrics/logs from a variety of different sources, including OpenTracing, Jaeger, and others in addition to OTel's own OTLP. So it's not too hard to emit OpenTracing and send it to an endpoint that's actually OTel.
@kevincantu is correct. I'm trying to set up my sidecar Envoy proxy 1.20 (AWS) to report to an OTel collector sidecar using:
ENABLE_ENVOY_JAEGER_TRACING=true
on AWS. My collector is set up for receiving jaeger/zipkin/otel.
(AWS defaults to 127.0.0.1:9411)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:
  zipkin:
exporters:
  otlp:
    endpoint: ${tempo_sd}:4317
    tls:
      insecure: true
    sending_queue:
      num_consumers: 4
      queue_size: 100
    retry_on_failure:
      enabled: true
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
processors:
  batch:
  memory_limiter:
    # 80% of maximum memory up to 2G
    limit_mib: 400
    # 25% of limit up to 2G
    spike_limit_mib: 100
    check_interval: 5s
extensions:
  zpages: {}
  memory_ballast:
    # Memory Ballast size should be max 1/3 to 1/2 of memory.
    size_mib: 165
service:
  extensions: [zpages, memory_ballast]
  pipelines:
    traces:
      receivers: [otlp, jaeger, zipkin]
      processors: [memory_limiter, batch]
      exporters: [otlp, logging]
No luck so far; I'll keep trying this and will update with a workaround (until we ship OpenTelemetry with Envoy).
@mrsufgi one problem with the current jaeger tracer is that it doesn't support the W3C trace context so far. Therefore we enabled the OpenCensus tracer, which is connected to an OpenTelemetry collector, exposing the OpenCensus protocol on port 55678.
istio config:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    accessLogFormat: >
      [%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
      %RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT% %DURATION%
      %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%" "%REQ(USER-AGENT)%"
      "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%" "%REQ(traceparent)%" "%REQ(tracestate)%"\n
    defaultConfig:
      # https://istio.io/latest/docs/reference/config/istio.mesh.v1alpha1/#Tracing
      tracing:
        openCensusAgent:
          address: "dns:opentelemetry-collector.istio-system.svc.cluster.local:55678" # gRPC-specific address
          context: # Context propagation headers for distributed tracing. Default is ["W3C_TRACE_CONTEXT"]. If multiple values are specified, the proxy reads each header on every request and writes all of them.
            - "W3C_TRACE_CONTEXT"
    enableTracing: true
  values:
    global:
      proxy:
        tracer: openCensusAgent # required to enable the tracer config on Envoy; by default the zipkin tracer is used
        resources:
          requests:
            cpu: 10m
            memory: 40Mi
Cool. Since I'm using AWS Envoy, which doesn't support openCensusAgent out of the box, perhaps I'll need to use vanilla envoyproxy or extend AWS Envoy using ENVOY_TRACING_CFG_FILE and set it up. I'll share my results.
I managed to make it work with OpenCensus & OpenTelemetry agent configured to receive OpenCensus and send OpenTelemetry further. This will also connect the traces and pass the context around. Here are my config files.
Config working with envoy 1.20
static_resources:
  listeners:
    # This defines Envoy's externally-facing listener port
    - name: "inbound_listener"
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 5000
      filter_chains:
        - filters:
            - name: envoy.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                codec_type: auto
                stat_prefix: ingress_http
                generate_request_id: true
                tracing:
                  custom_tags:
                    - tag: "k8s_deployment_name"
                      environment:
                        name: K8S_DEPLOYMENT_NAME
                        default_value: local
                  provider:
                    name: envoy.tracers.opencensus
                    typed_config:
                      "@type": type.googleapis.com/envoy.config.trace.v3.OpenCensusConfig
                      stdout_exporter_enabled: false
                      ocagent_exporter_enabled: true
                      ocagent_address: localhost:55678
                      incoming_trace_context:
                        - b3
                        - trace_context
                        - grpc_trace_bin
                      outgoing_trace_context:
                        - b3
                        - trace_context
OTEL Agent config used with otel/opentelemetry-collector:0.24.0
receivers:
  otlp:
    protocols:
      grpc:
      http:
  opencensus: # you only need this one
  jaeger:
    protocols:
      grpc:
      thrift_http:
  zipkin:
exporters:
  otlp:
    endpoint: "opentelemetry-collector:50051"
    insecure: true
  logging:
    loglevel: debug
processors:
  batch:
extensions:
  pprof:
    endpoint: :1777
  zpages:
    endpoint: :55679
  health_check:
service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp, opencensus, jaeger, zipkin]
      processors: [batch]
      exporters: [otlp, logging]
    metrics:
      receivers: [otlp, opencensus]
      processors: [batch]
      exporters: [otlp, logging]
Envoy will ship OpenCensus to the collector, which will convert it to OpenTelemetry, thus achieving your goal.
I didn't add my solution here yet :) I managed to work through the Envoy issues with AWS Fargate and App Mesh!
First I had to create a custom Envoy Docker image that adds the config (note that there are many methods for adding configs to images without creating a custom Docker image):
FROM public.ecr.aws/appmesh/aws-appmesh-envoy:v1.20.0.1-prod
COPY config.yaml /etc/envoy/envoy-tracing-config.yaml
My tracing config for envoy looks like this:
tracing:
  http:
    name: envoy.tracers.opencensus
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.OpenCensusConfig
      trace_config:
        max_number_of_attributes: 500
      stdout_exporter_enabled: true
      ocagent_exporter_enabled: true
      ocagent_address: 0.0.0.0:55678
      incoming_trace_context:
        - trace_context
        - grpc_trace_bin
        - b3
      outgoing_trace_context:
        - trace_context
        - b3
My otel-collector config is accepting opencensus/jaeger and otlp:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:
      thrift_http:
  zipkin:
  opencensus:
    endpoint: 0.0.0.0:55678
exporters:
  otlp:
    endpoint: scribe-dev-tempo.scribe-dev-app-sd:4317
    tls:
      insecure: true
    sending_queue:
      num_consumers: 4
      queue_size: 100
    retry_on_failure:
      enabled: true
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
processors:
  batch:
  memory_limiter:
    # 80% of maximum memory up to 2G
    limit_mib: 400
    # 25% of limit up to 2G
    spike_limit_mib: 100
    check_interval: 5s
extensions:
  zpages: {}
  memory_ballast:
    # Memory Ballast size should be max 1/3 to 1/2 of memory.
    size_mib: 165
service:
  extensions: [zpages, memory_ballast]
  pipelines:
    traces:
      receivers: [otlp, jaeger, zipkin, opencensus]
      processors: [memory_limiter, batch]
      exporters: [otlp, logging]
(pretty sure I can remove my zipkin)
Just note that my Envoy sidecar, my otel-collector sidecar, and my own API service all run in the same service.
Also, in Fargate you will need to add the ENVOY_TRACING_CFG_FILE env variable to your Envoy task definition.
In order for everything to play together I'm using the B3 headers since I'm also using Traefik (which doesn't support trace-context).
Hope it helps as a workaround for now :honey_pot:
@gramidt the "help wanted" label was removed, but I am not sure anybody is working on this. Could you give a progress update?
I've been working on adding OpenTelemetry tracing and can give a quick update. I currently have a rough draft of adding a new tracer that sends OTLP traces via Envoy's async gRPC client with configurable batching (similar to how the Zipkin tracer works), and I'm planning on wrapping it up and sending a PR after the holidays in early/mid January.
Hello, just wondering if there's been any update on this since the last comment? Thanks!
Exciting update! Hi @AlexanderEllis, what's the benefit of your proposed approach versus the existing Envoy OpenCensus to OpenTelemetry collector approach?
@ydzhou Hey!
From observation, OC is missing a few features that would make it fully compatible with OTel. The noticeable ones that I've found so far are:
I'm sure there might be a few other differences, but these were the two major ones that came to mind during my experience with using/seeing OC.
I'm not entirely sure about the implementation by @AlexanderEllis given I'm not working on it with them, but I assume that if it is going to implement OTel natively, then it will address these issues.
Hi @Tenaria , sorry for the delay on this! Ended up getting preempted by other work (rookie mistake making estimates without a full picture of my Q1 work), but I'm turning back to this and hope to have a PR out soon.
Two additional benefits I'm hoping will be helpful: 1) the use of Envoy's built-in async gRPC handling, much like the Zipkin tracer does with the thread-local httpAsyncClient (e.g. collector_cluster_.threadLocalCluster()->get().httpAsyncClient), and 2) built-in batching with a dynamic runtime threshold to allow a little more flexibility with the volume of requests to the collector.
Hi @AlexanderEllis : checking in on this to get a sense of timeline on enabling OTel in Envoy for metrics and traces.
Hi @K-Prabha, thanks for checking in! Hoping to have a PR out by the end of the week, and I'll be sure to link it to this issue as well.
Thank you @AlexanderEllis . Looking forward to it
Hi all, just wanted to give a quick update here.
The bare-bones tracer PR has been merged, which represents most of the work to get the tracer in place. There are a few more items to bring this from WIP to alpha (imo), and I'm planning to follow up with a few PRs to implement these (but happy to also review/help if anyone else wants to send PRs as well). A short list includes:
- span.setTag attribute setting (see https://github.com/envoyproxy/envoy/pull/20281#issuecomment-1165736742)
- tracestate headers
- isSampled on each trace before sending it
I'll be tackling these over the next few weeks, but as I mentioned, happy to guide/review PRs as well!
Hello, it seems this is the main tracking issue for the OTLP exporter? You seem to only offer OTLP/gRPC export for now. Do you also plan to support OTLP/HTTP? OTLP/HTTP is considered the "default" by the OpenTelemetry specification: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.12.0/specification/protocol/exporter.md#specify-protocol
(If you want a more concrete reason apart from spec text, Dynatrace can ingest OTLP, but only via HTTP without gRPC https://www.dynatrace.com/support/help/shortlink/opentelemetry-instrumentation-guide#configure-exporter)
Note that Opentelemetry C++ already supports this exporter: https://github.com/open-telemetry/opentelemetry-cpp/tree/v1.6.0/exporters/otlp#configuration-options--otlp-http-exporter-
OTLP/HTTP is considered the "default" by the OpenTelemetry specification: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.12.0/specification/protocol/exporter.md#specify-protocol
That is true only if the implementation doesn't meet the first SHOULD
SDKs SHOULD support both grpc and http/protobuf transports and MUST support at least one of them
In the case where Dynatrace doesn't support grpc, a workaround is to put an OpenTelemetry collector between Envoy and Dynatrace:
Envoy --> grpc --> OpenTelemetry Collector --> http/protobuf --> Dynatrace
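For illustration, a minimal relay config along those lines might look like the sketch below. The otlphttp exporter is the collector's http/protobuf exporter; the Dynatrace endpoint and token are placeholders you would replace with your own values.
# Sketch only: receive OTLP/gRPC from Envoy and forward as OTLP/HTTP (http/protobuf).
# Endpoint and token values are placeholders.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  otlphttp:
    endpoint: https://<your-environment>.live.dynatrace.com/api/v2/otlp
    headers:
      Authorization: "Api-Token <your-token>"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]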
@moderation
OTLP/HTTP is considered the "default" by the OpenTelemetry specification: https://github.com/open-telemetry/opentelemetry-specification/blob/v1.12.0/specification/protocol/exporter.md#specify-protocol
That is true only if the implementation doesn't meet the first SHOULD
I'm talking mostly about this paragraph:
If no configuration is provided the default transport SHOULD be http/protobuf unless SDKs have good reasons to choose grpc as the default (e.g. for backward compatibility reasons when grpc was already the default in a stable SDK release).
So even if an SDK supports both HTTP and gRPC, HTTP should be the default. Of course envoy is not an SDK, so this does not formally apply to envoy, but I wonder if there is a particular reason why envoy chose gRPC over HTTP?
Also, to clarify: The default is not my main point, but I think envoy should at least support configuring the HTTP/protobuf exporter.
@Oberon00 Good question! I didn't have a strong preference and chose to start with gRPC with that SHOULD in mind (knowing that the collector is much more capable than I am at handling multiple formats).
That being said, the design should allow for some pretty approachable extensibility for exporting over HTTP instead: the OpenTelemetry Tracer relies on an OpenTelemetryGrpcTraceExporterClient (code pointer), and it should be straightforward to add another config option and client for HTTP if the need is there. Much like the OpenTelemetryGrpcTraceExporterClient uses the built-in async gRPC client, the HTTP client could use Envoy's Http::AsyncClient (for prior art, the Zipkin tracer does something along these lines).
Right now the OpenTelemetry tracer is in place with gRPC, modulo adding some examples, but I think it would be a great follow up to add the HTTP as well. I'm a little hard up for extra cycles at the moment, but I'd be happy to review PRs or chat more about the implementation details in the meantime before I free up more.
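To make that concrete, a hypothetical config shape for an HTTP-based exporter could look roughly like the sketch below. None of these fields exist in OpenTelemetryConfig today; http_service and http_uri are placeholders for whatever a follow-up PR would actually define.
# Hypothetical sketch only -- field names are placeholders, not current Envoy API.
tracing:
  provider:
    name: envoy.tracers.opentelemetry
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
      http_service:
        http_uri:
          uri: https://otel-collector.example.com:4318/v1/traces
          cluster: opentelemetry_collector_http
          timeout: 0.250s
      service_name: my-envoy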
Hi, I was wondering what the status of this issue is. I could help with implementation if needed
Hi @esigo ! The OpenTelemetry Tracer is in place and ready to use with gRPC exporting, with at least one improvement as a pending TODO if you're interested (see the above comment about HTTP exporting). Other than that, I think it's mostly tackling any bugs/improvements as it sees use in the wild.
Hi @AlexanderEllis, thanks for the reply. I'll work on HTTP exporting then :)
Hi folks, thanks for the OpenTelemetry (OTel) support. It works and we are using it. As documented here, we are using something like this:
tracing:
  provider:
    name: envoy.tracers.opentelemetry
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
      grpc_service:
        envoy_grpc:
          cluster_name: opentelemetry_collector
        timeout: 0.250s
      service_name: my-envoy
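For anyone wiring this up, the cluster_name above has to refer to a cluster defined in the bootstrap. A sketch of what that could look like is below; the address and port are assumptions (4317 is the default OTLP/gRPC port), so adjust them to your collector.
# Sketch: upstream cluster for the Envoy gRPC exporter above.
static_resources:
  clusters:
    - name: opentelemetry_collector
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      # OTLP/gRPC needs HTTP/2 on the upstream connection.
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      load_assignment:
        cluster_name: opentelemetry_collector
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: otel-collector
                      port_value: 4317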
You may be aware that the OTel project suggests a few OTEL_-prefixed env vars to drive the config, like OTEL_SERVICE_NAME for service_name and OTEL_EXPORTER_OTLP_ENDPOINT for the collector endpoint. We confirmed they work for some custom Go and Java services where we are using OTel SDKs.
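For example, the wiring looks something like this for one of those SDK-instrumented services (the service and image names here are made up for illustration):
# Illustrative docker-compose style snippet; names are placeholders.
services:
  my-go-service:
    image: example/my-go-service:latest
    environment:
      OTEL_SERVICE_NAME: my-go-service
      OTEL_EXPORTER_OTLP_ENDPOINT: http://otel-collector:4317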
This might be a worthwhile enhancement for Envoy on this OTel support journey, along with sending metrics to the same OTel collector endpoint. Thanks!
Hi, is it possible to add additional configuration parameters to the tracer? For example "max_batch_size", which seems to default to 5 and is much too low. Thank you!
@DanTulovsky yup, that's configurable via the tracing.opentelemetry.min_flush_spans setting: https://github.com/envoyproxy/envoy/blob/main/source/extensions/tracers/opentelemetry/tracer.cc#L148
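For reference, one way to set a runtime key like that is through a static runtime layer in the Envoy bootstrap; a sketch (the layer name and the value 100 are arbitrary):
# Sketch: raising the flush threshold via Envoy's layered runtime.
layered_runtime:
  layers:
    - name: static_layer_0
      static_layer:
        tracing.opentelemetry.min_flush_spans: 100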
@AlexanderEllis But how do I set that inside Envoy, given https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/trace/v3/opentelemetry.proto.html?
There's a problem mentioned in #25347
Is it possible to decouple trace extraction/propagation from actual reporting?
Because now switching to OpenTelemetry trace reporting breaks propagation between services which still use uber-trace-id or x-b3- headers.
It's a shame Envoy does not use the opentelemetry-cpp SDK, which has a lot of nice features like being able to choose the propagator, although I can understand why it was implemented the way it was.
I've been playing around with the OpenTracing Shim in opentelemetry-cpp to create an OpenTelemetry tracer that works with the dynamic_ot tracer extension in Envoy. This allows me to use all the SDK features (such as choosing a propagator).
If I ever get this working I will post a link.
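For anyone curious, the wiring I'm experimenting with looks roughly like the sketch below. The library path and the config contents are hypothetical; they depend on how the shim-based plugin ends up being built.
# Sketch: loading a custom OpenTracing-compatible plugin (built on the
# opentelemetry-cpp OpenTracing shim) through Envoy's dynamic_ot tracer.
# The .so path and the config fields are placeholders.
tracing:
  provider:
    name: envoy.tracers.dynamic_ot
    typed_config:
      "@type": type.googleapis.com/envoy.config.trace.v3.DynamicOtConfig
      library: /usr/local/lib/libotel_opentracing_shim_plugin.so
      config:
        service_name: my-envoy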
@esigo any update on HTTP exporter? Are you still working on it? Thanks!
OpenCensus repos will be archived on July 31, 2023 - https://github.com/census-instrumentation/opencensus-proto/commit/1664cc961550be8f3058ddd29390350242f44f1f?short_path=b335630#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5. Definitely time to deprecate the OpenCensus extension.
Title: Plan to transition to OpenTelemetry
Description: Envoy currently supports OpenTracing (OTr) and OpenCensus (OC) [0]. In May 2019 it was announced that these projects were merging into the new OpenTelemetry (OTel) project [1]. The original plan was to have the legacy project repos moved to read-only by the end of 2019. That hasn't happened but according to the OTel maintainers they are aiming for beta releases in March [2].
Should Envoy:
https://github.com/envoyproxy/envoy/pull/9955 is planning on adding to the OC capability. The OC service / agent repo says it's in maintenance mode and points to OTel/opentelemetry-collector.
Relevant links:
[0] config.trace.v3.Tracing.Http
[1] OpenTracing, OpenCensus Merge into a Single New Project, OpenTelemetry
[2] OpenTelemetry Monthly Update: January 2020