StatCan / aaw

Documentation for the Advanced Analytics Workspace Platform
https://statcan.github.io/aaw/

Jaeger: DEV ES v8.x Compatibility #1505

Closed Souheil-Yazji closed 1 year ago

Souheil-Yazji commented 1 year ago

After correctly configuring the ES service (thanks to Will), we ran into an issue where the Jaeger Collector, a component created by the operator, failed with the following error: "msg":"Failed to create span writer","error":"elastic: Error 400 (Bad Request): unknown key [template] in the template

After doing some research, it seems that the latest release of Jaeger (1.41) does not yet support ES v8.x+. There is a long-running issue open in the Jaeger repository for this. Supporting this, the Elasticsearch section of the Jaeger deployment docs, https://www.jaegertracing.io/docs/1.41/deployment/#elasticsearch, does not include 8 among the supported versions.

For reference, our Elasticsearch instance is currently at v8.5.0.
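The mismatch can be expressed as a small check. This is only a sketch: the set of supported major versions (5, 6, 7) is an assumption taken from the Jaeger 1.41 deployment docs linked above, and the function names are hypothetical.

```python
import re

# Assumption: major versions listed as supported by the Jaeger 1.41 docs.
SUPPORTED_ES_MAJORS = {5, 6, 7}


def es_major(version: str) -> int:
    """Extract the major version from strings such as '8.5.0' or 'v8.5.0'."""
    match = re.match(r"v?(\d+)\.", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1))


def jaeger_supports(version: str, supported=SUPPORTED_ES_MAJORS) -> bool:
    """Return True if the ES major version is in the supported set."""
    return es_major(version) in supported


if __name__ == "__main__":
    # Our DEV instance is at v8.5.0, which falls outside the supported set.
    print(jaeger_supports("v8.5.0"))
```

The version string would normally come from the `version.number` field that Elasticsearch returns on its root endpoint.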

According to Elastic, it should be possible to integrate Jaeger with Elastic APM: https://www.elastic.co/guide/en/apm/guide/current/jaeger-integration.html

Jose-Matsuda commented 1 year ago

Leaving a comment here with info I have synthesized from talking with Pat and Souheil. (If the two of you want to leave comments with what you have learned, go ahead, but this comment will be mine to edit.)

Attempting to continue to leverage Jaeger

Other options to Jaeger

Leveraging APM

Caveats

Leveraging OTel and APM

OpenTelemetry is not an observability back-end like Jaeger or Prometheus. Instead, it supports exporting data to a variety of open source and commercial back-ends, which means we can combine it with APM. Using the OpenTelemetry operator, we can set the collector mode to sidecar so that it is injected into our pods.
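As a rough illustration of sidecar mode, the sketch below builds an `OpenTelemetryCollector` resource and prints it as JSON (which `kubectl apply -f -` accepts). The resource name and the APM endpoint are hypothetical placeholders, the `v1alpha1` API version is an assumption about the operator release in use, and the collector pipeline is a minimal traces-only example rather than our actual configuration.

```python
import json


def sidecar_collector(name: str, otlp_endpoint: str) -> dict:
    """Build an OpenTelemetryCollector manifest that runs in sidecar mode."""
    # Collector config: receive OTLP from the app, forward traces over OTLP.
    collector_config = "\n".join([
        "receivers:",
        "  otlp:",
        "    protocols:",
        "      grpc:",
        "      http:",
        "exporters:",
        "  otlp:",
        f"    endpoint: {otlp_endpoint}",
        "service:",
        "  pipelines:",
        "    traces:",
        "      receivers: [otlp]",
        "      exporters: [otlp]",
    ])
    return {
        "apiVersion": "opentelemetry.io/v1alpha1",  # assumption: operator API version
        "kind": "OpenTelemetryCollector",
        "metadata": {"name": name},
        "spec": {"mode": "sidecar", "config": collector_config},
    }


# Pods opt in to injection with this annotation on the pod template.
POD_ANNOTATION = {"sidecar.opentelemetry.io/inject": "true"}

if __name__ == "__main__":
    # Hypothetical name and endpoint, for illustration only.
    manifest = sidecar_collector("aaw-sidecar", "apm-server.example.svc:8200")
    print(json.dumps(manifest, indent=2))
```

With this in place, only pods carrying the annotation get a collector container injected; everything else is left untouched.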

Leveraging OTel and Jaeger

Zipkin

Maybe I shouldn't be surprised, but the storage options for Zipkin are also Elasticsearch (tested against ES 6-7.x; no action so far on the open issue for ES 8.3.0) and Cassandra.

Jose-Matsuda commented 1 year ago

The OpenTelemetry and ES APM pivot (details to be edited in)

Rationale: Jaeger was proving difficult to work with in our environment, and the workarounds it needed introduced more components than we thought necessary. We are moving to Elastic APM instead, which seems to provide similar functionality to Jaeger. Because APM does not support using Istio traces to get spans, we need to obtain span data some other way, either by injecting APM agents or by using OpenTelemetry agents. With talk of looking at OpenSearch as a path forward, we decided to leverage OpenTelemetry, which Elastic does support, rather than dig too deep into Elastic-specific tooling. Do note, however, that Envoy is looking to send OTel-compatible data, so in the future we might be able to do away with whatever agent we add now.
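One practical upside of the OpenTelemetry route is that agents are configured through the standard `OTEL_*` environment variables, so switching back-ends later mostly means changing an endpoint. A minimal sketch of building that environment follows; the helper name, service name, and endpoint are hypothetical, and the bearer-token header is only an assumption about how the receiving server might be secured.

```python
from typing import Dict, Optional


def otel_env(
    service_name: str,
    endpoint: str,
    environment: Optional[str] = None,
    bearer_token: Optional[str] = None,
) -> Dict[str, str]:
    """Build the standard OpenTelemetry SDK environment variables."""
    env = {
        "OTEL_SERVICE_NAME": service_name,
        "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint,
    }
    if environment is not None:
        # Resource attributes are a comma-separated list of key=value pairs.
        env["OTEL_RESOURCE_ATTRIBUTES"] = f"deployment.environment={environment}"
    if bearer_token is not None:
        # Assumption: the receiving endpoint expects a bearer token.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Bearer {bearer_token}"
    return env


if __name__ == "__main__":
    # Hypothetical values: an app sending to a collector sidecar on localhost.
    for key, value in otel_env("demo-service", "http://localhost:4317", "dev").items():
        print(f"{key}={value}")
```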

Plan moving forward (Souheil or Pat, feel free to edit this section)

The following bullets are to be split into tasks

Jose-Matsuda commented 1 year ago

Closing as completed