Open dettanym opened 1 year ago
Deploying OpenTelemetry (OT): Start by reading about the "No Collector" architecture
OT Collector Architecture: The OT Collector includes a pipeline for each observability signal: logs, traces and metrics. Each pipeline can include multiple implementations of three components: receivers (collect traces), processors (process received traces: batching, compressing, tail sampling, modifying spans) and exporters (export processed traces to a back-end). Code for configuring various receivers, processors and exporters is in the opentelemetry-collector-contrib repo.
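To make the receiver / processor / exporter split concrete, here is a toy Python model of a single traces pipeline. It is only a sketch of the data flow described above, not the Collector's actual implementation or configuration format: the `Pipeline` class, the span dictionaries and the helper functions are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical minimal span: just a flat dict of attributes.
Span = Dict[str, str]


@dataclass
class Pipeline:
    """Toy model of one Collector pipeline (here: traces)."""
    receivers: List[Callable[[], List[Span]]]
    processors: List[Callable[[List[Span]], List[Span]]]
    exporters: List[Callable[[List[Span]], None]]

    def run_once(self) -> None:
        # 1. Every receiver contributes the spans it collected.
        spans: List[Span] = []
        for receive in self.receivers:
            spans.extend(receive())
        # 2. Processors run in order; each sees the previous one's output.
        for process in self.processors:
            spans = process(spans)
        # 3. Every exporter gets the same processed batch.
        for export in self.exporters:
            export(spans)


def fake_receiver() -> List[Span]:
    # Stand-in for a real receiver; returns two hard-coded spans.
    return [
        {"service": "email", "app.email.recipient": "jack@example.com"},
        {"service": "cart", "http.target": "/healthz"},
    ]


def drop_health_checks(spans: List[Span]) -> List[Span]:
    # Example of a filtering processor.
    return [s for s in spans if s.get("http.target") != "/healthz"]


def redact_recipient(spans: List[Span]) -> List[Span]:
    # Example of a span-modifying processor.
    return [
        {k: ("<redacted>" if k == "app.email.recipient" else v) for k, v in s.items()}
        for s in spans
    ]


# Stand-in for a back-end: exported spans are appended to a list.
exported: List[Span] = []
pipeline = Pipeline(
    receivers=[fake_receiver],
    processors=[drop_health_checks, redact_recipient],
    exporters=[exported.extend],
)
pipeline.run_once()
print(exported)
```

The point of the sketch is the ordering: processors are chained, while receivers and exporters fan in and fan out, which is why one pipeline can list several of each.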
The OpenTelemetry Operator (OT Operator) includes both the Collector and autoinstrumentation support. In issue dettanym/prose-k8s-home-ops#15, I tried to run the OT Operator but it did not come with a default exporter (?) or a UI to view traces, such as Jaeger. (You are supposed to configure them.)
The OpenTelemetry Demo (OT Demo) includes a small microservices website, and it is meant to demonstrate autoinstrumentation, manually specified spans and the OT Collector. Source code for reference. The Helm Chart for the OT Demo includes an example configuration to expose a UI to view traces (Jaeger).
So I have now set up the OT Demo with exposed UIs for both the website and the trace viewer. (Once this cluster is deployed following the instructions in the readme, all of the sites that the Docker configuration of the OT Demo exposes locally are available at the domain: otel-demo.my-example.com)
Next steps for the OT Demo. Jaeger: The OT Demo includes a Jaeger agent, a collector and a query pod. See the [Jaeger architecture](https://www.jaegertracing.io/docs/1.45/architecture/) and images. Sidenote: Jaeger K8s Operator and their HotROD example.
The Jaeger UI allows searching by specific tags. (The OT Demo has a page specifying the tags that each service includes.) For example, the email service includes a tag: "app.email.recipient". You can search the UI for "app.email.recipient=jack@example.com" and get back a matching trace. A good start might be: searching for all traces that include a sensitive substring in a tag name by querying the Jaeger-query service.
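A minimal sketch of the "sensitive substring in a tag name" search, in Python. It deliberately skips the network call and only filters an already-fetched response. It assumes the JSON shape that the Jaeger UI gets back from jaeger-query (a top-level `data` list of traces, each with `spans` that carry `tags` as `key`/`value` entries); that shape, the function name `traces_with_sensitive_tags` and the sample data are assumptions for illustration and should be checked against the deployed Jaeger version.

```python
from typing import Any, Dict, List


def traces_with_sensitive_tags(
    response: Dict[str, Any], needles: List[str]
) -> Dict[str, List[str]]:
    """Map traceID -> sorted tag keys whose name contains any needle.

    Matching is a case-insensitive substring match on the tag *name*,
    not on the tag value.
    """
    lowered = [n.lower() for n in needles]
    hits: Dict[str, List[str]] = {}
    for trace in response.get("data", []):
        matched = set()
        for span in trace.get("spans", []):
            for tag in span.get("tags", []):
                key = str(tag.get("key", ""))
                if any(n in key.lower() for n in lowered):
                    matched.add(key)
        if matched:
            hits[trace.get("traceID", "")] = sorted(matched)
    return hits


# Hand-written sample in the assumed response shape (not real demo output).
sample = {
    "data": [
        {
            "traceID": "abc123",
            "spans": [
                {
                    "operationName": "send_order_confirmation",
                    "tags": [
                        {"key": "app.email.recipient", "type": "string",
                         "value": "jack@example.com"},
                        {"key": "http.method", "type": "string", "value": "POST"},
                    ],
                }
            ],
        },
        {
            "traceID": "def456",
            "spans": [
                {
                    "operationName": "GET /cart",
                    "tags": [{"key": "http.method", "type": "string", "value": "GET"}],
                }
            ],
        },
    ]
}

print(traces_with_sensitive_tags(sample, ["email", "password"]))
```

Filtering client-side like this keeps the first experiment independent of which search parameters the query service supports; a tag-name search pushed down to the query service would be the next refinement.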
Get a Prose pipeline running first on top of these queries, and then look into completeness: examining HTTP request bodies (#22) and tail sampling (#20, dettanym/prose#13).
Interesting sidenote: "Jaeger anonymizer hashes fields of a trace for easy sharing"
Setting up Jaeger / OpenTelemetry, including compatibility with Istio:
Jaeger by itself: We can install either Jaeger or the Jaeger Operator via Helm charts.
Check out the Jaeger architecture options. The direct-to-storage and Kafka options differ in how the collector handles incoming spans: with direct-to-storage, the collector only buffers spans in memory before writing them to storage, whereas with Kafka, the collector writes to Kafka, which absorbs bursty trace traffic, and a Jaeger ingester then writes the spans to the DB. The Kafka option might be more useful when incorporating Hindsight (#13).
Another option for Jaeger Collectors is to receive OpenTelemetry spans from an OT Collector. The OT Collector itself may run either as a sidecar or as a separate deployment that collects traces from multiple services ("central cluster" on the Jaeger arch site). They mention that an advantage of running the OT Collector as a sidecar is that it can enrich the traces with K8s data (see dettanym/prose#12, dettanym/prose#13). Whereas running it as a central cluster apparently has the advantage of "sharding capabilities e.g. when using tail-based sampling"?
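One plausible reading of the sharding remark: tail-based sampling decides per whole trace, so every span of a trace has to end up on the same collector instance, and a central pool of collectors can be sharded by trace ID to guarantee that. The Python sketch below illustrates only that routing idea. The collector names and span tuples are hypothetical, and the simple hash-modulo assignment is a stand-in (it reshuffles traces when the pool size changes, which a consistent-hashing scheme would avoid).

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def collector_for(trace_id: str, collectors: List[str]) -> str:
    """Pick a collector deterministically from the trace ID alone."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]


# Hypothetical pool of central collector instances.
collectors = ["otel-collector-0", "otel-collector-1", "otel-collector-2"]

# Hypothetical spans as (trace_id, emitting service) pairs, arriving interleaved.
spans: List[Tuple[str, str]] = [
    ("trace-a", "frontend"),
    ("trace-b", "cart"),
    ("trace-a", "email"),
    ("trace-a", "checkout"),
    ("trace-b", "payment"),
]

# Route every span by its trace ID, regardless of which service sent it.
routed: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
for trace_id, service in spans:
    routed[collector_for(trace_id, collectors)].append((trace_id, service))

for name, batch in sorted(routed.items()):
    print(name, batch)
```

Because the assignment depends only on the trace ID, each collector sees complete traces and can make a tail-sampling decision locally. With per-service sidecars, the spans of one trace are spread across sidecars by construction.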
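The Jaeger Operator's deployment strategies include allInOne (a single all-in-one executable deployment) versus production (the collector and query components run as separate deployments); there is also a streaming strategy that adds Kafka between the collector and storage.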
Jaeger images. Jaeger has a spark-dependencies repo + image that "collects spans from storage, analyzes links between services, and stores them later for presentation in the UI". Relatedly, there is a semi-abandoned Jaeger data analytics repo with Java code that can be used to do complex graph queries on traces.
Istio and Jaeger: Option 1 (dev mode): Istio can install a Jaeger deployment as an addon. Option 2 (prod mode): Install Jaeger separately and point Istio's Envoy proxies to send traces to the Jaeger endpoint. If we were to set up Jaeger with OT sidecars + Istio, it's unclear whether we can get each proxy to send traces to a local IP:port address.
OpenTelemetry Operator repo with docs, configuration.
The official docs include examples of auto-instrumentation for different languages and in particular, the example for Python walks through different types of instrumentation. A sample demo is also in the official docs, with the architecture etc.
Install Helm charts for the: