Integrate Tracing (derived from OpenTelemetry)

CMCDragonkai commented 2 years ago

Specification

OpenTelemetry is an overly complicated beast. It's far too complex to adopt into a logging system. However, the basic principles of tracing make sense. Here I'm showing how to set up an OpenTelemetry backend for comparison testing, so that we can derive a tracing schema and later visualise it ourselves or pass it into an OTLP-compatible visualiser.

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.36

The above command runs Jaeger. Take note of port 4318, which serves the OTLP protocol over HTTP.

Visit localhost:16686 to view the Jaeger UI.

Then any example code, such as https://github.com/open-telemetry/opentelemetry-js/blob/main/examples/basic-tracer-node/index.js, can run and push traces directly to the Docker container.

What is frustrating is:

OpenTelemetry code only exports to stderr as an afterthought; it's not considered first-class usage.
The stderr exporters output via console.log and produce pretty-printed results that are not actual JSON, so you cannot just pipe the output to a relevant location.
The schema of the span data isn't clear. Different parts of the documentation seem to still describe old data, or maybe the JS implementation itself hasn't been updated to the new schema.

The plan:

Create our own "span" derived from OpenTelemetry and output it as regular structured JSON
Massage it to be compatible with OpenTelemetry viewers like Jaeger
Use Jaeger's port 4318 to stream the JSON and view data in the interim
Find an easier way to visualise traces, maybe something that can be used from the CLI or in a GUI
For production usage, feed it to any structured log capturer, and then into a viewer that understands trace information
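The first two steps of the plan could be sketched roughly as below. This is only an illustration under assumptions: the `Span` type, `createSpan`, and `toOtlp` are hypothetical names, not part of js-logger or OpenTelemetry. The payload field names (`resourceSpans`, `scopeSpans`, hex-encoded `traceId`/`spanId`, nanosecond timestamps as strings) follow my understanding of the OTLP JSON encoding; older collectors may expect `instrumentationLibrarySpans` instead of `scopeSpans`, so check against the Jaeger version in use.

```typescript
// Hypothetical span record: plain structured JSON, one object per log line.
type Span = {
  traceId: string; // 32 hex characters
  spanId: string; // 16 hex characters
  parentSpanId?: string;
  name: string;
  startTime: number; // milliseconds since the Unix epoch
  endTime: number;
  attributes: Record<string, string>;
};

// Random lowercase hex string of `bytes` bytes (not cryptographically secure).
function randomHex(bytes: number): string {
  let out = '';
  for (let i = 0; i < bytes; i++) {
    out += ('0' + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

// A child span inherits the trace ID of its parent; a root span gets a new one.
function createSpan(
  name: string,
  startTime: number,
  endTime: number,
  parent?: Span,
  attributes: Record<string, string> = {},
): Span {
  const span: Span = {
    traceId: parent !== undefined ? parent.traceId : randomHex(16),
    spanId: randomHex(8),
    name,
    startTime,
    endTime,
    attributes,
  };
  if (parent !== undefined) span.parentSpanId = parent.spanId;
  return span;
}

// Whole milliseconds to a nanosecond string (avoids float precision loss).
const msToNano = (ms: number): string => `${Math.round(ms)}000000`;

// Massage our spans into an OTLP-style JSON payload (assumed field names).
function toOtlp(serviceName: string, spans: Span[]) {
  return {
    resourceSpans: [{
      resource: {
        attributes: [{ key: 'service.name', value: { stringValue: serviceName } }],
      },
      scopeSpans: [{
        scope: { name: 'manual-sketch' },
        spans: spans.map((s) => ({
          traceId: s.traceId,
          spanId: s.spanId,
          parentSpanId: s.parentSpanId,
          name: s.name,
          kind: 1, // SPAN_KIND_INTERNAL
          startTimeUnixNano: msToNano(s.startTime),
          endTimeUnixNano: msToNano(s.endTime),
          attributes: Object.keys(s.attributes).map((key) => ({
            key,
            value: { stringValue: s.attributes[key] },
          })),
        })),
      }],
    }],
  };
}

const root = createSpan('request', 1000, 1250);
const child = createSpan('db.query', 1050, 1200, root, { table: 'users' });
const line = JSON.stringify(child); // a single JSON line, safe to pipe anywhere
const payload = toOtlp('js-logger-demo', [root, child]);
```

The resulting `payload` would then be sent as the body of an HTTP POST to `http://localhost:4318/v1/traces` with `Content-Type: application/json`, while `line` is what a structured log capturer would receive.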

Additional context

Tasks

...
...
...

CMCDragonkai commented 2 years ago

It seems a lot of the complexity is due to vendor fragmentation: they are trying to make everything compatible.

CMCDragonkai commented 2 years ago

Most tracing tools like https://nodejs.org/api/tracing.html and chrome://tracing expect a finite dataset; that is, a trace is expected to have a beginning and an end. That's why tracing has always been "request" driven. OpenTelemetry is just deriving from what came before, as in https://github.com/gaogaotiantian/viztracer https://github.com/janestreet/magic-trace https://github.com/kunalb/panopticon and more.

I'm interested in more than just request-driven tracing: live, infinite traces (call it continuous tracing, which shows finished and live spans at the same time) that are also correlated with each other. I'm guessing we need zoomable levels of detail and the ability to filter out irrelevant information dynamically.

OpenTelemetry in particular does not appear to emit a span until it is done. I'd imagine that knowing when a span started, even if it has not ended yet, would be useful for live continuous tracing.
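One way to make span starts visible is to emit a separate record when a span starts and another when it ends, then fold the stream into the current state. The sketch below assumes a hypothetical event shape (`SpanEvent`) and hypothetical helper names (`foldEvents`, `liveSpans`); none of this is an existing API.

```typescript
// Hypothetical event shape: one record on span start, one on span end.
type SpanEvent =
  | { type: 'start'; spanId: string; parentSpanId?: string; name: string; time: number }
  | { type: 'end'; spanId: string; time: number };

type SpanState = {
  spanId: string;
  parentSpanId?: string;
  name: string;
  startTime: number;
  endTime?: number; // undefined while the span is still live
};

// Fold an event stream (e.g. parsed JSON lines) into the state of every span.
function foldEvents(events: SpanEvent[]): Record<string, SpanState> {
  const spans: Record<string, SpanState> = {};
  for (const e of events) {
    if (e.type === 'start') {
      const state: SpanState = { spanId: e.spanId, name: e.name, startTime: e.time };
      if (e.parentSpanId !== undefined) state.parentSpanId = e.parentSpanId;
      spans[e.spanId] = state;
    } else {
      const state = spans[e.spanId];
      // An end event without a known start is ignored in this sketch.
      if (state !== undefined) state.endTime = e.time;
    }
  }
  return spans;
}

// Spans that have started but not yet ended.
function liveSpans(spans: Record<string, SpanState>): SpanState[] {
  return Object.keys(spans)
    .map((id) => spans[id])
    .filter((s) => s.endTime === undefined);
}

const state = foldEvents([
  { type: 'start', spanId: 'a', name: 'connection', time: 0 },
  { type: 'start', spanId: 'b', parentSpanId: 'a', name: 'handshake', time: 5 },
  { type: 'end', spanId: 'b', time: 9 },
]);
const live = liveSpans(state); // only the 'connection' span is still live
```

Because the fold is incremental, a viewer could apply it to an unbounded stream and show finished and live spans side by side.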

CMCDragonkai commented 2 years ago

Here's an old blog post demonstrating the integration of opentracing with an ES6 promise.

This code is quite outdated. As can be seen from our initial experiments with opentracing, the tracing format isn't exactly what we want: spans are only output at the very end, which is not conducive to live or infinite/non-terminating visualisation.

However, the code does show that at one point opentracing was simple enough to be easily extended, and one could just use the opentracing core library rather than bringing in as many dependencies as are needed now.
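For comparison, wrapping a promise in a span does not need a tracing library at all. The following is a minimal sketch, not the blog post's code and not the opentracing API: `traced`, `TraceEvent`, and the `emit` callback are hypothetical names. Unlike the behaviour described above, it emits the start event immediately rather than waiting for the promise to settle.

```typescript
// Hypothetical event record emitted at the start and end of each span.
type TraceEvent = {
  type: 'start' | 'end';
  spanId: string;
  parentSpanId?: string;
  name: string;
  time: number;
  error?: string;
};

let spanCounter = 0;

// Runs `fn` inside a span: the start event is emitted right away,
// the end event once the promise settles (fulfilled or rejected).
async function traced<T>(
  name: string,
  emit: (e: TraceEvent) => void,
  fn: (spanId: string) => Promise<T>,
  parentSpanId?: string,
): Promise<T> {
  const spanId = `span-${++spanCounter}`;
  const start: TraceEvent = { type: 'start', spanId, name, time: Date.now() };
  if (parentSpanId !== undefined) start.parentSpanId = parentSpanId;
  emit(start);
  const end: TraceEvent = { type: 'end', spanId, name, time: 0 };
  try {
    return await fn(spanId);
  } catch (e) {
    end.error = e instanceof Error ? e.message : String(e);
    throw e;
  } finally {
    end.time = Date.now();
    emit(end);
  }
}

// Usage: collect events in memory; nested calls pass the parent span ID down.
const events: TraceEvent[] = [];
const emit = (e: TraceEvent): void => {
  events.push(e);
};
const result = traced('outer', emit, (outerId) =>
  traced('inner', emit, async () => 42, outerId),
);
```

In a real logger, `emit` could instead write `JSON.stringify(e)` as one line to stderr, so the output can be piped to a log capturer.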

CMCDragonkai commented 1 year ago

What we want is something like this:

[Image: tracing_viz]

The tracing goes from top to bottom and represents an "infinite" live visualisation of the current state (lifecycles) of the system.
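A crude text version of this idea, suitable for a CLI, might look like the sketch below. It is only an assumption about how such a view could work: `VizSpan` and `renderTimeline` are hypothetical names, time runs downward with one row per step, each span gets one column, `|` marks a span that has an end time, `:` marks a span that is still live, and `.` means the span is not active at that row.

```typescript
// Hypothetical span shape for visualisation; no endTime means still live.
type VizSpan = { name: string; startTime: number; endTime?: number };

// One row per time step from the earliest start up to `now`; one column per span.
function renderTimeline(spans: VizSpan[], now: number, step: number): string[] {
  if (spans.length === 0) return [];
  const t0 = Math.min(...spans.map((s) => s.startTime));
  const rows: string[] = [];
  for (let t = t0; t <= now; t += step) {
    const cells = spans.map((s) => {
      const live = s.endTime === undefined;
      const end = s.endTime === undefined ? now : s.endTime;
      if (t < s.startTime || t > end) return '.';
      return live ? ':' : '|';
    });
    // Right-align the time label in a four-character column.
    rows.push(('    ' + t).slice(-4) + ' ' + cells.join(' '));
  }
  return rows;
}

const rows = renderTimeline(
  [
    { name: 'A', startTime: 0, endTime: 20 },
    { name: 'B', startTime: 10 }, // still live
  ],
  30,
  10,
);
// rows:
//    0 | .
//   10 | :
//   20 | :
//   30 . :
```

Re-rendering with a larger `now` extends the live column downward, which is the "infinite" aspect of the view.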

MatrixAI / js-logger