golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

x/build: observability using distributed tracing and metrics #26779

Open odeke-em opened 6 years ago

odeke-em commented 6 years ago

I am coming here from https://groups.google.com/forum/#!msg/golang-dev/MdwFiAx5-PU/UiUvY-8_DwAJ

The OpenCensus project https://opencensus.io/ provides observability into distributed systems (monoliths and microservices alike) through mechanisms for recording traces and metrics. Those signals give insight into the state of a distributed system.

I presented a talk about OpenCensus at GoSF on 18th July 2018 (about 3 weeks ago). The accompanying slides are at https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#1 or, better, at https://github.com/orijtech/talks/blob/master/2018/07/18/gosf/gosf.slide for the Go present version.

The value of it

Traces give play-by-play visibility into the state of sampled requests. For example, we can see how long invoking os/exec took compared with how long fetching metadata from Google Cloud Storage took: https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#14

The collected metrics are useful for actively checking the health of the system, e.g. sending alerts to the x/build authors when a trybot run takes, say, 8 minutes, or when the overall p99 latency hits 10 minutes.

Maintenance and technical debt

With regard to maintenance, the OpenCensus Go implementation https://github.com/census-instrumentation/opencensus-go implements tracing and metrics, so we just use its packages to instrument our code, e.g. this excerpt from my slides https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#13

func search(w http.ResponseWriter, r *http.Request) {
    ctx, span := trace.StartSpan(r.Context(), "Search")
    defer span.End()

    // Use the context and the rest of the code goes below
    _ = ctx
}

To extract the data, we just need to add an "exporter" (a liaison to our backend of choice) in a main function, for example to send traces to Stackdriver:

package main

import (
    "log"

    "contrib.go.opencensus.io/exporter/stackdriver"
    "go.opencensus.io/trace"
)

func main() {
    sd, err := stackdriver.NewExporter(stackdriver.Options{ProjectID: "census-demos"})
    if err != nil {
        log.Fatalf("Failed to create Stackdriver Trace exporter: %v", err)
    }
    trace.RegisterExporter(sd)
}

Maintenance work is detached from the Go project, since the OpenCensus project is already staffed with collaborators from a wide range of companies. The Go project only needs to import the respective libraries, start and stop traces, record metrics, and finally create exporters for the desired backends, e.g. Prometheus, Zipkin, AWS X-Ray, Jaeger, Stackdriver Tracing and Monitoring, SignalFx etc.

Next steps

I finally got some dev cycles this quarter to help improve our build system, but I would also be delighted to delegate to or work with people in the community, which is why I am filing this now.

/cc @basvanbeek @ramonza @bogdandrutu @rakyll @kevinburke

gopherbot commented 6 years ago

Change https://golang.org/cl/138522 mentions this issue: cmd/coordinator: use OpenCensus for Stackdriver metrics

gopherbot commented 6 years ago

Change https://golang.org/cl/138523 mentions this issue: cmd/coordinator: initial tracing and metrics using OpenCensus

gopherbot commented 3 years ago

Change https://golang.org/cl/303669 mentions this issue: cmd/coordinator: migrate to OpenCensus for metrics