TBD54566975 / ftl

FTL - Towards a 𝝺-calculus for large-scale systems
https://tbd54566975.github.io/ftl/
Apache License 2.0
21 stars 7 forks source link

otel: change the go-runtime SDK to use a constant string (or configurable?) #2026

Open deniseli opened 3 months ago

deniseli commented 3 months ago

THIS IS LOOOOWWWW PRIORITY! (because it's both not in use and opt-in)

In go-runtime/ftl/observability/metrics.go, we use the verb name as the metric name, which would result in a seeing a quite high number of metrics. This is not ideal; we don't want FTL to have a reputation for being expensive to instrument logging for.

As an alternative, users are free to invoke the otel SDK directly in their code, and the global otel configuration/setup we've done will apply.

wesbillman commented 2 months ago

@deniseli maybe we should just remove this file for now since it's not being used. We can re-introduce something we're more happy with at that time? I think we're also using the verb name for meter vs. a counter/etc. metric so we're probably ok there. anyway.

deniseli commented 2 months ago

@wesbillman yeah agreed, getting rid of the file for now seems like the right call.

deniseli commented 2 months ago

Current thoughts: it'd be nice to make the go-runtime SDK observability interface more declarative than what otel forces you into. We can simply wrap otel for now and give the users better functions for interfacing with it. For any niche features that we can't support, there's always the backdoor of invoking otel directly in the user code.

Design Thoughts Part 1: Attributes

It'd be nice to pass attributes via struct (so the types are built-in) instead directly using the attribute package that otel provides. Example:

Suppose you're a user, and one of your verbs calls out to a somewhat flaky external API. You want to instrument some metrics on your verb with attributes recording the details of that API call:

type MyVerbAttrs struct {
    failureMode ftl.Option[string] `attr:"failure_mode"`
    retryCount  int                `attr:"my_api_call.retry_count"`
}

func(...) (...) {
    ...
    ftl.RecordCount(ctx, ftl.Metric(...), MyVerbAttrs{...})
    ...
}

The Go SDK could use reflect to produce:

attrs := []attribute.KeyValue{
    attribute.String("ftl.user_defined.failure_mode", "my_failure_mode_1"), // only present when option.ok is true
    attribute.Int64("ftl.user_defined.my_api_call.retry_count.bucket", 0),
}
deniseli commented 2 months ago

Design Thoughts Part 2: Metric instantiation

Meters and metrics don't have to be instantiated at initialization time. In fact, the user code will be cleaner if they don't need to think about that. The SDK could just provide a set of functions to create and manage metrics:

ftl.CounterMetric[T](signalName string, unit, description ftl.Option[string]) (metric.Int64Counter, error)

To support all the main signal types (Counter, Histogram, and Gauge), we'll need to flesh out this interface with some more functions. Options:

Note: Leaving UpDownCounter out for now because if any up or down update is missed, then the observed value would be over or under, which could lead to confusion. Safer to funnel users into using Histo or Gauge.

The meter itself could be instantiated and held as a singleton at the package scope.

deniseli commented 2 months ago

note from standup: this should also build in constraints on attribute cardinality since we get charged for that by datadog