TBD54566975 / ftl

FTL - Towards a 𝝺-calculus for large-scale systems
Apache License 2.0

Metrics pass over FTL #1988

Open alecthomas opened 2 weeks ago

alecthomas commented 2 weeks ago

FTL needs to be instrumented sufficiently for production. This ticket tracks all the tasks, in the order we should do them.

bradleydwyer commented 2 weeks ago

Consider how to test this locally, without having to deploy FTL.

deniseli commented 2 weeks ago

Initial breakdown of tasks that we'll sync with the team on:

  1. Implement just otel: run an otel collector in a Docker container to collect local logs/metrics/traces/spans and show the stream in a terminal tab as it comes in. This way, we can run ftl dev, trigger the events we plan to log, and see them come through the terminal window in real(ish) time.
  2. Chat with PFI about whether there's anything they expect to work that isn't. (relates to the next task)
  3. Make sure everything existing actually WAI
  4. Add these attributes to all signals:
    1. project_name: from ftl-project.toml
    2. is_user_service: We have the serviceName set here, which currently will be one of ftl-controller, ftl-runner, or <module name>. Technically, we could derive this by checking whether serviceName is ftl-controller or ftl-runner, but it would be good to be a bit more explicit.
      1. Add bool flag isInternal to observability.Config so that Init's callers can configure whether it's user code or not.
    3. trace_id: do we know if the trace_id that otel provides is already instrumented correctly such that it can be used to trace verb-to-verb calls like the trace_id in our request headers, or better yet if they're actually the same trace_id? If not, then we need to make the one in the request headers available.
      1. Q: should we persist all the header values to attrs? A: we should check if any is done by default. Also, check if there's any sensitive data (e.g. auth tokens)
  5. Plan out new signals + their attributes (incl. pubsub as described in the ticket desc) (reminder: queue depth)
  6. Change the go-runtime SDK to use a constant string (or configurable?). It's opt-in, so not a huge deal.
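
To make task 4 concrete, here is a rough Go sketch of the proposed common attributes and of one possible way to keep sensitive header values out of attrs. It is not FTL's implementation: the `Config` type below is a hypothetical stand-in for observability.Config (only the isInternal flag comes from the list above), and `CommonAttributes`, `HeaderAttributes`, the `header.` key prefix, and the deny-list contents are all assumptions made for illustration. It uses only the standard library, so the attributes are plain string maps rather than otel attribute types.

```go
package main

import (
	"fmt"
	"strings"
)

// Config is a hypothetical stand-in for observability.Config.
// Only the IsInternal flag is taken from the task list; the rest is assumed.
type Config struct {
	ProjectName string // assumed to be read from ftl-project.toml by the caller
	IsInternal  bool   // true for FTL's own services (ftl-controller, ftl-runner)
}

// CommonAttributes returns the attributes proposed for every signal.
func CommonAttributes(c Config) map[string]string {
	return map[string]string{
		"project_name":    c.ProjectName,
		"is_user_service": fmt.Sprint(!c.IsInternal),
	}
}

// sensitiveHeaders is an illustrative deny-list, not an audited one.
var sensitiveHeaders = map[string]bool{
	"authorization": true,
	"cookie":        true,
}

// HeaderAttributes copies request headers into attributes and replaces the
// values of deny-listed headers with a fixed placeholder.
func HeaderAttributes(headers map[string][]string) map[string]string {
	attrs := make(map[string]string, len(headers))
	for name, values := range headers {
		lower := strings.ToLower(name)
		key := "header." + lower
		if sensitiveHeaders[lower] {
			attrs[key] = "[REDACTED]"
			continue
		}
		attrs[key] = strings.Join(values, ",")
	}
	return attrs
}

func main() {
	fmt.Println(CommonAttributes(Config{ProjectName: "demo", IsInternal: true}))
	fmt.Println(HeaderAttributes(map[string][]string{
		"Authorization": {"Bearer secret"},
		"X-Request-Id":  {"abc123"},
	}))
}
```

An explicit flag set by the caller keeps is_user_service correct even if internal service names change later, which is the motivation given in 4.2. Whether headers should be persisted at all, and which ones count as sensitive, is still the open question from 4.3.1.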

deniseli commented 2 weeks ago

Just finished updating the ticket description. Everyone, feel free to assign tasks to yourself. I'll take the first one. Please do not take the two marked low priority until the others are closed (and maybe not even then!)

The original PubSub notes in the ticket description are now in https://github.com/TBD54566975/ftl/issues/2025, along with @bradleydwyer's reminders (feel free to add more in that ticket ;) )