edx / edx-arch-experiments

A plugin to include applications under development by the architecture team at edx
GNU Affero General Public License v3.0

Discovery: Observability for events in Production #56

Closed. robrap closed this 1 year ago

robrap commented 2 years ago

Document the various needs for observability, make recommendations, and ticket the work, separating what is needed immediately from what can be deferred.

Some earlier notes on observability:

From https://www.infoworld.com/article/3269207/busting-event-driven-myths.html:

“It is also important to have strong monitoring and observability solutions in place. You need to know which service sent which events, and who is subscribed to these events. Having good visibility into the flow of events will let you understand the system and troubleshoot it with more confidence and less guessing.”

  • How would this work?
  • What details should we capture? Presumably we need an event id.

Also note: although we don't want to fully implement CloudEvents, we should use, where possible, any fields required for observability that are defined in the OEP for CloudEvents. See https://github.com/openedx/openedx-events/issues/77. This may include an event id, for example, although I'm not sure whether Kafka also provides an event id.
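
To make the "what details should we capture?" question more concrete, here is a rough sketch of the minimum metadata a producer could attach to each event. It assumes the four attributes that CloudEvents marks as required (id, source, type, specversion) plus the optional time, and it assumes they would travel as Kafka headers with a `ce_` prefix, as in the CloudEvents Kafka binding. The class name `EventMetadata`, the method `to_kafka_headers`, and the example values are hypothetical; nothing here reflects what openedx-events actually does.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventMetadata:
    """Hypothetical container for CloudEvents-style observability fields."""

    event_type: str  # what happened, e.g. "example.thing.created.v1"
    source: str      # which service produced the event
    # A fresh UUID per event gives producers and consumers a shared id to log.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Creation time in UTC, ISO 8601 formatted.
    time: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_kafka_headers(self):
        """Return the metadata as a list of (name, bytes) header tuples.

        The "ce_" prefix is an assumption based on the CloudEvents
        Kafka binding, not a decision recorded in this ticket.
        """
        return [
            ("ce_specversion", b"1.0"),
            ("ce_id", self.id.encode("utf-8")),
            ("ce_source", self.source.encode("utf-8")),
            ("ce_type", self.event_type.encode("utf-8")),
            ("ce_time", self.time.encode("utf-8")),
        ]


if __name__ == "__main__":
    metadata = EventMetadata(
        event_type="example.thing.created.v1",  # hypothetical event type
        source="example-service",               # hypothetical service name
    )
    for name, value in metadata.to_kafka_headers():
        print(name, value.decode("utf-8"))
```

With something like this in place, "which service sent which event" can be answered from the headers alone, without deserializing the payload.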

robrap commented 1 year ago

I asked something like the following of both Confluent and New Relic, and will post answers later.

I’m wondering about monitoring our Confluent Cloud usage in New Relic.

robrap commented 1 year ago
  1. Added notes to https://2u-internal.atlassian.net/wiki/spaces/AT/pages/174555142/How+to+Use+the+Event+Bus+edX.org#Observability. Mostly using our runbook to provide additional details.
  2. Separately, we are working on improving the CloudEvents headers and, at a minimum, getting them into error logs.

robrap commented 1 year ago

I'd like to add a set_custom_attributes call for the message id in the consumer. I asked if that could be done as part of the CloudEvents ticket here: https://github.com/openedx/openedx-events/issues/77#issuecomment-1332761300. If not, maybe add a new task. Either way, maybe this task could be closed.

robrap commented 1 year ago

Created the new task https://github.com/edx/edx-arch-experiments/issues/121 for new custom attributes. Closing this discovery.