department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Discovery: Integrate VetsAPI with Event Bus #88950

Open jennb33 opened 1 month ago

jennb33 commented 1 month ago

User Story

As Managers and Developers on VA Platform, we need to do discovery on the proposed integration of VetsAPI with Event Bus, so that there is a strategy for the integration, which would forward selected log messages to the Enterprise Event Bus (Kafka) in support of VA-wide transaction tracing.

Issue Description

We in VES are working on making veteran submissions traceable end-to-end across VA by instrumenting various systems to forward key events to Kafka, and from there to the CX Insights Data Warehouse, where they can be aggregated and analyzed. VA.gov is a key originating point for veteran submissions! We've talked with various folks, including Steve Albers, Patrick Bateman, and Bill Chapman, and they are supportive of instrumenting vets-api in this way.

Why FluentD? While we could have vets-api connect directly to Kafka, this would be a bigger implementation burden for teams and would add a runtime dependency. What if Kafka is unavailable? Is the logic for issuing an event the same from a controller as from a Sidekiq job? What about retries? Instead, we want to source events from logs. This reduces the burden on app teams to "when something important happens, log a message with this specific format" (we can provide a Rails utility method to do this). FluentD can be configured to pick up only those log messages and convert them to Kafka events, which then flow where they need to go.

We have the healthcare application team ready to work on this. We just need to tell them what format of log message to generate, and then we need the FluentD to Enterprise Event Bus connection configured, presumably by Platform. FluentD is in the TRM and was used by Platform in the past as an approach for log forwarding (when logs were going to Loki). I think they wound up not needing it once the switch to Datadog was decided upon.
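To make the "log a message with this specific format" idea concrete, here is a minimal sketch of what such a utility method might look like. Everything in it is an assumption for discussion: the module name `EventBusLog`, the `EVENT_BUS` marker, and the field names are hypothetical and are not an agreed format or existing vets-api code. It uses only the Ruby standard library, so in Rails the logger argument would presumably be `Rails.logger`.

```ruby
require 'json'
require 'logger'
require 'securerandom'
require 'time'

# Hypothetical helper; none of these names or fields exist in vets-api today.
module EventBusLog
  # Assumed marker value that a log forwarder could filter on.
  MARKER = 'EVENT_BUS'

  # Builds the event payload as a Hash. Field names are illustrative only.
  def self.build(event_type:, submission_id:, source:, occurred_at: Time.now.utc)
    {
      marker: MARKER,
      event_id: SecureRandom.uuid,
      event_type: event_type,
      submission_id: submission_id,
      source: source,
      occurred_at: occurred_at.iso8601
    }
  end

  # Writes the event as a single JSON log line and returns the payload.
  # It depends only on a logger, so a controller and a Sidekiq job
  # can call it in exactly the same way.
  def self.emit(logger, **fields)
    event = build(**fields)
    logger.info(JSON.generate(event))
    event
  end
end

# Example call; in a Rails app the first argument might be Rails.logger.
EventBusLog.emit(
  Logger.new($stdout),
  event_type: 'form_submitted',
  submission_id: 'abc-123',
  source: 'vets-api'
)
```

Because the helper only writes a log line, the app has no runtime dependency on Kafka. Retries and delivery would be the responsibility of whatever forwards the logs (FluentD or an alternative).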

Tasks

Success Metrics

There is an outlined and approved strategy for integrating VetsAPI with the Event Bus, giving us a path forward on this objective.

Acceptance Criteria


Validation

Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.

jennb33 commented 1 month ago

Per @LindseySaari, we need to coordinate this work better with Patrick V to ensure that the right teams are assigned to it. We may be shifting from FluentD to AWS Lambda.