msnwatson closed this 2 weeks ago
We'll need to take a slightly different direction than the ticket originally planned. RabbitMQ does not appear to expose a standard metric for how long messages sit in a queue.
Instead, I think we need to do two things: set a TTL on messages in our queues, and collect metrics on messages expiring within those queues so we can alert on spikes. This still lets us set a termination grace period in a principled way while keeping visibility into any negative effects of the policy change.
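A minimal sketch of the two pieces described above, not the VRO implementation. It assumes messages are expired via RabbitMQ's `x-message-ttl` queue argument and routed to a dead-letter exchange so expirations can be counted. The exchange name, the sampling approach, and the spike thresholds are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def ttl_queue_arguments(ttl_ms, dead_letter_exchange):
    """Build queue arguments that expire messages after ttl_ms milliseconds.

    "x-message-ttl" and "x-dead-letter-exchange" are standard RabbitMQ queue
    arguments; the values passed in are assumptions for illustration.
    """
    if ttl_ms < 0:
        raise ValueError("ttl_ms must be non-negative")
    return {
        "x-message-ttl": int(ttl_ms),
        "x-dead-letter-exchange": dead_letter_exchange,
    }


class ExpirySpikeDetector:
    """Flag a sample of expired-message counts that is far above recent history.

    Hypothetical heuristic: a sample is a spike when it is at least min_count
    and exceeds mean + threshold_sigmas * stddev of the rolling window.
    """

    def __init__(self, window=12, threshold_sigmas=3.0, min_count=5):
        self.history = deque(maxlen=window)
        self.threshold_sigmas = threshold_sigmas
        self.min_count = min_count

    def observe(self, expired_count):
        is_spike = False
        # Require a few samples before judging, so startup noise is ignored.
        if len(self.history) >= 3:
            baseline = mean(self.history)
            # Floor the spread at 1.0 so a flat history does not make
            # every small increase look like a spike.
            spread = max(pstdev(self.history), 1.0)
            limit = baseline + self.threshold_sigmas * spread
            is_spike = expired_count >= self.min_count and expired_count > limit
        self.history.append(expired_count)
        return is_spike
```

With a client such as pika, the dictionary would typically be passed as the `arguments` parameter of `channel.queue_declare`. Note that RabbitMQ rejects a redeclaration whose arguments differ from an existing queue's, so applying the TTL through a policy may be easier for queues that already exist.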
status: still working on resolving the blocker; details are in #3261. In summary: BIP metrics should now be visible, and Wednesday, Aug 7 is a stretch goal for getting metrics working on the other apps.
status: last round of changes on the blocker expected to be deployed 8/12.
status: I believe the blocker has been addressed.
Request duration is being logged under these metrics: vro_xample_workflows.request_duration, vro_bie_kafka.request_duration, vro_bip.request_duration
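For illustration only, here is one way a request duration could be captured and reported under a metric name like those listed. The `emit` callback and the millisecond unit are assumptions; the thread does not say how VRO actually emits these metrics or in what unit.

```python
import time
from contextlib import contextmanager


@contextmanager
def record_request_duration(metric_name, emit):
    """Time the enclosed block and report the elapsed milliseconds.

    emit is a hypothetical callback taking (metric_name, value); in a real
    service it would wrap whatever metrics client is in use.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        # Report even when the request raises, so failures are still timed.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        emit(metric_name, elapsed_ms)
```

A caller would wrap the request handling in `with record_request_duration("vro_bip.request_duration", emit):`.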
Per this thread: https://dsva.slack.com/archives/C04QLHM9LR0/p1723848302659089, I took a slightly different approach than the one described in the ticket, based on the metrics I was seeing. Still blocked on getting RabbitMQ metrics.
User Story
As a VRO engineer, I want to take action on the findings from #2816 and close any availability or monitoring gaps, so that the platform's ability to offer service uptime for partners is improved.
Acceptance Criteria