department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Environment (and other) tags missing during revproxy deploys #75159

Closed lindsay-insco closed 1 day ago

lindsay-insco commented 7 months ago

Issue

Adrian Rollett discovered some unusual behavior that occurs during revproxy deploys and requested a deeper investigation and resolution.

Image

Acceptance Criteria

Additional information

Further information is available in this Slack thread.



hgbarreto commented 6 months ago

thread: https://dsva.slack.com/archives/CBU0KDSB1/p1706305882259029

ph-One commented 2 months ago

The root cause is that the EC2 instance's monitoring software is terminated during the shutdown process, which produces a gap in our logs for the reverse proxy itself during deployment. Looking at the logs for the ELB that fronts the reverse proxy, we do not see the dip. We're still looking into ways to improve this, such as delaying the cut-over between new and old deployments until after logging has had time to warm up and begin populating Datadog.

Attempts to keep instance-level monitoring and logging alive as long as possible appear to have improved the situation slightly, but the problem persists.
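The idea of delaying cut-over until logging has warmed up could be sketched roughly as below. This is only an illustration, not what the deploy actually does: `wait_for_logs`, `deploy`, `logs_flowing`, and `cut_over` are hypothetical names, and how "logs are flowing" would be detected (for example, by querying Datadog for recent entries from the new instances) is left as an assumption for the caller to supply.

```python
import time
from typing import Callable


def wait_for_logs(
    logs_flowing: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll logs_flowing() until it returns True or the timeout elapses.

    logs_flowing is a hypothetical, caller-supplied check that reports
    whether the new instances' logs have started arriving.
    """
    deadline = clock() + timeout_s
    while True:
        if logs_flowing():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def deploy(
    logs_flowing: Callable[[], bool],
    cut_over: Callable[[], None],
    **wait_kwargs,
) -> bool:
    """Run cut_over() only once logs are flowing; report whether it ran."""
    if wait_for_logs(logs_flowing, **wait_kwargs):
        cut_over()
        return True
    # Timed out: leave the old deployment in place and let the caller decide.
    return False
```

What to do on timeout (retry, alert, or cut over anyway) is a policy choice this sketch deliberately leaves to the caller.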

ph-One commented 1 day ago

I've set up the Datadog Agent to start as early as possible (after docker.service), which has helped. The dip will still happen because the logging agent can be terminated before the application during shutdown, but the agent now starts as early as possible, and before the reverse proxy itself, which is why we see the slow build-up of logs as the new instances take over.

Before

image

After

image

ph-One commented 1 day ago

This is likely our best outcome while using a logging agent. We also have the reverse proxy ELB logs, which are continuous and do not suffer from this dip.

image

ph-One commented 1 day ago

I do not believe there is any more work that can be done here. It is the nature of a local logging agent. Closing, but please reopen (or generate a new ticket) if a new approach is discovered.