[DISCOVERY][Packer] Amazon CloudWatch Agent Implementation - Policies and Roles

hgbarreto commented 3 weeks ago

Description

To resolve the outstanding issue of EKS node volumes running out of space, we would like to implement the Amazon CloudWatch agent to ship logs to CloudWatch instead of storing them on the node itself.

The following policies need to be added to both current and new roles:

ec2DescribeTags (grants the ec2:DescribeTags permission)
A set of permissions that allows writing to Amazon CloudWatch

Possible solutions:

Create a new "ec2DescribeTags" policy
Identify or create a policy that allows EC2 instances to write to CloudWatch
Attach the new policies to either the roles currently in use or new roles.
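A minimal sketch of the two policy documents, assuming the CloudWatch write permissions are modeled on the actions in the AWS managed policy CloudWatchAgentServerPolicy (logs plus metrics). The statement IDs and variable names are hypothetical; the issue itself only names "ec2DescribeTags". ec2:DescribeTags does not support resource-level restrictions, so it is granted on "*".

```python
import json

# Policy granting only ec2:DescribeTags (the "ec2DescribeTags" policy in this issue).
DESCRIBE_TAGS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDescribeTags",  # hypothetical Sid
            "Effect": "Allow",
            "Action": ["ec2:DescribeTags"],
            "Resource": "*",
        }
    ],
}

# Assumption: write actions modeled on CloudWatchAgentServerPolicy.
CLOUDWATCH_WRITE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowCloudWatchAgentWrites",  # hypothetical Sid
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogGroups",
                "logs:DescribeLogStreams",
            ],
            "Resource": "*",
        }
    ],
}


def render(policy: dict) -> str:
    """Serialize a policy document to the JSON string IAM expects."""
    return json.dumps(policy, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render(DESCRIBE_TAGS_POLICY))
    print(render(CLOUDWATCH_WRITE_POLICY))
```

In practice the "Resource" for the logs actions could be narrowed to specific log group ARNs once the log group naming is settled.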

Resources

https://github.com/department-of-veterans-affairs/va.gov-team/issues/74601

Acceptance Criteria

[ ] Setup a Working Session with the team to discuss potential solutions

Refinement Guidance - Check the following before working on this issue:

[ ] Team label assigned ("platform-tech-team-2")
[ ] Epic assigned (if needed)
[ ] Estimated (points assigned)
[ ] Sprint assigned (once planned)
[ ] Team member(s) assigned
[ ] Understands how this work aligns with the overall platform

Efe-Oddball commented 1 week ago

Going through the GitHub comments on the closed PR and the Slack discussions about this topic to understand what needs to be done.

Efe-Oddball commented 1 week ago

Two images need to be taken into account here:

EKS node image
AL2-hardened

Approach_1

Identify the nodes or instances that are built from the AL2-hardened image.
Update the roles attached to the IAM instance profile with policies that give permission to write logs to CloudWatch.
Configure log rotation so that old logs are cleaned up.
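For the log rotation step, a sketch that renders a logrotate stanza. The log path and retention values are hypothetical placeholders, and the assumption is that the output would be dropped into /etc/logrotate.d/ on the image. All directives used (daily, maxsize, rotate, compress, delaycompress, missingok, notifempty, copytruncate) are standard logrotate directives.

```python
# Hypothetical glob; replace with the log files that actually fill the node volume.
LOG_GLOB = "/var/log/example-app/*.log"


def logrotate_stanza(path_glob: str, keep: int = 7, max_size: str = "100M") -> str:
    """Build a logrotate config block for the given file glob."""
    directives = [
        "daily",                # rotate once per day...
        f"maxsize {max_size}",  # ...or earlier if a file grows past this size
        f"rotate {keep}",       # keep this many rotated files, delete older ones
        "compress",             # gzip rotated files
        "delaycompress",        # leave the most recent rotation uncompressed
        "missingok",            # no error if nothing matches the glob
        "notifempty",           # skip rotation for empty files
        "copytruncate",         # truncate in place so writers keep their file handle
    ]
    body = "\n".join(f"    {d}" for d in directives)
    return f"{path_glob} {{\n{body}\n}}\n"


if __name__ == "__main__":
    print(logrotate_stanza(LOG_GLOB))
```

copytruncate avoids having to signal the writing process, at the cost of possibly losing a few lines written between the copy and the truncate.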

Approach_2

The IAM instance profile role for AL2-hardened is passed directly within the Packer file. We can replicate this in the Packer file for the EKS node, referencing the instance profile role and passing the required custom inline policy via Terraform (this fixes the issue at the image level).
Implement log rotation
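A sketch of the Terraform side of this approach, generating a Terraform JSON (.tf.json) file that attaches a CloudWatch inline policy to a role through the aws_iam_role_policy resource (arguments name, role, policy). The role name, resource name, and the exact action list are hypothetical assumptions; the issue does not name the EKS node role.

```python
import json

# Hypothetical role name; the real EKS node role is not named in this issue.
ROLE_NAME = "example-eks-node-role"

# Assumed minimal set of CloudWatch Logs write actions for the agent.
CLOUDWATCH_INLINE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
            ],
            "Resource": "*",
        }
    ],
}


def terraform_inline_policy(resource_name: str, role_name: str, policy: dict) -> dict:
    """Return a Terraform JSON document declaring an aws_iam_role_policy."""
    return {
        "resource": {
            "aws_iam_role_policy": {
                resource_name: {
                    "name": f"{role_name}-cloudwatch-agent",
                    "role": role_name,
                    # Terraform expects the policy document as a JSON string.
                    "policy": json.dumps(policy),
                }
            }
        }
    }


if __name__ == "__main__":
    doc = terraform_inline_policy("eks_node_cloudwatch", ROLE_NAME, CLOUDWATCH_INLINE_POLICY)
    # Save this output as e.g. cloudwatch_policy.tf.json next to the existing config.
    print(json.dumps(doc, indent=2))
```

In hand-written HCL the same resource would usually reference the role resource (for example its id attribute) rather than a literal name, so Terraform tracks the dependency.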

Approach_3

Implement CloudWatch Container Insights at the cluster level for the EKS nodes to handle moving logs to CloudWatch.
Use one of the first two approaches for instances using AL2-hardened.
Implement log rotation

Efe-Oddball commented 1 week ago

Updated some of the EC2 roles with CloudWatch policies; I will test functionality next week before updating the code and creating a PR.

Efe-Oddball commented 5 days ago

I have updated all EC2 IAM instance profile roles in the AWS console with the "ec2:DescribeTags" policy. This includes the roles attached to the legacy forward proxy, the new reverse proxy, and the new forward proxy in dev. I also added the CloudWatch logging policy to the EKS node IAM instance profile role for all tiers. I am now running the CloudWatch scripts to set up the CloudWatch agent, and will then test functionality.
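A sketch of the kind of agent configuration that setting up the CloudWatch agent involves: a logs section with a collect_list of files, each mapped to a log group and a log stream. The log group prefix and file list are hypothetical; "{instance_id}" is a placeholder the agent itself substitutes at runtime.

```python
import json

# Hypothetical values; adjust per image and tier.
LOG_GROUP_PREFIX = "/example/eks-nodes"
LOG_FILES = ["/var/log/messages", "/var/log/secure"]


def agent_config(log_group_prefix: str, files: list) -> dict:
    """Build a CloudWatch agent config that ships each file to its own log group."""
    collect_list = [
        {
            "file_path": path,
            # e.g. /example/eks-nodes/messages
            "log_group_name": f"{log_group_prefix}/{path.rsplit('/', 1)[-1]}",
            # Literal placeholder; the agent replaces it with the EC2 instance ID.
            "log_stream_name": "{instance_id}",
        }
        for path in files
    ]
    return {
        "agent": {"run_as_user": "root"},
        "logs": {"logs_collected": {"files": {"collect_list": collect_list}}},
    }


if __name__ == "__main__":
    print(json.dumps(agent_config(LOG_GROUP_PREFIX, LOG_FILES), indent=2))
```

On an instance, a file like this is typically loaded with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path-to-config.json> -s`.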

Efe-Oddball commented 5 days ago

I am also working on updating the Terraform code to reflect all the changes I made in the console.

Efe-Oddball commented 3 days ago

I am creating a cluster for the updated EKS image with the CloudWatch agent installed. I will then deploy nodes to the cluster and confirm that logs are reaching CloudWatch correctly. Most of the Terraform code that provisions the policies has been updated. This ticket will also need to roll over to the next sprint.

department-of-veterans-affairs / va.gov-team