jimleroyer commented 1 year ago

Description

As an ops lead or GCNotify developer, I would like to increase the log data retention to 1 year so that I can go back to past issues, assess them, and fix them.

At the moment, the retention period for some log groups is set to 1 month, which is too short for our needs. For example, /aws/containerinsights/notification-canada-ca-staging-eks-cluster/application is set to 1 month.

WHY are we building?

To debug issues that are older than a month.

WHAT are we building?

Increased log retention in the STAGING environment.

VALUE created by our solution

More debugging capacity.

Acceptance Criteria

[ ] All log groups for non-sensitive logs have a retention period of 1 year.
[ ] All log groups for sensitive logs have the minimal retention period of 7 days.

QA Steps

[ ] Check data retention of app log groups in staging environments.
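
The QA check above could be scripted roughly as follows. This is a minimal sketch, not the team's actual tooling: it assumes each log group is a dict with `logGroupName` and an optional `retentionInDays` key (the shape returned by boto3's CloudWatch Logs `describe_log_groups`, where a missing `retentionInDays` means "Never expire"). The function names and the set of sensitive log groups are hypothetical and would need to be supplied by the team.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Targets taken from the acceptance criteria in this issue.
NON_SENSITIVE_DAYS = 365  # 1 year
SENSITIVE_DAYS = 7        # minimal retention for sensitive logs


def expected_retention(name: str, sensitive_names: Set[str]) -> int:
    """Return the retention (in days) a log group should have."""
    return SENSITIVE_DAYS if name in sensitive_names else NON_SENSITIVE_DAYS


def find_mismatches(
    log_groups: Iterable[Dict],
    sensitive_names: Set[str],
) -> List[Tuple[str, Optional[int], int]]:
    """Return (name, actual_days, expected_days) for every log group
    whose retention differs from the acceptance criteria.

    An actual value of None means the group never expires.
    """
    mismatches = []
    for group in log_groups:
        name = group["logGroupName"]
        actual = group.get("retentionInDays")  # absent key => never expire
        expected = expected_retention(name, sensitive_names)
        if actual != expected:
            mismatches.append((name, actual, expected))
    return mismatches
```

In practice the list of log groups would come from the AWS account under test (for example via a paginated `describe_log_groups` call), and any entry returned by `find_mismatches` would be a candidate for a follow-up in Terraform.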

ben851 commented 10 months ago

@sastels to QA

sastels commented 10 months ago

We do still have a few staging app-related logs that have retention less than a year. Possibly we don't care about them?

Batch Saving: 1 week
/aws/lambda/ses_to_sqs_email_callbacks: 2 weeks
/aws/lambda/sns_to_sqs_sms_callbacks: 2 weeks
/aws/lambda/ses-receiving-emails: 3 months

jimleroyer commented 10 months ago

Ben to review Steve's findings and provide Final Judgement on these. (poor souls)

ben851 commented 10 months ago

The 3 lambda log groups have those retention periods because they are part of the terraform module. I've updated the terraform modules repo to allow customizing this setting: https://github.com/cds-snc/terraform-modules/pull/345

ben851 commented 10 months ago

The batch saving log group was not in terraform. I've added it and set its retention appropriately. The PR will require manual imports before merging.

ben851 commented 10 months ago

New PR created that splits sensitive and non-sensitive log retention periods. Will get the team to review today.

ben851 commented 10 months ago

Need to verify what happened yesterday

ben851 commented 10 months ago

Need to verify what happened two days ago

ben851 commented 9 months ago

Not sure what I needed to verify before... Lesson learned to be more descriptive.

Can confirm that this has been released and is ready for QA.

sastels commented 9 months ago

Steve will take a look!

sastels commented 9 months ago

/aws/lambda/ses_to_sqs_email_callbacks and /aws/lambda/sns_to_sqs_sms_callbacks are both one week as expected for PII
/aws/lambda/ses-receiving-emails doesn't exist in prod and is empty in staging so maybe it's just some old thing that's not used anymore? :/
For the BatchSaving log group I see retention "Never expire" in staging / prod - is this expected?

sastels commented 9 months ago

Probably just needs a terraform variable set - Ben to investigate!

ben851 commented 8 months ago

Ben to actually investigate today.

ben851 commented 8 months ago

Ran into operations issues, will try and get to this today.

ben851 commented 8 months ago

I verified that the callbacks are as expected.

The ses-receiving-emails log group was empty in staging and didn't exist anywhere else, including TF. I deleted the log group in staging.

BatchSaving - I started looking but then had to switch to production release troubleshooting. Will get to this today.

ben851 commented 8 months ago

@sastels to QA

jimleroyer commented 8 months ago

@sastels to QA today!

sastels commented 8 months ago

today. I swear. 95%.

sastels commented 8 months ago

The BatchSaving log group still is "never expire" but it just has metrics in it, and only goes back to 2023-12-04 so maybe it's expiring itself and is fine as it is?

ben851 commented 8 months ago

In prod the retention period is configured to 0, which means indefinite, so that's expected... unless this has sensitive info in it, in which case I can change it to the 7 day retention.

The retention period is set correctly in staging, at 12 months. As to why there are only entries from December, I'm not sure.

sastels commented 8 months ago

makes sense!

cds-snc / notification-planning-core

Increase log group retention to 1 year in STAGING env #130
