cds-snc / notification-planning-core

Project planning for GC Notify Core Team

Investigate Kubernetes 500 Error Alarms #123

Open ben851 opened 1 year ago

ben851 commented 1 year ago

Description

As a developer/operator of GC Notify, I would like to be alerted only when there are actual issues with our system, and not during false alarms, so that I do not get alert fatigue and can quickly identify real errors.

This card covers the following alerts in the alarm review spreadsheet

WHY are we building?

We are receiving a lot of noise in our operations Slack channel that is not indicative of actual issues.

WHAT are we building?

Investigate the Kubernetes 500 errors and determine whether they can be fixed or whether the alarm needs adjustment.

VALUE created by our solution

Fewer false alarms will improve developer agility and shorten response times to actual issues.

Acceptance Criteria

QA Steps

sastels commented 1 year ago

In May, 204

HEAD /organisation-invitation/<token> HTTP/1.1" 500 0 "-" "Python-urllib/3.10

one:

POST /contact?current_step=identity HTTP/1.1" 500 
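A tally like the one above could be produced with a short script. This is only a sketch: the regular expression assumes the request portion of each log line looks like the two samples quoted here, and the helper names (`normalise_path`, `tally_500s`) are hypothetical, not part of any Notify tooling.

```python
import re
from collections import Counter

# Assumed shape of the request portion of an access-log line, e.g.
#   HEAD /organisation-invitation/<token> HTTP/1.1" 500 0 "-" "Python-urllib/3.10
LOG_PATTERN = re.compile(
    r'(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def normalise_path(path):
    """Drop the query string and collapse invitation tokens so requests group together."""
    path = path.split("?", 1)[0]
    return re.sub(r"^(/organisation-invitation/).+$", r"\1<token>", path)


def tally_500s(lines):
    """Count 500 responses per (method, normalised path) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and match.group("status") == "500":
            key = (match.group("method"), normalise_path(match.group("path")))
            counts[key] += 1
    return counts
```

Grouping by method and normalised path makes it easier to see whether an alarm is driven by one noisy endpoint or by many unrelated failures.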

smcmurtry commented 1 year ago

The request HEAD /organisation-invitation/<token> HTTP/1.1" 500 0 "-" "Python-urllib/3.10 was probably generated by an external actor. When we invite a user to an organisation, a valid token is generated (we verified this) and the link appears in an email to the invited user, who would then make a GET request rather than a HEAD request.

However, a HEAD request should not cause a 500 error, so we should fix that.
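To reproduce the behaviour while working on a fix, a HEAD request similar to the logged one could be sent with Python's standard `urllib`. This is a minimal sketch: the base URL and token are placeholders to be supplied by whoever runs it, and the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def build_head_request(base_url, token):
    """Build a HEAD request for the invitation URL (base_url and token are placeholders)."""
    url = f"{base_url.rstrip('/')}/organisation-invitation/{token}"
    return urllib.request.Request(url, method="HEAD")


def probe_status(request, timeout=5.0):
    """Send the request and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the status code is on the exception.
        return err.code
```

For example, `probe_status(build_head_request("http://localhost:8000", "some-token"))` against a local instance (the address is an assumption) would return 500 if the problem is still present.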

sastels commented 1 year ago

The contact form 500 is likely this issue: https://app.zenhub.com/workspaces/notify-planning-core-6411dfb7c95fb80014e0cab0/issues/gh/cds-snc/notification-planning/756

andrewleith commented 1 year ago
sastels commented 1 year ago

Fix for the contact form 500: https://github.com/cds-snc/notification-admin/pull/1578