acmcsufoss / lc-dailies

Daily Leetcode challenges for members to practice their algorithms.
https://acmcsuf.com/lc-dailies
MIT License

Daily webhook unpredictability & double triggers post-mortem #21

Closed: EthanThatOneKid closed 10 months ago

EthanThatOneKid commented 10 months ago

Summary

On September 5th, our daily webhook, scheduled for 5pm PDT (midnight UTC), behaved erratically. The issue has been resolved; this post-mortem addresses the root cause and outlines preventive measures for the future.
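As a sanity check on the schedule, the sketch below converts a Pacific wall-clock hour to UTC. It assumes the standard US Pacific offsets (UTC-8 for PST, UTC-7 for PDT); the helper name `localHourToUtcHour` is hypothetical and not part of this repository. In early September the Pacific zone observes daylight time, so 5pm local corresponds to midnight UTC, while the same local hour in winter lands at 1am UTC.

```typescript
// Standard US Pacific UTC offsets in hours (general facts, not taken from this issue).
const PST_OFFSET_HOURS = -8; // Pacific Standard Time (winter)
const PDT_OFFSET_HOURS = -7; // Pacific Daylight Time (summer, includes September)

/** Converts a local wall-clock hour (0-23) to the matching UTC hour (0-23). */
function localHourToUtcHour(localHour: number, utcOffsetHours: number): number {
  // The double modulo keeps the result in 0-23 even when the difference is negative.
  return (((localHour - utcOffsetHours) % 24) + 24) % 24;
}

const utcHourInSeptember = localHourToUtcHour(17, PDT_OFFSET_HOURS); // 0, midnight UTC
const utcHourInWinter = localHourToUtcHour(17, PST_OFFSET_HOURS); // 1, 1am UTC

console.log({ utcHourInSeptember, utcHourInWinter });
```

A cron schedule fixed in UTC therefore shifts by one hour in local time when daylight saving begins or ends.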

Incident timeline

  1. Recent workflow removal: A previous GitHub workflow cronjob was removed from production a few days before the incident to validate the new Cloudflare migration.

  2. Unpredictable behavior: During this period, the webhook ran irregularly and skipped executions on several days.

  3. Double execution on September 5th: On September 5th, the webhook executed twice, causing unexpected behavior.

Resolution

The erratic behavior was traced to a rogue Cloudflare cronjob. It has been identified and removed, restoring normal webhook functionality.

Preventive measures

To prevent similar incidents, we will consider the following measures:

  1. Implement stricter controls for cronjob deployment, ensuring that cronjobs are thoroughly reviewed before they are deployed and fully removed when they are retired.

  2. Establish monitoring and alerting mechanisms to detect webhook irregularities promptly.

  3. Document and communicate best practices to all team members involved in workflow management.
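The monitoring idea in item 2 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes execution timestamps are already being recorded somewhere, and the names `DayAudit`, `utcDateKey`, and `auditDailyRuns` are hypothetical. The function counts executions per UTC calendar day and flags days with zero runs (skipped) or more than one run (duplicate), the two failure modes seen in this incident.

```typescript
/** Audit result for one UTC calendar day (hypothetical type for this sketch). */
interface DayAudit {
  date: string; // YYYY-MM-DD in UTC
  runs: number;
  status: "ok" | "skipped" | "duplicate";
}

/** Returns the UTC calendar date of a timestamp as YYYY-MM-DD. */
function utcDateKey(d: Date): string {
  return d.toISOString().slice(0, 10);
}

/** Counts webhook executions per UTC day between start and end, inclusive. */
function auditDailyRuns(executions: Date[], start: Date, end: Date): DayAudit[] {
  const counts = new Map<string, number>();
  for (const executedAt of executions) {
    const key = utcDateKey(executedAt);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  const audits: DayAudit[] = [];
  // Walk day by day from the start date at midnight UTC.
  const cursor = new Date(
    Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate()),
  );
  const lastKey = utcDateKey(end);
  // ISO date strings compare correctly as plain strings.
  while (utcDateKey(cursor) <= lastKey) {
    const date = utcDateKey(cursor);
    const runs = counts.get(date) ?? 0;
    const status = runs === 0 ? "skipped" : runs === 1 ? "ok" : "duplicate";
    audits.push({ date, runs, status });
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
  return audits;
}

// Made-up timestamps for illustration only; these are not logs from the incident.
const sampleRuns = [
  new Date("2024-01-01T00:00:05Z"),
  new Date("2024-01-03T00:00:04Z"),
  new Date("2024-01-03T00:00:09Z"),
  new Date("2024-01-04T00:00:03Z"),
];
const report = auditDailyRuns(
  sampleRuns,
  new Date("2024-01-01T00:00:00Z"),
  new Date("2024-01-04T00:00:00Z"),
);
console.log(report.filter((day) => day.status !== "ok"));
```

With the sample data, the filtered report contains 2024-01-02 as skipped and 2024-01-03 as duplicate. An alert could be sent whenever that filtered list is non-empty.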


This issue serves as a record of the incident, its resolution, and our commitment to preventing similar occurrences in the future.

EthanThatOneKid commented 10 months ago

Closed in favor of https://github.com/acmcsufoss/lc-dailies/discussions/22.