culturecreates / incident-reports

Reports on incidents in all products and services

2023-05-15 Live 500 error "taxonomyIds is not iterable" #6

Closed dev-aravind closed 1 year ago

dev-aravind commented 1 year ago

Incident Report

Summary

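The detailed view of most events could not be opened, regardless of the calendar. The issue lasted less than a day. The root cause was a 500 error ("taxonomyIds is not iterable") raised in the backend while formatting an organization or person linked to an event when that organization or person had no dynamic fields assigned.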

Timeline

2023-JUN-14

About 10 hours passed between the time the problem was detected and the time it was fixed.

[16 h 20]

Email from Laurent Blais-Sénéchal (Communications Officer, Signé Laval) reporting LIVE failures.

[20 h 01]

Gregory emails Laurent to acknowledge the issue and say we are working on it.

[20 h 10]

Gregory alerts @everyone in Slack channel. https://culturecreates.slack.com/archives/C02B18SN3FU/p1686787813876419

Gregory starts working to fix the problem.

Gregory monitored the backend logs in the log management system, Datadog, and identified that the error occurs while formatting an organization/person linked to an event.

Gregory opened issue #545 and assigned it to Suhail.

[21 h 42]

Gregory emails Laurent at Signé Laval to say that we have identified the problem and are fixing it.

Suhail analyzed the logs and found that the issue is triggered when formatting an organization or person that has no dynamic fields assigned to it.

Suhail applied a hotfix and released a new version of the backend.
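
The failure mode described above can be illustrated with a minimal sketch. This is not the actual footlight-calendar-api code: the `LinkedEntity` and `DynamicField` shapes and both function names are hypothetical, and the assumption that `taxonomyIds` is derived from the entity's dynamic fields is ours. Only the error message and the "no dynamic fields assigned" condition come from this report.

```typescript
// Hypothetical shapes for illustration only; not the actual API models.
interface DynamicField {
  taxonomyId: string;
  conceptIds: string[];
}

interface LinkedEntity {
  id: string;
  name: string;
  // Assumed to be absent when no dynamic fields are assigned to the entity.
  dynamicFields?: DynamicField[];
}

// Unsafe variant: when dynamicFields is missing, taxonomyIds is undefined,
// and iterating over it throws a TypeError at runtime.
function collectTaxonomyIdsUnsafe(entity: LinkedEntity): string[] {
  const taxonomyIds = entity.dynamicFields?.map((field) => field.taxonomyId);
  const result: string[] = [];
  for (const id of taxonomyIds as string[]) {
    result.push(id);
  }
  return result;
}

// Guarded variant: fall back to an empty list when nothing is assigned.
function collectTaxonomyIds(entity: LinkedEntity): string[] {
  const taxonomyIds = (entity.dynamicFields ?? []).map(
    (field) => field.taxonomyId
  );
  return [...taxonomyIds];
}
```

With the guard in place, an organization or person without dynamic fields simply yields an empty list instead of failing the whole event response.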

[1 h 51] (2023-JUN-15)

Suhail releases a fix in production and sends the following email:

Dear Laurent Blais-Sénéchal,

I hope this email finds you well.

I am writing to inform you that we have fixed the bug that was causing API error on event 6488988f371b8f0064914d18. We have identified the root cause of the issue and hot fixed the bug. Also released a new version of our APIs. The new version is tagged [v1.9.6](https://github.com/culturecreates/footlight-calendar-api/releases/tag/v1.9.6).
We have tested the hot fix and the new version of our APIs thoroughly in production and we are confident that they are both working properly.

If you have any questions, please do not hesitate to contact us. Thank you for your continued support.

What went well


  1. The recent Datadog integration preserved the backend logs, which helped the backend team identify the root cause and fix the issue.

What went wrong

  1. We have no automated tests that exercise the API end to end. Such a test could have caught this issue during the previous deployment.
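
As a rough sketch of the kind of end-to-end check meant here, the function below requests the detail endpoint for a list of event IDs and reports any that return a server error. The `/events/{id}` path, the function name, and the injected `fetchStatus` helper are all hypothetical and would need to be adapted to the real API and test tooling.

```typescript
// Hypothetical helper type: resolves to the HTTP status code for a URL.
// Injected so the check can run against a real client or a fake in tests.
type FetchStatus = (url: string) => Promise<number>;

// Returns the IDs of events whose detail endpoint answers with a 5xx status.
// The "/events/{id}" path is an assumption, not the documented API route.
async function findFailingEvents(
  baseUrl: string,
  eventIds: string[],
  fetchStatus: FetchStatus
): Promise<string[]> {
  const failing: string[] = [];
  for (const id of eventIds) {
    const status = await fetchStatus(`${baseUrl}/events/${id}`);
    if (status >= 500) {
      failing.push(id);
    }
  }
  return failing;
}
```

Running such a check after each deployment against a sample of events, including ones linked to organizations or people with no dynamic fields, would surface a 500 error before customers report it.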

saumier commented 1 year ago

@troughc Please review this incident report. You can close it when you are done.

dev-aravind commented 1 year ago

@saumier I like the modification you made to the report.

saumier commented 1 year ago

@troughc Please read the report and close.

troughc commented 1 year ago

Read and closed. Thank you for this.