Couchers-org / couchers

The next-generation couch surfing platform. Free forever. Community‑led. Non‑profit. Modern. Chuck us a star :)
https://couchers.org
MIT License
372 stars 78 forks

Multiple backend errors across entire website; Multiple APIs broken #2253

Open jesseallhands opened 2 years ago

jesseallhands commented 2 years ago

This seems to be a big issue. You cannot search for nearly anything, and you cannot bring up profiles at all. It seems that most APIs are broken:

[image]

jesseallhands commented 2 years ago

@lucaslcode @aapeliv @ShmuelTreiger @darrenvong This is a critical issue affecting nearly all areas and nearly all functionality of the website!

jesseallhands commented 2 years ago

The issue was that we ran out of disk space. Lucas expanded the disk space as a temporary solution, so the issue seems to be resolved as of 01:54 UTC. A postmortem is still needed.

darrenvong commented 2 years ago

Can we close this? Or are we waiting until the post-mortem has been done first?

jesseallhands commented 2 years ago

I was waiting for the postmortem, but maybe that should be its own ticket, since we don't really have a postmortem protocol yet?

The reason a postmortem is so important in this case is that this outage seems to have been fully preventable. If the issue was exceeding a storage limit, we should have had an automated alarm that notified us well in advance of us hitting the limit. If there was an alarm that just didn't work, we should find out why.
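As a rough illustration of the kind of automated check described above, here is a minimal sketch in Python. It is not the project's actual monitoring. The 80% and 90% thresholds and the names `classify_usage` and `check_disk` are assumptions made up for this example, and it assumes something like a cron job would run it and forward the result to whoever is on call.

```python
import shutil

# Hypothetical thresholds, chosen only for illustration.
WARN_FRACTION = 0.80
CRIT_FRACTION = 0.90


def classify_usage(used_bytes, total_bytes, warn=WARN_FRACTION, crit=CRIT_FRACTION):
    """Map a used/total pair to "ok", "warning", or "critical"."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    fraction = used_bytes / total_bytes
    if fraction >= crit:
        return "critical"
    if fraction >= warn:
        return "warning"
    return "ok"


def check_disk(path="/"):
    """Return (level, fraction_used) for the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_usage(usage.used, usage.total), usage.used / usage.total


if __name__ == "__main__":
    level, fraction = check_disk("/")
    # A real setup would send this to an alerting channel instead of printing.
    print(f"{level}: {fraction:.1%} of disk used")
```

Warning at a lower threshold than the critical one is what gives notice "well in advance": the first alert fires while there is still time to expand storage or clean up.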

darrenvong commented 2 years ago

> The reason a postmortem is so important in this case is that this outage seems to have been fully preventable. If the issue was exceeding a storage limit, we should have had an automated alarm that notified us well in advance of us hitting the limit. If there was an alarm that just didn't work, we should find out why.

I agree with everything that you said there - it's all about learning from this and seeing if we can do better next time to refine our processes!

And I feel like maybe it should be a separate issue, as post-mortems tend to be documented as something we can refer back to in the future, and this repo tends to only have docs related to how the code works. But we also have meeting notes in a docs folder, so it could be argued that postmortems could be here too 🤷

Although things might be better structured/easier to find if they're saved on our wiki instead? Devs (certainly myself) tend to use GitHub a lot more though, so I don't have any objections to them being in docs either. Would be interested to hear what others think too @aapeliv @lucaslcode ?

aapeliv commented 2 years ago

Yes, the plan has been to write postmortems, but we had no meaningful outages, so we forgot about it!