Chatie / server

Cloud Management Service for Chatie
https://www.chatie.io
Apache License 2.0

Chatie API Server Down Accident Report #73

Open huan opened 3 years ago


Token Service Discovery Service Accident

Our Wechaty puppet service discovery service went out of service at 11 am on Jun 15.

  1. 11 am: the service went down because the SSL certificate expired.
  2. 2 pm: we noticed the problem around noon and started working on it, and found that port 80 of the server could not be reached from the public internet.
  3. 2:30 pm: the service came back in a degraded state by switching to Heroku Dynos. We have to use two dynos to serve more than 1,300 concurrent WebSocket connections, so the token service sometimes returns 404, because a token is registered on one dyno but not on the other. If you see a 404, retrying 1-2 times should return the right result.
  4. 10 pm: the service was moved back to Azure by creating a new server, which fixed the unreachable port 80 problem. (It might be related to an Azure bug, because we could not make port 80 reachable from the internet.)
  5. 11 pm: the server was fully restored.
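
For clients affected by the intermittent 404 described in step 3, the suggested "retry 1-2 times" workaround could look roughly like the sketch below. It is only an illustration: `discover_with_retry` and the `lookup` callable are hypothetical names, and `lookup` stands in for whatever request your client sends to the token service (it is assumed to return an HTTP status code and a parsed body). The actual Chatie API endpoint and response shape are not shown here.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical lookup signature: takes a token and returns
# (HTTP status code, parsed response body or None).
Lookup = Callable[[str], Tuple[int, Optional[dict]]]


def discover_with_retry(
    lookup: Lookup,
    token: str,
    attempts: int = 3,
    delay_seconds: float = 0.5,
) -> Optional[dict]:
    """Call `lookup` up to `attempts` times, retrying only on 404.

    A 404 may just mean the request hit the dyno that does not hold
    the token, so it is treated as retryable. Any other non-200
    status is raised immediately.
    """
    for attempt in range(attempts):
        status, body = lookup(token)
        if status == 200:
            return body
        if status != 404:
            raise RuntimeError(f"unexpected status {status}")
        # Pause before the next attempt, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    # Every attempt returned 404.
    return None
```

With the default `attempts=3`, the first call is followed by at most two retries, which matches the 1-2 retries suggested above. A `None` result means every attempt returned 404, so the token may genuinely be unregistered.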