Closed anuveyatsu closed 5 years ago
Unfortunately, we don't have much to analyze here, since we lost the logs when we redeployed the services. However, we got an email today from GCE about network vulnerabilities that might be related to this. They say:
US-CERT recently disclosed security vulnerabilities CVE-2018-5390 and CVE-2018-5391. These are networking vulnerabilities that increase the effectiveness of denial of service (DoS) attacks against vulnerable systems. All Google Kubernetes Engine (GKE) nodes are affected by these vulnerabilities, and we recommend that you upgrade to the latest patch version, as we detail below.
As an action, I've upgraded the clusters to version 1.10.5-gke.4, as recommended in the emails.
Besides, we've built the datahub-health service (https://travis-ci.org/datahq/datahub-health), which runs on a daily schedule and notifies us via email when something goes wrong. It's scheduled for 12:10 PM GMT. As a result, we will be aware that something is wrong within the working day.
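For anyone reading this later, here is a rough sketch of the kind of check such a scheduled job could run. It is not the actual datahub-health code: the function names (`fetch_status`, `find_failures`, `build_alert`), the "anything other than 200 is a failure" rule, and the addresses in the usage note are all assumptions made for illustration.

```python
from email.message import EmailMessage
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        return err.code
    except (URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, malformed URL, etc.
        return None


def find_failures(urls, fetch=fetch_status):
    """Return (url, status) pairs for every URL that did not answer 200.

    `fetch` is injectable so the check can be exercised without network access.
    """
    failures = []
    for url in urls:
        status = fetch(url)
        if status != 200:
            failures.append((url, status))
    return failures


def build_alert(failures, sender, recipient):
    """Build (but do not send) an email summarising the failing endpoints."""
    msg = EmailMessage()
    msg["Subject"] = f"Health check: {len(failures)} endpoint(s) failing"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{url} -> {status}" for url, status in failures))
    return msg
```

A daily job would call `find_failures` with the list of endpoints to watch and, if the result is non-empty, pass the message from `build_alert` (with hypothetical addresses such as `alerts@example.com`) to `smtplib.SMTP(...).send_message(msg)`. Sending is left out here because the mail server details are specific to the deployment.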
Closing as FIXED. Feel free to reopen if this comes up again.
We saw that the website was down on 31 August at around 5:30 AM GMT. Looking at the memory usage of the frontend service:
We've just restarted the frontend service to resolve the problem quickly.
Acceptance criteria