therobyouknow closed this issue 2 years ago.
Sites are back up. Thanks to Ben for helping here.
I will record some findings to help avoid similar outages in future.
From our infrastructure team (TS):
Yesterday [Tuesday 28 June 2022] there was an issue with the NHM storage solution which caused many of our NFS clients to lose access to their storage mounts; as a result, the Scratchpads servers would have errored.
The service came back online around 15:30 yesterday and is stable. Please let me know if you have any further issues.
The above would explain the issue.
I would expect that this is not a recurring issue, and that there is nothing we, as Scratchpads maintainers, would need to do.
This also happened on Wednesday 29 June. The issue is ongoing.
It looks like the Scratchpads site outage is due to a problem with the database servers. The logs show:
[Wed Jun 29 11:30:38.791916 2022] [php7:notice] [pid 21714] [client 157.140.2.32:36898] PHP Notice: Undefined index: port in /var/aegir/config/includes/databases.inc on line 13, referer: https://vbrant.scratchpads.org/calendar-date/2021-11-12?destination=forum%2F2
Email from Monit saying it is trying to restart mysqld on sp-data-03.nhm.ac.uk.
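For future incidents, a quick way to tell whether the database host is accepting connections at all (as opposed to an error at the PHP/Drupal level) is a plain TCP check. Below is a minimal Python sketch, assuming mysqld on sp-data-03.nhm.ac.uk listens on the default MySQL port 3306; the thread does not confirm the port, and the function name `is_port_reachable` is hypothetical.

```python
import socket

# Assumption: default MySQL port. The actual port used on sp-data-03
# is not stated in this thread.
MYSQL_DEFAULT_PORT = 3306


def is_port_reachable(host: str, port: int = MYSQL_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something is accepting connections on the port;
    it does not authenticate or run a query against mysqld.
    """
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError
        # subclass) and DNS failures (socket.gaierror is an OSError subclass).
        return False
```

For example, `is_port_reachable("sp-data-03.nhm.ac.uk")` returning `False` while Monit reports restart attempts would point at the database host rather than the web servers. A TCP check is deliberately shallow; confirming that mysqld actually answers queries would need a real client login.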