CCI-MOC / ops-issues

MGHPCC 2022 shutdown coming up on May 23rd #562

Closed (joachimweyl closed this issue 2 years ago)

joachimweyl commented 2 years ago

If we have time, we should move Ceph to the rack with backup power. If not, we will deal with the Ceph move after the NERC transition. @pjd-nu to finalize placement with NU ITS.

joachimweyl commented 2 years ago

Draft:

Hello Mass Open Cloud users,

Due to a scheduled power outage at the MGHPCC data center, MOC services will be unavailable from Monday, May 23, 2022 at 6 PM through Thursday, May 26, 2022 at 12 PM.

Please shut down your virtual machines, containers, and any bare metal systems by 6 PM on Monday, May 23rd, so that the Mass Open Cloud team may begin preparing for the outage. If you do not shut them down yourself, you run the risk of losing data.

The MOC depends on several services that also run at the data center. Based on previous experience, we recommend not scheduling critical events the week of May 23rd.

We will notify you when MOC services are available by updating the MOC status website at https://status.massopen.cloud/ and by sending an email to this distribution list. Once services are back online it will be your responsibility to restart any virtual machines, containers, or other systems.

If you will need access to any MOC hosted data during this outage, please make sure to obtain copies of that data prior to Monday, May 23rd. During the outage, the data center will be completely without power and access to MOC hosted services will be impossible.

As always, if you have questions feel free to open a ticket at https://support.massopen.cloud. The ticketing system will be available throughout the outage.

Thanks,

Michael Daitzman

PS – If you were forwarded this email from a colleague and you would prefer to receive these notifications directly, you may sign up to the kaizen-users mailing list: https://mail.massopen.cloud/mailman/listinfo/kaizen-users.

The MGHPCC will be conducting planned facility maintenance on Tuesday May 25th, 2022.

joachimweyl commented 2 years ago

@larsks @naved001 @knikolla @rob-baron @msdisme please review the email above.

Questions

  1. Should we link to the status page instead of https://massopen.cloud?
  2. Are we the Mass Open Cloud Alliance now?

rob-baron commented 2 years ago

Looks fine to me.

msdisme commented 2 years ago

Yes to the status page. Alliance branding is also in process, but not yet on the site. @taramoran, I think we will not be making changes around that by the 25th; is that correct?

msdisme commented 2 years ago

Text for these is generally captured here: https://docs.google.com/document/d/1XT2uGOr0O47gHEfuxFW-6Pm0WSHDgt78ah00p7z3bUA/edit?usp=sharing so we have a copy year over year.

naved001 commented 2 years ago

The email draft looks good to me.

We should definitely NOT move the Ceph cluster during this time. It requires more planning and some new switches, and it is possible to do it live. This thread has more details.

msdisme commented 2 years ago

@joachimweyl here is the message we put up for the last planned outage, with a link to a tracking ticket in GitHub (though, looking at it now, it was mostly not used for tracking): https://status.massopen.cloud/2021-07-23-data-center-maintenance/

joachimweyl commented 2 years ago

new status PR

joachimweyl commented 2 years ago

page is live

joachimweyl commented 2 years ago

Awaiting some minor repairs, expect to be good to close this on Monday.

joachimweyl commented 2 years ago

All customer-facing pieces of this are resolved. The status page has been updated.