larsks opened this issue 3 years ago
From a Slack discussion:
5:44 PM
pjd Re: UPS, the Ceph cluster is in the NU racks, right?
5:44 PM
I could ask about moving it to one of the racks on the other side of the aisle, which have backup power.
5:44 PM
All even # racks in aisle 3 pod B have backup
6:02 PM
larsks Something to talk about on Thursday, I guess. I'm going to be out of the office tomorrow.
6:22 PM
msd +1
pjd
All even # racks in aisle 3 pod B have backup
Posted in coredev | Mar 30th
6:27 PM
naved001 well, all our stuff is in the odd racks. and our ceph cluster is distributed across all racks.
6:27 PM
msd yep
6:36 PM
msd assuming you saw the info on the research ceph Slack channel?
9:59 PM
naved001 I did see that the research ceph is unhealthy, but that will have to wait.
10:55 PM
msd yup
11:36 PM
pjd @naved001 no guarantees, but I’d be willing to do some politicking to see if I could get us half a rack on the even side.
@pjd-nu any feedback from NEU about swapping cages?
We need to find out whether there is a way to add UPS to the existing non-UPS racks, and what the cost structure and space impact would be.
@pjd-nu and @okrieg is the assumption that in the future the researchers will be responsible for managing the research cephs?
Northeastern has cleared out an even-side rack with backup power that we can use for production Ceph. Ping me if I don't update this with more details...
@pjd-nu I'm pinging you as requested for more details :smile:
@pjd-nu pinging you again for more details.
We have 10 OSD servers (2U each) and 3 monitors (1U each), for a total of 23U.
We would also need a 10G switch with 40G uplinks and a 1G IPMI switch.
With that we should be able to migrate all the Ceph nodes to the new rack.
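For planning purposes, the rack-unit math above can be sketched as follows. The server counts and sizes come from the comment; the 1U heights for the two switches are assumptions, not confirmed hardware specs:

```python
# Rough rack-space estimate for the Ceph migration.
# Server counts/sizes are from the discussion above; the 1U switch
# heights are assumptions (typical for ToR and IPMI switches).
osd_servers = 10      # 2U each
monitors = 3          # 1U each
switches = {"10G data switch": 1, "1G IPMI switch": 1}  # assumed 1U each

server_units = osd_servers * 2 + monitors * 1   # 23U, as noted above
total_units = server_units + sum(switches.values())

print(f"servers: {server_units}U, total with switches: {total_units}U")
# → servers: 23U, total with switches: 25U
```

So a half rack (21U) would not quite fit everything; a full rack with backup power comfortably would.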
Just noting down what we talked about:
We don't have to do any of this during the shutdown. We would want to keep things as-is during the shutdown, since we already have a failing Brocade switch and more things may break during the power-off/power-on process.
I've always wanted to try using dual power supplies to move a server without turning it off :-)
@naved001 can you work with @hakasapl and team to come up with a proposal for an MOC ceph cluster that we want to maintain post kaizen shutdown.
@pjd-nu finally we are ready to explore moving ceph to the even side - is it just a single rack and if so can you confirm the location? Do we need to get keys added to the MOC keyring?
not currently an option, pushing to icebox
After the latest MGHPCC power outage, our production Ceph cluster is unhealthy and the OpenStack S3 endpoint is inoperative. This isn't the first time a power outage has taken out our storage. Should we invest in a UPS for the storage cluster?
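For reference, a minimal post-outage triage might look like the following. These are standard Ceph CLI commands; the exact output depends on cluster state, and the systemd target name assumes a package-based RGW deployment:

```shell
# Overall cluster state: HEALTH_OK / HEALTH_WARN / HEALTH_ERR
ceph status

# Detailed list of current health problems (down OSDs, degraded PGs, etc.)
ceph health detail

# Which OSDs are down, and which hosts/racks they live in
ceph osd tree

# RGW (S3 endpoint) service status, assuming a systemd-managed deployment
systemctl status ceph-radosgw.target
```

If most down OSDs map to hosts in the non-UPS racks, that would support the case for either a UPS or the even-side rack move discussed above.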