CCI-MOC / ops-issues


Review Two Sigma Donation and Select what we keep and what we give to Flax #1038

Open joachimweyl opened 1 year ago

joachimweyl commented 1 year ago

First, we need to gather the hardware information from the service tags. We spot-checked and decided we don't need to keep any of these. Next, we need to decide whether to use some of the JBODs for a Research Ceph Cluster. @pjd-nu suggested using some of these parts to build the future Research Cluster:

- FX2 - 2U, 4-slot chassis
- FC630 - half-width blade
- FD332 - half-width JBOD (16 SFF drives)

If we choose to use these, we need to let Flax know sooner rather than later so they can bring them to MGHPCC and rack them right away instead of cluttering the floor.

joachimweyl commented 1 year ago

@hakasapl & @naved001 what are your thoughts about using some of this hardware for a research Ceph cluster?

joachimweyl commented 1 year ago

@msdisme who do we need to confirm with, either that we want Flax to bring back the hardware for the Research Ceph cluster or that we are going to use other hardware for it?

msdisme commented 1 year ago

Two Sigma knows where to ship them; Sophia in BU Industry Engagement is in talks with Two Sigma, so BU may "take ownership" and get them routed properly. @pjd-nu, if you want some of these systems, please let us know ASAP so we can arrange shipping.

joachimweyl commented 1 year ago

Decision made: send the hardware to Flax, and if Peter decides he wants some of it back before they process it, we will request just those systems back.

msdisme commented 10 months ago

@pjd-nu regarding drive inventory: there is a much wider range of models and sizes in the pool, but the ones with drive counts and capacities obviously large enough to be useful going forward would be:

- 250x HDD/10k/SAS/1.2TB - 300 TB of total HDD capacity (2.5"/SAS connection)
- 100x SSD/NVMe/1.6TB - 160 TB of total NVMe/SSD capacity (2.5"/NVMe/U.2 connection)
- 400x SSD/SAS/1.6TB - 640 TB of total SAS/SSD capacity (2.5"/SAS connection)

That is just over one petabyte of total capacity - 800 TB if only the flash drives are considered - which is a respectably sized system in aggregate, even by today's standards.
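
For reference, a minimal sketch of the capacity arithmetic above, assuming the drive counts and per-drive sizes exactly as listed (the dictionary keys and variable names are illustrative only):

```python
# Sketch of the capacity math for the drive pool listed above.
# Counts and per-drive sizes (TB) are taken from this comment; names are illustrative.
inventory = {
    "HDD/10k/SAS/1.2TB": (250, 1.2),
    "SSD/NVMe/1.6TB":    (100, 1.6),
    "SSD/SAS/1.6TB":     (400, 1.6),
}

totals_tb = {name: count * size for name, (count, size) in inventory.items()}
total_tb = sum(totals_tb.values())                                    # 1100 TB, just over 1 PB
flash_tb = totals_tb["SSD/NVMe/1.6TB"] + totals_tb["SSD/SAS/1.6TB"]   # 800 TB

for name, tb in totals_tb.items():
    print(f"{name}: {tb:.0f} TB")
print(f"Total: {total_tb:.0f} TB, flash only: {flash_tb:.0f} TB")
```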

msdisme commented 10 months ago

@pjd-nu is there additional info you need to decide which drives to bring over with a system for testing?

pjd-nu commented 10 months ago

Sorry, I followed up in Slack and should have added it here. I'd like to get a system with 1 FX2, 2 FC630, 2 FD332, 4 HDD, 4 NVMe, and 16 SAS SSD. (4 SAS SSD would be fine for testing, actually.)

er1p commented 10 months ago

Good morning.

After a bit of experimenting, it seems the FC630 nodes do not support NVMe drives - there may be model variants with the electronics for it, but not the ones we are testing.

This means there are 2x SAS/SATA 2.5" slots inside the nodes, and then 16x SAS/SATA 2.5" slots inside the FD332 units.

With a bit of fiddling, it is possible to mount bare drives in the FD332 without carriers. I would not suggest running them like this in production, but it would be enough for prototyping work with 2 or 4 units.

That makes a total of 2x + 16x = 18x slots for SAS/SATA drives across the nodes and the FD332 units.

The request above is 4x HDD + 4x NVMe + 16x SSD = 24x drives in total.

How should we allocate the 18x available slots for this testing?
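
For clarity, a minimal sketch of the slot arithmetic, using the counts stated above (variable names are illustrative only):

```python
# Quick check of the slot math above: slots available in this test setup
# versus drives requested. Numbers come from this comment and the request above.
node_slots = 2           # SAS/SATA 2.5" slots inside the nodes
fd332_slots = 16         # SAS/SATA 2.5" slots inside the FD332
available = node_slots + fd332_slots    # 18

requested = 4 + 4 + 16                  # 4x HDD + 4x NVMe + 16x SSD = 24
shortfall = requested - available       # 6 drives won't fit
print(available, requested, shortfall)  # 18 24 6
```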

The most obvious options would be:

but I guess it depends on the purpose of the HDDs in the configuration.

pjd-nu commented 10 months ago

Yup, I think NVMe is supposed to go directly on the blades, and probably only on FC830s.

Could you set it up with 18 SSDs and send a loose NVMe drive that we could test with a PCIe adapter? Thanks!

er1p commented 10 months ago

Yes, we can do 18x SSDs in total.

Do you want the unit sent/delivered to the university, or out to the data center?

joachimweyl commented 10 months ago

@pjd-nu ^

pjd-nu commented 9 months ago

University - my office at Northeastern. This had already been discussed.

joachimweyl commented 9 months ago

@er1p ^

pjd-nu commented 9 months ago

My canonical address is:

Peter Desnoyers
360 Huntington Ave
Room 202 West Village H
Boston, MA 02115

I'm actually in Room 334, 440 Huntington Ave (aka West Village H), and the main office folks are in Room 202, but they'd just send the delivery up to the 3rd floor and open my office so it could be left there.

If someone's driving it, it would be best to arrange to meet me - my cell is 617-669-4728. Or I'd be happy to drive and pick it up, as long as it's within an hour or so of Boston.

joachimweyl commented 9 months ago

@er1p says to expect it on Friday the 1st or Monday the 4th.

pjd-nu commented 8 months ago

Now we need to figure out which machines are going to get shipped to Holyoke, and I need to figure out where they'll go, what networking is needed, etc. We're not going to set these up until after 1/16, since we'll need to cannibalize some stuff from the PRB cluster (shutting it down in the process), and we don't want to change anything until after the USENIX ATC paper deadline.

joachimweyl commented 8 months ago

Close this issue once all hardware is moved from Flax to MGHPCC for Peter's Ceph cluster.