CCI-MOC / ops-issues


H100 Deployment: Develop Plan for Networking, Evaluate with IBM and confirm BOM #1374

Closed hakasapl closed 3 weeks ago

hakasapl commented 1 month ago

As part of the planning with Lenovo we need to confirm the bill of materials with them. This involves confirming the plan for networking.

hakasapl commented 1 month ago

Initial plan is covered in this diagram:

MOCA Network Diagrams - H100s.png

msdisme commented 1 month ago

Updated date estimates:

hakasapl commented 1 month ago

Diagrams are here: https://lucid.app/lucidchart/81c8e90f-f892-48b1-94ce-e45315b70d8a/edit?viewport_loc=-476%2C-501%2C2222%2C1469%2C2zM~299fSVqR&invitationId=inv_500bd179-8f26-4899-9eec-14ba92f00b91

There are three diagrams. The first shows the MOCA network as it stands today. The second shows the MOCA network with the NERC core switches once they are fully set up. The third shows only the H100s and how they connect back to the main diagram.

Some context for the existing network:

The existing MOCA network is split into three distinct "islands", each belonging to an entity such as NERC. Because the MOC Alliance is involved in several projects, this approach has kept the largely L2 network organized as it grows. Each island is its own spanning tree instance with its own core. The cores connect to each other to carry any traffic that needs to traverse islands, which forms a hub-and-spoke topology. The H100 deployment is planned as a spine-leaf topology, which was raised as a requirement to improve inter-rack bandwidth between GPUs.
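To illustrate why a spine-leaf layout helps inter-rack bandwidth, here is a rough Python sketch that counts equal-cost shortest paths between two edge switches in each topology. The switch counts and names (`spine0`, `leaf0`, `hub`, `spoke0`) are made up for illustration and are not taken from the diagrams.

```python
from collections import deque


def spine_leaf(num_spines, num_leaves):
    """Build an adjacency map where every leaf links to every spine."""
    spines = [f"spine{i}" for i in range(num_spines)]
    leaves = [f"leaf{i}" for i in range(num_leaves)]
    graph = {s: set(leaves) for s in spines}
    graph.update({l: set(spines) for l in leaves})
    return graph


def hub_and_spoke(num_spokes):
    """Build an adjacency map where every spoke links only to one hub."""
    spokes = [f"spoke{i}" for i in range(num_spokes)]
    graph = {"hub": set(spokes)}
    graph.update({s: {"hub"} for s in spokes})
    return graph


def shortest_paths(graph, src, dst):
    """Return (hop count, number of distinct shortest paths) from src to dst."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                # First time we reach nxt: record its distance.
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                # Another shortest route into nxt.
                count[nxt] += count[node]
    return dist[dst], count[dst]


# Hypothetical sizes: 4 spines and 8 leaves versus 3 spokes on one hub.
print(shortest_paths(spine_leaf(4, 8), "leaf0", "leaf7"))   # (2, 4)
print(shortest_paths(hub_and_spoke(3), "spoke0", "spoke2"))  # (2, 1)
```

Both layouts are two hops edge to edge, but the spine-leaf fabric offers one parallel path per spine, while hub-and-spoke funnels everything through a single device.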

To fit this topology into our existing network, the plan is to treat the NERC core switches as border leaves in the H100 spine-leaf topology. This way the H100 network can behave like a spine-leaf fabric while still interacting with the existing network. The main limitation of this approach is bandwidth to the core, which currently sits at 400G. This should be enough initially because the NESE (Northeast Storage Exchange) storage being offered externally does not consume very much bandwidth. It may become an issue later if new storage offerings consume more throughput, although at that point it would also be possible to attach any new external storage directly to the H100 spine-leaf topology.
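As a back-of-the-envelope check on the core bandwidth limit, a small Python sketch like the one below could track how much of the 400G is spoken for. Only the 400G figure comes from this thread; the demand values and the `core_headroom` helper are hypothetical placeholders, not measurements.

```python
CORE_UPLINK_GBPS = 400.0  # bandwidth to the core, as stated in this thread


def core_headroom(demands_gbps, uplink_gbps=CORE_UPLINK_GBPS):
    """Return (utilization fraction, remaining Gbps) for traffic to the core."""
    total = sum(demands_gbps)
    return total / uplink_gbps, uplink_gbps - total


# Placeholder demands in Gbps; replace with real numbers once known.
example_demands = {"nese_storage": 40.0, "other_external": 60.0}

utilization, remaining = core_headroom(example_demands.values())
print(f"utilization={utilization:.0%}, remaining={remaining:.0f} Gbps")
```

If a future storage offering pushes utilization toward 100%, that would be the point to consider attaching it to the spine-leaf fabric directly instead of going through the core.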

hakasapl commented 1 month ago

PDFs for the diagrams: MOCA Network Diagrams - H100s.pdf MOCA Network Diagrams - MOCA-NERC-NEWCORE.pdf MOCA Network Diagrams - MOCA-NERC-1.pdf

joachimweyl commented 1 month ago

@hakasapl is the plan developed or is there more needed for this issue?

hakasapl commented 1 month ago

I would like to complete the design review before closing this. @hpdempsey is scheduling it.

hpdempsey commented 1 month ago

Design review is scheduled for 9AM EDT Thurs 10/3.

hakasapl commented 3 weeks ago

Design review was completed. Open questions from the review: