CCI-MOC / ops-issues


Redoing the Switch Management at OCT/MOC #624

Closed hakasapl closed 2 years ago

hakasapl commented 2 years ago

Currently, the way we manage switches is very disorganized. We have some Ansible helper scripts, but switches are managed from at least three different networks (e.g., moc, cloudlab). We do not currently have a centralized "admin" network from which we can access all the switches in the network.

In addition, all the current management networks travel over data links, so the loss of a data link also means the loss of the management link, which forces on-site intervention.

Proposal

What would we need?

For now, that's all we need. We have thousands of feet of bulk CAT6 for the new cables; all we need is the patience to terminate them all.

Other Considerations

Security

Putting everything on a single network raises security concerns. The switches themselves should use public-key SSH authentication wherever possible. The network itself should not be routable from the outside world and should be reachable only through a VPN or via the OCT head node.
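As a rough illustration of the "not routable from the outside world" requirement, here is a minimal Python sketch that checks a proposed switch management address. The subnet `10.255.0.0/16` and the function name `check_mgmt_address` are hypothetical; no address plan has been decided in this issue.

```python
import ipaddress

# Hypothetical management subnet; the issue does not specify an address plan.
MGMT_NET = ipaddress.ip_network("10.255.0.0/16")


def check_mgmt_address(addr: str) -> list:
    """Return a list of problems with a proposed switch management address."""
    ip = ipaddress.ip_address(addr)
    problems = []
    # A management address should never be reachable from the public internet.
    if ip.is_global:
        problems.append(f"{ip} is globally routable")
    # It should also live inside the single agreed management subnet.
    if ip not in MGMT_NET:
        problems.append(f"{ip} is outside {MGMT_NET}")
    return problems


if __name__ == "__main__":
    for candidate in ("10.255.1.10", "192.168.1.5", "8.8.8.8"):
        print(candidate, check_mgmt_address(candidate) or "ok")
```

A check like this could run against the switch inventory before any change is pushed, but it only validates addressing. Restricting access to the VPN and the head node would still have to be enforced on the network itself.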

Collaboration

This OCT network is used by MOC, NERC, NET2, CloudLab, ESI, Chameleon, Operate First, Fabric, NESE, AL2S, Unity Cluster (UMass), and probably more. We would need to properly document how the Ansible playbooks are used, and how the network is managed in general, for the multitude of teams.

In addition, the UMass Amherst networking team has an interest in taking part in the management of this infrastructure. They have their own SolarWinds setup for their switches, which are all Juniper.

Routing for Others

Many clusters that use this network have their own respective networks for managing switches. This can continue normally over the data fabric. However, we could institute a routing system where clusters are granted access to certain sections of the management network over a router.
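One way to picture the per-cluster access idea is as a policy table mapping each cluster to the management subnets it may reach. The Python sketch below is only a model of that policy: the cluster keys come from the list above, while the subnets and the `may_reach` helper are hypothetical. In practice the rules would be enforced as ACLs on the router.

```python
import ipaddress

# Hypothetical policy: which sections of the management network each
# cluster may reach. The subnets are invented for illustration.
CLUSTER_ACCESS = {
    "cloudlab": [ipaddress.ip_network("10.255.10.0/24")],
    "nerc": [
        ipaddress.ip_network("10.255.20.0/24"),
        ipaddress.ip_network("10.255.21.0/24"),
    ],
}


def may_reach(cluster: str, switch_addr: str) -> bool:
    """Return True if the cluster is granted access to the switch address."""
    ip = ipaddress.ip_address(switch_addr)
    # Unknown clusters get an empty list, so access is denied by default.
    return any(ip in net for net in CLUSTER_ACCESS.get(cluster, []))


if __name__ == "__main__":
    print(may_reach("cloudlab", "10.255.10.7"))
    print(may_reach("cloudlab", "10.255.20.7"))
```

Denying by default keeps a cluster that is missing from the table from reaching anything on the management network.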

hakasapl commented 2 years ago

Part of milestone: https://github.com/CCI-MOC/ops-issues/issues/631

joachimweyl commented 2 years ago

This became an epic: https://app.zenhub.com/workspaces/moc-alliance-backlog-62a210f69d42f600151deae0/epics/Z2lkOi8vcmFwdG9yL1plbmh1YkVwaWMvNDQ5Mg