OpenFabrics / fsdp_docs

Need to update fsdp_setup scripts #135

Open dledford opened 9 months ago

dledford commented 9 months ago

When adding the NVMe drives, we changed which cards are installed in nodes 01 and 02, and also removed the bifurcating PCIe card from the Mellanox cards in nodes 09 and 10. We need to update the machine definition files in the fsdp_setup scripts to match.

PatrickRobbIOL commented 3 months ago

Lincoln suggested we simply create an ansible directory at the top of https://github.com/OpenFabrics/fsdp_setup instead of making an entirely new repo. There are three reasons: there is no technical reason for a second repo as opposed to a simple directory; working solely within fsdp_setup lets us remove components from the bash script and add them to the ansible setup script in one PR; and this repo is already automatically checked out on the cluster hosts.

Does this sound okay?

dledford commented 1 week ago

Per the NetworkManager team:

Device renaming requires udev rules (NM doesn't do any renaming)
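As a rough sketch of what such a rule could look like, the helper below prints a udev rule that renames a network device matched by its MAC address. The function name, the MAC address, and the device name are hypothetical placeholders, not values from our machine definition files, and matching on MAC is only one of several possible match keys.

```shell
#!/bin/sh
# Hypothetical helper: print a udev rule that renames the network device
# with the given MAC address. Arguments: <mac> <new-name>.
make_rename_rule() {
    # sysfs reports MAC addresses in lowercase, so normalize before matching
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    name=$2
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$mac" "$name"
}

# Placeholder values for illustration only
make_rename_rule "AA:BB:CC:DD:EE:01" "fabric0"
```

On a real host the output would be written to a rules file under /etc/udev/rules.d/ (the exact file name is up to us).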

Persistent state can be created in one of two ways:

1. Drop a YAML file in /etc/nmstate and enable nmstate.service.

2. Wipe out the existing files in /etc/NetworkManager/system-connections/ and create new ones to represent the devices.
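To make the two options concrete, here is a minimal sketch that writes one file of each kind into a scratch directory. The function names, the device name fabric0, and the DHCP settings are assumptions for illustration only. The real content would come from the machine definition files.

```shell
#!/bin/sh
# Hypothetical helper for option 1: write a minimal nmstate YAML file.
# Arguments: <target-dir> <interface-name>.
write_nmstate_file() {
    dir=$1; ifname=$2
    mkdir -p "$dir"
    cat > "$dir/$ifname.yml" <<EOF
---
interfaces:
  - name: $ifname
    type: ethernet
    state: up
    ipv4:
      enabled: true
      dhcp: true
EOF
}

# Hypothetical helper for option 2: write a minimal NetworkManager keyfile.
# Arguments: <target-dir> <interface-name>.
write_keyfile() {
    dir=$1; ifname=$2
    mkdir -p "$dir"
    cat > "$dir/$ifname.nmconnection" <<EOF
[connection]
id=$ifname
type=ethernet
interface-name=$ifname

[ipv4]
method=auto
EOF
    # NetworkManager ignores keyfiles that are readable by other users
    chmod 600 "$dir/$ifname.nmconnection"
}

# Write into a scratch directory rather than /etc so this is safe to run anywhere
out=$(mktemp -d)
write_nmstate_file "$out/nmstate" fabric0
write_keyfile "$out/system-connections" fabric0
echo "wrote example files under $out"
```

On a cluster host the targets would be /etc/nmstate (followed by `systemctl enable nmstate.service`) or /etc/NetworkManager/system-connections/ (followed by `nmcli connection reload`). We would pick one approach, not both.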