OpenCHAMI / roadmap

Public Roadmap Project for Ochami

DHCP and Netboot Support #47

Open alexlovelltroy opened 3 weeks ago

alexlovelltroy commented 3 weeks ago

OpenCHAMI currently deploys dnsmasq as part of the quickstart recipe, with a set of support scripts that populate it from the inventory system. CSM does something similar with Kea. We have found value in automated updates of the DHCP services, but the complexity of operating the toolchain has led to unscheduled maintenance events with both systems.
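To make the "populate it from the inventory system" step concrete, here is a minimal sketch in Python, not the actual quickstart scripts. It assumes a hypothetical inventory record with `mac`, `ip`, and `hostname` fields and renders one line per node in the format dnsmasq accepts in a `dhcp-hostsfile` (the text that would otherwise follow `dhcp-host=`). The `Node` type, the `render_hostsfile` helper, and the sample addresses are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Node:
    # Hypothetical inventory record; the field names are assumptions,
    # not the schema of any OpenCHAMI service.
    mac: str
    ip: str
    hostname: str


def render_hostsfile(nodes: Iterable[Node]) -> str:
    """Render one dhcp-hostsfile line per node as 'mac,ip,hostname'.

    Nodes are sorted by hostname so that repeated runs over the same
    inventory produce identical output and clean diffs.
    """
    ordered = sorted(nodes, key=lambda n: n.hostname)
    return "".join(f"{n.mac.lower()},{n.ip},{n.hostname}\n" for n in ordered)


# Illustrative data only.
nodes = [
    Node("AA:BB:CC:00:00:02", "10.0.0.12", "nid0002"),
    Node("AA:BB:CC:00:00:01", "10.0.0.11", "nid0001"),
]
hostsfile_text = render_hostsfile(nodes)
print(hostsfile_text, end="")
```

dnsmasq re-reads a `dhcp-hostsfile` when it receives SIGHUP, so a generator like this can update host entries without a full restart. Other configuration changes still require one, which is part of the operational complexity described above.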

The open source ecosystem offers few DHCP servers that can accept configuration changes without a restart.

OpenCHAMI needs a solution to this challenge that can integrate with our microservices and meet HPC performance needs with some form of high availability. The consortium prefers common solutions that can be drawn from the broader open source community where possible. However, reviewing the options, we haven’t found anything that meets all of our needs.

As part of this objective, we must identify promising options and decide between them:

- building plugins for existing systems (like CoreDHCP)
- embracing and extending existing partial solutions (like netbootd)
- building an opinionated DHCP server using proven libraries

This objective may include scope for PXE payloads and services for tftpboot or httpsboot if needed. Tools like netbootd and chain loaders like netboot.xyz show us that integration from DHCP all the way through to boot may be more stable than attempting to coordinate multiple services. Work from Kraken on krakenboot shows us that a custom UEFI binary can improve performance and security while also supporting a diverse set of deployment options. Furthermore, cloud-init delivers post-boot information to nodes, but it commonly needs some of the same information as the DHCP and netboot services. We need to explore the appropriate boundaries and contracts to allow independent innovation in the area of booting compute nodes without introducing brittle dependencies or integration challenges.
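One way to picture such a contract is a single per-node record from which each service derives only the view it needs. The sketch below is an illustration under stated assumptions, not a proposed design: `BootRecord`, its fields, and the example URLs are hypothetical. The iPXE commands (`kernel`, `initrd`, `boot`) and the cloud-init NoCloud meta-data keys (`instance-id`, `local-hostname`) are real, but how OpenCHAMI would populate them is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BootRecord:
    # Hypothetical shared contract; every field name is an assumption.
    mac: str
    hostname: str
    kernel_url: str
    initrd_url: str
    kernel_args: str


def ipxe_script(rec: BootRecord) -> str:
    """Netboot view: a minimal iPXE script that loads a kernel and initrd."""
    return "\n".join([
        "#!ipxe",
        f"kernel {rec.kernel_url} {rec.kernel_args}",
        f"initrd {rec.initrd_url}",
        "boot",
    ]) + "\n"


def cloud_init_meta(rec: BootRecord) -> dict:
    """Post-boot view: NoCloud-style meta-data derived from the same record."""
    return {
        # Using the MAC as the instance id is an illustrative choice.
        "instance-id": rec.mac.replace(":", "").lower(),
        "local-hostname": rec.hostname,
    }


# Illustrative data only; boot.example is a placeholder host.
rec = BootRecord(
    mac="AA:BB:CC:00:00:01",
    hostname="nid0001",
    kernel_url="http://boot.example/vmlinuz",
    initrd_url="http://boot.example/initrd.img",
    kernel_args="console=ttyS0",
)
print(ipxe_script(rec), end="")
print(cloud_init_meta(rec))
```

Because both views are pure functions of the same record, the netboot and cloud-init services can evolve separately as long as the record's fields stay stable, which is the kind of boundary the paragraph above asks us to explore.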

alexlovelltroy commented 2 weeks ago

Implementation will be linked to #50