Open alexlovelltroy opened 2 months ago
Before evaluating our options to determine which solution to pursue, I think it would be useful to determine what needs should be prioritized and what sort of architecture we want.
Please correct/clarify anything that's incorrect.
The need seems to be to have a DHCP server that behaves like a microservice and:
With a plugin, CoreDHCP could achieve 3 (as demonstrated by the included netbox plugin). However, the way that CoreDHCP plugins work is that each plugin registers handlers for DHCPv4 and DHCPv6 traffic and then these handlers are called in sequential order for each packet that reaches CoreDHCP. Because of this, implementing a configuration API (1) would be impossible, so mapping IPs to MAC addresses would have to occur within the handler function. This could work for NICs of nodes, but flexibility is limited if dealing with other things like BMCs (that don't require a boot configuration). This contrasts with something like netbootd manifests, which allow per-host configuration.
CoreDHCP stores its configuration as a YAML file, so it would need a file-based storage backend like Docker volumes or Kubernetes ConfigMaps. The dnsmasq-dhcpd container also used volumes, but the CoreDHCP configuration file is a single, static file that doesn't need dynamic updating, so this is likely less of an issue. The static config is customizable (5), though the file itself needs to be accessible to be edited.
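To make the sequential handler model concrete, here is a minimal sketch of a handler chain. It does not use the real CoreDHCP API: `Packet`, `Handler`, `RunChain`, `StaticMapHandler`, and `FallbackHandler` are hypothetical stand-ins, and the MAC and IP values are made up for illustration.

```go
package main

import "fmt"

// Packet is a simplified, hypothetical stand-in for a DHCPv4 message.
type Packet struct {
	MAC    string
	YourIP string
}

// Handler is a hypothetical per-plugin handler. It may modify the response
// and returns stop=true to end the chain early.
type Handler func(req, resp Packet) (Packet, bool)

// RunChain calls each handler in registration order until one asks to stop.
func RunChain(handlers []Handler, req Packet) Packet {
	resp := Packet{MAC: req.MAC}
	for _, h := range handlers {
		var stop bool
		resp, stop = h(req, resp)
		if stop {
			break
		}
	}
	return resp
}

// StaticMapHandler assigns an IP from a fixed MAC table. The MAC-to-IP
// mapping happens inside the handler itself.
func StaticMapHandler(table map[string]string) Handler {
	return func(req, resp Packet) (Packet, bool) {
		if ip, ok := table[req.MAC]; ok {
			resp.YourIP = ip
			return resp, true
		}
		return resp, false // unknown MAC: let the next handler decide
	}
}

// FallbackHandler assigns one fixed address to anything that reaches it.
func FallbackHandler(ip string) Handler {
	return func(req, resp Packet) (Packet, bool) {
		resp.YourIP = ip
		return resp, true
	}
}

func main() {
	chain := []Handler{
		StaticMapHandler(map[string]string{"aa:bb:cc:dd:ee:01": "10.0.0.11"}),
		FallbackHandler("10.0.0.250"),
	}
	fmt.Println(RunChain(chain, Packet{MAC: "aa:bb:cc:dd:ee:01"}).YourIP)
	fmt.Println(RunChain(chain, Packet{MAC: "aa:bb:cc:dd:ee:99"}).YourIP)
}
```

In this sketch the order of the slice is the order of evaluation, so a handler placed earlier can short-circuit everything after it.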
CoreDHCP does not include a PXE/TFTP stack (4) and I could not locate a plugin that managed this.
Netbootd has an API that allows fetching/setting/deleting node configuration (called "manifests" in the project), though it stores configuration per-node and has no global configuration file (1,5). One caveat is that netbootd will only respond to known MAC addresses, so if functionality for dealing with unknown MAC addresses is desired, this will need to be added. As stated, configuration is file-based so, without any modification, the storage backend would need to be file-based (2).
In order to be able to communicate with SMD (3), netbootd will need to be modified (though this is true for both of these solutions). Netbootd uses static IPs within the configuration file, so it might also need to be modified to use IPs from SMD instead. Netbootd has an existing API that can potentially be modified to be notified of new changes, or, if going with a polling model, can be modified to poll SMD instead.
Finally, it has its own PXE/TFTP stack (4), which reduces its need to depend on other software.
It seems that, of the two options above, netbootd has more functionality already implemented that meets the stated needs than CoreDHCP. It is self-contained without the need for external dependencies, seems like it will require less work to get working the way we want, and already follows a cloud-like design.
But, if others disagree or know of a more appropriate solution that already exists, I am not married to this opinion and can be convinced otherwise. It is always nice not to have to maintain code. 😉
I have a few comments.
First, let's check the assumptions both implicit and explicit in the five requirements you list.
Second, I'd like to dig in further with your assessment of both options.
I don't agree with the limitations you list for CoreDHCP. Given the nature of the plugins, I see no reason that a CoreDHCP plugin couldn't maintain an internal cache of all MAC<-->IP mappings and update them asynchronously from SMD or even expose that cache for API update. The only configuration information that needs to go in the file is the SMD url and perhaps a secret for use in pulling data.
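A rough sketch of the cache idea, assuming a polling model: a background goroutine periodically replaces a MAC-to-IP snapshot while the DHCP handler only reads it. `Fetcher`, `Cache`, and the 30-second interval are hypothetical; the real SMD client call is deliberately left as a function the caller supplies.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Fetcher is a hypothetical function returning the current MAC-to-IP
// mappings from an inventory service such as SMD.
type Fetcher func() (map[string]string, error)

// Cache holds MAC-to-IP mappings that handlers read while a background
// goroutine replaces them.
type Cache struct {
	mu      sync.RWMutex
	entries map[string]string
}

// Lookup returns the IP for a MAC, if the cache knows it.
func (c *Cache) Lookup(mac string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	ip, ok := c.entries[mac]
	return ip, ok
}

// Refresh fetches a fresh snapshot and swaps it in. On error the previous
// snapshot is kept so lookups keep working.
func (c *Cache) Refresh(fetch Fetcher) error {
	fresh, err := fetch()
	if err != nil {
		return err
	}
	c.mu.Lock()
	c.entries = fresh
	c.mu.Unlock()
	return nil
}

// Run refreshes the cache on every tick until done is closed.
func (c *Cache) Run(fetch Fetcher, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			_ = c.Refresh(fetch) // error ignored here; the old snapshot stays
		case <-done:
			return
		}
	}
}

func main() {
	// Stand-in for a real SMD query.
	fetch := func() (map[string]string, error) {
		return map[string]string{"aa:bb:cc:dd:ee:01": "10.0.0.11"}, nil
	}
	cache := &Cache{}
	if err := cache.Refresh(fetch); err != nil {
		fmt.Println("initial refresh failed:", err)
	}
	done := make(chan struct{})
	go cache.Run(fetch, 30*time.Second, done)

	ip, ok := cache.Lookup("aa:bb:cc:dd:ee:01")
	fmt.Println(ip, ok)
	close(done)
}
```

In this design the refresh interval bounds how stale a mapping can be, so it is the main knob trading load on SMD against configuration latency. The same `Refresh` method could also be called from an API endpoint if push-style updates are preferred.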
Having said that, CoreDHCP does only handle DHCP and wouldn't address the other needs for HTTP or TFTP. So, we'd need to have different services to handle those things. I wonder what the overlap is between the SMD client information needed for cloud-init and the SMD client information needed for DHCP?
Your assessment of netbootd makes sense to me. The risk I see in it is the degree we'll need to customize it for our needs and the degree of collaboration we can expect from the netbootd author/community. Do you feel confident that we can manage integrating this service and contributing back? What's the risk that we end up maintaining an incompatible fork? Since netbootd is a thin wrapper around some Go libraries, should we consider building our own solution which is similar to netbootd, but more targeted at our use cases?
We may want to explore the netbox plugin for inspiration for our SMD plugin.
@alexlovelltroy regarding your response:
`dnsmasq.conf` can be loaded, but the loader hard-codes the DHCP options.)

[^cache]: We'll have to be careful with the cache implementation because introducing a cache could reintroduce the config latency.
Looking back at my evaluation after your comments, I can see how using CoreDHCP as a base and writing a plugin, relegating TFTP (and HTTP if OpenCHAMI supports HTTP booting in the future) to separate services, would compartmentalize the amount of custom code that we would need to maintain and allow us to plug in functionality as needed. I think it's worth attempting to see what can be done. I'll see if I can put something together for this.
I have created two plugins for CoreDHCP (living in one repo) that communicate with SMD for DHCP requests:
https://github.com/OpenCHAMI/coresmd
There have been recent additions to SMD (https://github.com/OpenCHAMI/smd/pull/34, https://github.com/OpenCHAMI/smd/pull/35) and Magellan (https://github.com/OpenCHAMI/magellan/pull/62) that allow BMC information (e.g. MAC and IP addresses) to be stored in SMD on discovery using Components and EthernetInterfaces, allowing them to be queried. This means that these data can be queried in SMD to see if a MAC address requesting an IP is known to SMD. The coresmd plugin hands out DHCP leases to SMD-known MAC addresses.
There is also the bootloop plugin, which is like CoreDHCP's "range" plugin except that it:
These steps ensure that nodes/BMCs/etc. unknown to SMD will get temporary IP addresses, but will constantly be requesting new ones in the case that they become known to SMD. This also allows BMCs to be discovered dynamically so that they can be easily added to SMD.
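The split between SMD-known and unknown MAC addresses can be sketched as below. This is not the code from the coresmd repository: `Allocator`, `Lease`, the pool addresses, and the lease durations are all hypothetical, and the sketch assumes "temporary" means a short lease from a small pool. Lease expiry is not modeled.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Lease is a hypothetical result of a DHCP offer decision.
type Lease struct {
	IP        string
	Duration  time.Duration
	Temporary bool
}

// Allocator decides between a long lease for known MACs and a short,
// temporary lease for unknown ones.
type Allocator struct {
	Known      map[string]string // MAC -> IP, stand-in for data held in SMD
	Pool       []string          // addresses reserved for unknown MACs
	LongLease  time.Duration
	ShortLease time.Duration
	temp       map[string]string // unknown MAC -> temporary IP
}

// Offer returns the lease a given MAC should receive.
func (a *Allocator) Offer(mac string) (Lease, error) {
	if ip, ok := a.Known[mac]; ok {
		delete(a.temp, mac) // release any temporary address it held
		return Lease{IP: ip, Duration: a.LongLease}, nil
	}
	if ip, ok := a.temp[mac]; ok {
		return Lease{IP: ip, Duration: a.ShortLease, Temporary: true}, nil
	}
	used := make(map[string]bool, len(a.temp))
	for _, ip := range a.temp {
		used[ip] = true
	}
	for _, ip := range a.Pool {
		if !used[ip] {
			if a.temp == nil {
				a.temp = make(map[string]string)
			}
			a.temp[mac] = ip
			return Lease{IP: ip, Duration: a.ShortLease, Temporary: true}, nil
		}
	}
	return Lease{}, errors.New("temporary pool exhausted")
}

func main() {
	a := &Allocator{
		Known:      map[string]string{},
		Pool:       []string{"172.16.0.100", "172.16.0.101"},
		LongLease:  time.Hour,
		ShortLease: time.Minute,
	}
	mac := "aa:bb:cc:dd:ee:01"
	l, _ := a.Offer(mac)
	fmt.Println(l.IP, l.Temporary)

	// Once the MAC is added to the known set, the next request gets the
	// address recorded for it.
	a.Known[mac] = "172.16.0.11"
	l, _ = a.Offer(mac)
	fmt.Println(l.IP, l.Temporary)
}
```

Because the temporary lease is short, a device that becomes known is picked up on its next request without anyone having to restart the DHCP server.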
For admins that care about which IPs are assigned to which MACs (e.g. for BMCs), CoreDHCP's "file" plugin is useful for this.
There is an example configuration that includes how to use these plugins in a CoreDHCP minimal working config file.
As stated in https://github.com/OpenCHAMI/deployment-recipes/issues/3, we know that dnsmasq will only take us so far with this project. We'll need something else to take over. We learned a lot from the work on dnsmasq that we can apply to the future.
I propose that there are two viable options here for how to proceed.
We create plugins for coredns and coredhcp to handle our internal DHCP and DNS. There is an example of a plugin from five years ago in the coredhcp repository that we could start from: the Netbox plugin.
We adopt, and potentially extend, netbootd as our standalone bootscript server and use configurator to generate the appropriate manifest file(s).
It may make sense to entertain other options, but we need to decide and move fairly quickly as part of our LANL objective for 2024/25 https://github.com/OpenCHAMI/roadmap/issues/47