OpenCHAMI / roadmap

Public Roadmap Project for Ochami
MIT License

[FEATURE] Replace dnsmasq for dhcp. #50

Open alexlovelltroy opened 2 months ago

alexlovelltroy commented 2 months ago

As stated in https://github.com/OpenCHAMI/deployment-recipes/issues/3, we know that dnsmasq will only take us so far with this project. We'll need something else to take over. We learned a lot from the work on dnsmasq that we can apply to the future.

I propose that there are two viable options here for how to proceed.

  1. We create plugins for coredns and coredhcp to handle our internal dhcp and dns. There is an example of a plugin from five years ago in the coredhcp repository that we could start from: the Netbox plugin.

  2. We adopt, and potentially extend, netbootd as our standalone bootscript server and use configurator to generate the appropriate manifest file(s)

It may make sense to entertain other options, but we need to decide and move fairly quickly as part of our LANL objective for 2024/25 https://github.com/OpenCHAMI/roadmap/issues/47

synackd commented 1 month ago

Before evaluating our options to determine which solution to pursue, I think it would be useful to determine what needs should be prioritized and what sort of architecture we want.

Please correct/clarify anything that's incorrect.

What do we want in a DHCP service?

The need seems to be to have a DHCP server that behaves like a microservice and:

  1. is API-driven to set/edit/delete its configuration
  2. uses an orchestrator-agnostic storage backend for said configuration
    • e.g. like the way BSS can use either Etcd or PostgreSQL
    • versus in-memory
  3. communicates with SMD for node info, either by polling or being notified at an API endpoint
    • e.g. MAC, IP, XName
    • IP-MAC mapping determined from this data
    • Polling model seems the trend for recent Ochami development instead of notification API
    • What about unknown IPs? BMCs that don't need netboot? Ignore versus "default pool"
  4. includes PXE/TFTP stack
    • and optionally HTTP
  5. supports customizable configuration
    • current dnsmasq-dhcpd container doesn't allow for custom config
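
As a rough illustration of item 3, the sketch below derives a MAC-to-IP mapping from records shaped like SMD EthernetInterfaces entries. The field names (`MACAddress`, `ComponentID`, `IPAddresses`) and the sample data are assumptions made for illustration and are not taken from this thread; `build_mac_table` is a hypothetical helper, not part of any existing service.

```python
from typing import NamedTuple


class Host(NamedTuple):
    xname: str
    ip: str


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' separators so lookups are consistent."""
    digits = mac.replace(":", "").replace("-", "").lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def build_mac_table(interfaces: list[dict]) -> dict[str, Host]:
    """Map each MAC to its component ID (xname) and first IP address.

    Records without an IP address are skipped: there is nothing to lease.
    """
    table = {}
    for iface in interfaces:
        ips = iface.get("IPAddresses") or []
        if not ips:
            continue
        mac = normalize_mac(iface["MACAddress"])
        table[mac] = Host(iface["ComponentID"], ips[0]["IPAddress"])
    return table


# Sample records, made up for illustration (assumed field names).
SAMPLE = [
    {
        "MACAddress": "DE-CA-FC-0F-FE-E1",
        "ComponentID": "x1000c0s0b0n0",
        "IPAddresses": [{"IPAddress": "172.16.0.1"}],
    },
    {
        "MACAddress": "de:ca:fc:0f:fe:e2",
        "ComponentID": "x1000c0s0b1",
        "IPAddresses": [],
    },
]

if __name__ == "__main__":
    for mac, host in build_mac_table(SAMPLE).items():
        print(mac, host.xname, host.ip)
```

A MAC that is absent from the resulting table is "unknown" in the sense of the open question above; whether to ignore it or serve it from a default pool is a separate policy decision.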

Evaluating Current Options

CoreDHCP + Plugin

With a plugin, CoreDHCP could achieve 3 (as demonstrated by the included netbox plugin). However, the way that CoreDHCP plugins work is that each plugin registers handlers for DHCPv4 and DHCPv6 traffic and then these handlers are called in sequential order for each packet that reaches CoreDHCP. Because of this, implementing a configuration API (1) would be impossible, so mapping IPs to MAC addresses would have to occur within the handler function. This could work for NICs of nodes, but flexibility is limited if dealing with other things like BMCs (that don't require a boot configuration). This contrasts with something like netbootd manifests, which allow per-host configuration.
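
To make the sequential handler model concrete, here is a simplified Python model of a handler chain. It is only loosely modeled on how CoreDHCP plugins are described above: the dict-based packets, the `(response, stop)` return convention, and all function names are assumptions for illustration, not CoreDHCP's actual Go API.

```python
from typing import Callable, Optional

# A handler takes (request, response) and returns (response, stop).
# Returning stop=True ends the chain; a None response means "drop the packet".
Handler = Callable[[dict, dict], tuple[Optional[dict], bool]]


def run_chain(handlers: list[Handler], request: dict) -> Optional[dict]:
    """Call each handler in order until one asks to stop."""
    response: Optional[dict] = {}
    for handler in handlers:
        response, stop = handler(request, response)
        if stop:
            break
    return response


def make_lookup_handler(table: dict[str, str]) -> Handler:
    """Handler that does the MAC-to-IP mapping inside the handler itself."""
    def handler(request, response):
        ip = table.get(request["mac"])
        if ip is None:
            return response, False  # unknown MAC: let later handlers decide
        response["yiaddr"] = ip
        return response, True  # known MAC: answer and stop the chain
    return handler


def make_pool_handler(pool: list[str]) -> Handler:
    """Fallback handler that hands out addresses from a small pool."""
    free = list(pool)

    def handler(request, response):
        if not free:
            return None, True  # nothing left to offer: drop the packet
        response["yiaddr"] = free.pop(0)
        return response, True
    return handler


if __name__ == "__main__":
    chain = [
        make_lookup_handler({"aa:bb:cc:dd:ee:01": "10.0.0.5"}),
        make_pool_handler(["10.0.1.1"]),
    ]
    print(run_chain(chain, {"mac": "aa:bb:cc:dd:ee:01"}))
    print(run_chain(chain, {"mac": "aa:bb:cc:dd:ee:99"}))
```

In this model, per-host behavior depends entirely on what each handler can look up at request time, which is the flexibility concern raised above.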

CoreDHCP stores its configuration as a YAML file, so it would need a file-based storage backend like Docker volumes or Kubernetes ConfigMaps. The dnsmasq-dhcpd container also used volumes, but the CoreDHCP configuration file is a single, static file that doesn't need dynamic updating, so this is likely less of an issue. The static config is customizable (5), though the file itself needs to be accessible to be edited.

CoreDHCP does not include a PXE/TFTP stack (4) and I could not locate a plugin that managed this.

Extended netbootd

Netbootd has an API that allows fetching/setting/deleting node configuration (called "manifests" in the project), though it stores configuration per-node and has no global configuration file (1,5). One caveat is that netbootd will only respond to known MAC addresses, so if functionality for dealing with unknown MAC addresses is desired, this will need to be added. As stated, configuration is file-based so, without any modification, the storage backend would need to be file-based (2).

In order to be able to communicate with SMD (3), netbootd will need to be modified (though this is true for both of these solutions). Netbootd uses static IPs within the configuration file, so it might also need to be modified to use IPs from SMD instead. Netbootd has an existing API that can potentially be modified to be notified of new changes, or, if going with a polling model, can be modified to poll SMD instead.

Finally, it has its own PXE/TFTP stack (4), which reduces its need to depend on other software.

Conclusion

It seems that, of the two options above, netbootd has more functionality already implemented that meets the stated needs than CoreDHCP. It is self-contained without the need for external dependencies, seems like it will require less work to get working the way we want, and already follows a cloud-like design.

But, if others disagree or know of a more appropriate solution that already exists, I am not married to this opinion and can be convinced otherwise. It is always nice not to have to maintain code. 😉

alexlovelltroy commented 1 month ago

I have a few comments.

First, let's check the assumptions both implicit and explicit in the five requirements you list.

  1. We definitely need the DHCP server to respond to external configuration changes, but does that mean the microservice itself must have an API? I can see arguments on both sides for this. Can we consider our dhcp server to be stateless and pull all operational information from SMD on startup and update it periodically?
  2. If our DHCP server can recreate state on startup, do we need to support any backend storage at all? What about checkpoints and/or recovering cache on startup if SMD is unavailable?
  3. Good points here. I think getting more concrete on these points would be good. What do you think? What options do you think should be available for unknown MACs? Is it important to be able to serve different information to different MACs in the same pool? How dynamic do we want the system to be?
  4. If we don't include TFTP and HTTP in the same service, what can we do instead? What tradeoffs do you think we should consider?
  5. How is this different from 1. ?

Second, I'd like to dig in further with your assessment of both options.

I don't agree with the limitations you list for CoreDHCP. Given the nature of the plugins, I see no reason that a CoreDHCP plugin couldn't maintain an internal cache of all MAC<-->IP mappings and update them asynchronously from SMD or even expose that cache for API update. The only configuration information that needs to go in the file is the SMD url and perhaps a secret for use in pulling data.
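
A minimal sketch of that idea, in Python for brevity (a real CoreDHCP plugin would be written in Go): an in-memory MAC-to-IP cache that is refreshed from SMD in the background. The `fetch` callable stands in for whatever SMD query is used and is an assumption, as are the class name and the 30-second default interval.

```python
import threading
from typing import Callable, Optional


class MacCache:
    """In-memory MAC-to-IP cache refreshed periodically from a fetch function."""

    def __init__(self, fetch: Callable[[], dict[str, str]], interval: float = 30.0):
        self._fetch = fetch          # hypothetical: returns {mac: ip} from SMD
        self._interval = interval    # seconds between refreshes (assumed value)
        self._lock = threading.Lock()
        self._table: dict[str, str] = {}
        self._stop = threading.Event()

    def refresh(self) -> bool:
        """Replace the table with fresh data; keep the old table on failure."""
        try:
            fresh = self._fetch()
        except Exception:
            return False
        with self._lock:
            self._table = fresh
        return True

    def lookup(self, mac: str) -> Optional[str]:
        """Return the IP for a MAC, or None if the MAC is unknown."""
        with self._lock:
            return self._table.get(mac.lower())

    def start(self) -> threading.Thread:
        """Refresh in a daemon thread until stop() is called."""
        def loop():
            while not self._stop.is_set():
                self.refresh()
                self._stop.wait(self._interval)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self) -> None:
        self._stop.set()
```

The DHCP handler would only ever call `lookup`, so request handling never blocks on SMD; the same table could also be exposed or updated through an API if that turns out to be needed.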

Having said that, CoreDHCP does only handle DHCP and wouldn't address the other needs for HTTP or tftp. So, we'd need to have different services to handle those things. I wonder what the overlap is between the SMD client information needed for cloud-init and the SMD client information needed for dhcp?

Your assessment of netbootd makes sense to me. The risk I see in it is the degree we'll need to customize it for our needs and the degree of collaboration we can expect from the netbootd author/community. Do you feel confident that we can manage integrating this service and contributing back? What's the risk that we end up maintaining an incompatible fork? Since netbootd is a thin wrapper around some go libraries, should we consider building our own solution which is similar to netbootd, but more targeted at our use cases?

alexlovelltroy commented 1 month ago

We may want to explore the netbox plugin for inspiration for our SMD plugin.

https://github.com/coredhcp/plugins/tree/master/netbox

synackd commented 1 month ago

@alexlovelltroy regarding your response:

  1. I would say that if we can pull all operational information from SMD (which I think is preferred), then the DHCP server can be stateless and it can interrogate SMD for requests. This would probably eliminate the need to have a configuration API, unless it would be useful to fetch the current DHCP configuration information (e.g. lease data) from the server. (I would anticipate a CoreDHCP config being static, but is there anything we would need to change without having to modify the file and restart the service?)
  2. If the server is stateless, then the only thing that would need to be stored is the static config which, if using containers, can be mounted in. I do like the idea of caching data in the case SMD is down (and also to reduce unnecessary network traffic). The question of recovering cache is an interesting one. BSS currently will boot loop a component if it cannot find it within SMD (which it will not if SMD is down), so would it be worth it to have a persistent cache to use if SMD went down? One possible case I could see is if needing to provide IP addresses to BMCs (e.g. DHCP is down and the BMC's DHCP lease expires, preventing power/console access).
  3. I think some systems folks like @travisbcotton or @njones-lanl would be able to offer helpful input on this.
    • Unknown MACs: We've seen with dnsmasq-dhcpd that latency with the generation of the opts/hosts files has led to known nodes getting a random IP from the pool and default options if those files haven't been generated yet, which causes them to fail to get the iPXE bootloader and thus fail to boot. I could see the CoreDHCP + plugin option possibly solving this by querying SMD (or cache[^cache]) for each request. Theoretically, when adding or changing out a node, one would update SMD and so the new or changed MAC address would become known and the old one (if applicable) would become unknown, eliminating the need to handle unknown MAC addresses. However, there may be use cases to enable handling of unknown MAC addresses, such as if one doesn't care what IPs get mapped to which MAC addresses.
    • Dynamic System: The main case I can see here is serving DHCP requests to BMCs, Ethernet, and HSN, with the main distinction being which to serve boot options to. I can see this being important for sites that rely on DHCP for most network configuration. In the past, the SI did this as well, using DHCP for both management interfaces and BMCs. Someone with more experience than I is welcome to chime in here, though.
  4. Separating TFTP/HTTP into their own services would give administrators the flexibility to choose a server and help narrow down troubleshooting issues. Though, particularly for TFTP, this service is only used for PXE booting and so it would be more of a "sidecar". Functionally though, there's probably little difference besides setting up the communication infrastructure between the services. (More experienced input here is welcomed as well 🙂)
  5. I can see that I was a little vague on this... What I meant with 1 was being able to modify a configuration specifically via an API. What I meant by 5 was being able to configure things in the first place (e.g. DHCP options). (I'm looking at dnsmasq-dhcpd now and see that a custom dnsmasq.conf can be loaded, but the loader hard-codes the DHCP options.)

[^cache]: We'll have to be careful with cache implementation because introduction of a cache could reintroduce the config latency.
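
One way the persistent-cache idea from item 2 could look, sketched in Python: write a checkpoint after every successful SMD fetch and fall back to it only when SMD is unreachable. The `fetch` callable, the JSON file format, and the function names are all assumptions for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, table: dict) -> None:
    """Write the table atomically: temp file first, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(table, handle)
    os.replace(tmp, path)


def load_table(fetch, path: str):
    """Return (table, source), preferring live SMD data over the checkpoint.

    fetch is a hypothetical callable returning {mac: ip} or raising on failure.
    """
    try:
        table = fetch()
    except Exception:
        # SMD is unavailable: fall back to the last checkpoint, if any.
        try:
            with open(path) as handle:
                return json.load(handle), "checkpoint"
        except (OSError, ValueError):
            return {}, "empty"
    save_checkpoint(path, table)
    return table, "smd"
```

Because live data always wins when SMD answers, the checkpoint only affects behavior during an outage (for example, to keep renewing BMC leases); it does not by itself address the staleness concern in the footnote.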

Looking back at my evaluation after your comments, I can see how using CoreDHCP as a base and writing a plugin, relegating TFTP (and HTTP if OpenCHAMI supports HTTP booting in the future) to separate services, would compartmentalize the amount of custom code that we would need to maintain and allow us to plug in functionality. I think it's worth attempting to see what can be done. I'll see if I can put something together for this.

synackd commented 2 weeks ago

I have created two plugins for CoreDHCP (living in one repo) that communicate with SMD for DHCP requests:

https://github.com/OpenCHAMI/coresmd

There have been recent additions to SMD (https://github.com/OpenCHAMI/smd/pull/34, https://github.com/OpenCHAMI/smd/pull/35) and Magellan (https://github.com/OpenCHAMI/magellan/pull/62) that allow BMC information (e.g. MAC and IP addresses) to be stored in SMD on discovery using Components and EthernetInterfaces, allowing them to be queried. This means that these data can be queried in SMD to see if a MAC address requesting an IP is known to SMD. The coresmd plugin hands out DHCP leases to SMD-known MAC addresses.

There is also the bootloop plugin, which is like CoreDHCP's "range" plugin except that it:

  1. Serves an iPXE boot script that reboots.
  2. Responds to DHCPREQUESTs of known MAC addresses (i.e. they already have an IP and are trying to renew) with a DHCPNAK to tell them to restart the entire DHCP handshake.

These steps ensure that nodes/BMCs/etc. unknown to SMD will get temporary IP addresses, but will constantly be requesting new ones in the case that they become known to SMD. This also allows BMCs to be discovered dynamically so that they can be easily added to SMD.
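
The behavior described above can be modeled roughly as follows. This is a Python sketch of one interpretation of the two plugins working together, not the actual coresmd code: the message names are simplified strings, and the lease bookkeeping (`bound` flag, returning addresses to the pool on NAK) is an assumption.

```python
def handle_unknown(msg_type, mac, leases, pool):
    """Temporary-lease logic for MACs that SMD does not know about."""
    lease = leases.get(mac)
    if msg_type == "DISCOVER":
        if lease is None:
            if not pool:
                return ("DROP", None)  # no temporary addresses left
            lease = {"ip": pool.pop(0), "bound": False}
            leases[mac] = lease
        lease["bound"] = False  # a new handshake is starting
        return ("OFFER", lease["ip"])
    if msg_type == "REQUEST":
        if lease is None or lease["bound"]:
            # Renewal attempt: NAK so the client restarts the handshake.
            if lease is not None:
                pool.append(lease["ip"])
                del leases[mac]
            return ("NAK", None)
        lease["bound"] = True
        return ("ACK", lease["ip"])
    return ("DROP", None)


def handle(msg_type, mac, smd_table, leases, pool):
    """Serve SMD-known MACs from SMD data; everything else gets a temporary lease."""
    ip = smd_table.get(mac)
    if ip is not None:
        return ("OFFER" if msg_type == "DISCOVER" else "ACK", ip)
    return handle_unknown(msg_type, mac, leases, pool)


if __name__ == "__main__":
    smd = {"aa:bb:cc:dd:ee:01": "10.0.0.1"}
    leases, pool = {}, ["172.16.0.10"]
    new_mac = "aa:bb:cc:dd:ee:99"
    print(handle("DISCOVER", new_mac, smd, leases, pool))
    print(handle("REQUEST", new_mac, smd, leases, pool))
    print(handle("REQUEST", new_mac, smd, leases, pool))  # renewal attempt
    smd[new_mac] = "10.0.0.2"  # the device is added to SMD
    print(handle("DISCOVER", new_mac, smd, leases, pool))
```

Each forced restart of the handshake gives the server another chance to check SMD, so a device picks up its SMD-assigned address soon after it is added.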

For admins that care about which IPs are assigned to which MACs (e.g. for BMCs), CoreDHCP's "file" plugin is useful for this.

There is an example configuration that includes how to use these plugins in a CoreDHCP minimal working config file.