Open rainest opened 2 weeks ago
I'm not sure how much we considered the needs of the DHCP server for the original dnsmasq Deployment. It was running, but IDK if we ever had a proof of concept for anything talking to it.
The DHCP server will handle requests for hosts outside the Kubernetes network. Normal broadcast delivery will not work as such, and we'd need to forward traffic to it.
This is apparently how CSM handles DHCP as well: it has a Kea DHCP instance exposed via a LoadBalancer Service, with MetalLB BGP peering to the node networks and forwarding rules pointing at the LB address on the node network (see CSM's PXE troubleshooting and DHCP troubleshooting guides).
I'm unsure where all of that gets configured in full CSM setups, but I found a basic example of a minimal configuration for such a setup.
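For reference, a CSM-style setup translated into MetalLB resources might look roughly like the following. This is a sketch, not CSM's actual config: the pool name, namespace, ASNs, and addresses are all placeholders.

```yaml
# Hypothetical MetalLB setup: advertise a small pool over BGP to the node network.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: dhcp-pool
  namespace: metallb-system
spec:
  addresses:
    - 10.1.0.10/32        # placeholder LB address for the DHCP service
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: node-network
  namespace: metallb-system
spec:
  myASN: 64500            # placeholder ASNs
  peerASN: 64501
  peerAddress: 10.1.0.1   # placeholder upstream router on the node network
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: dhcp-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - dhcp-pool
```

Even with this in place, off-network clients still need their router's DHCP relay (IP helper address) pointed at the LB IP, since broadcasts don't cross subnets.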
I don't think there's any way to handle dynamic population of the server_id or router config, or that dynamic handling would even be desirable. AFAIK these will need to be config parameters that we just trust you've set to the correct value. The V4 ID needs to match the spec.loadBalancerIP. I don't think any other in-cluster config cares about the V6 ID.
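Concretely, that constraint would show up in the CoreDHCP config along these lines; the addresses are placeholders, and the plugin list is a sketch rather than the chart's actual template output:

```yaml
# Sketch of a CoreDHCP server4 config. The server_id value must match the
# Service's spec.loadBalancerIP; the router value is a site-specific gateway
# that we can't discover dynamically, so both are trusted config parameters.
server4:
  listen:
    - "0.0.0.0:67"
  plugins:
    - lease_time: 3600s
    - server_id: 10.1.0.10     # must equal spec.loadBalancerIP (placeholder)
    - router: 10.1.0.1         # site gateway, operator-provided (placeholder)
    - netmask: 255.255.255.0
```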
We oddly had an existing tftpd key, but weren't using it in any of the templates. It was added alongside dnsmasq and the dnsmasq built-in TFTP server configuration.
https://github.com/OpenCHAMI/deployment-recipes/pull/87 provides a basic "it runs!" set of values and templates, with some caveats:
The coresmd image is currently busted after some possibly incomplete file rearrangement upstream. The /coredhcp path in the image is a directory with a README.md; it's apparently supposed to get replaced with a binary built from some templated Go. I hulk smashed the (working) release binary from the repo into a local image build instead.
SMD in the chart does not appear to serve TLS. I stuffed a fake cert into the SMD plugin config. The plugin appears to have connected over HTTP fine (it logged level=debug msg="EthernetInterfaces: map[]" prefix="plugins/coresmd" for my empty SMD instance with no errors).
At this point, we have not enabled TLS at the SMD level and rely on the API gateway for TLS termination and signed tokens for authN/authZ. That said, we have the ACME pieces running, so we could create and rotate TLS certificates for the microservices using that, or we could protect them with an mTLS service mesh. This matters more for k8s deployments than it does in our Podman deployments.
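If the ACME pieces were wired up via cert-manager, minting and rotating a per-service cert for SMD could look like the following. The issuer, namespace, and DNS names are placeholders, and this assumes cert-manager is the mechanism, which the thread doesn't confirm:

```yaml
# Hypothetical cert-manager Certificate for SMD. The resulting Secret would be
# mounted by the SMD pod (and trusted by clients like the coresmd plugin).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: smd-tls
  namespace: openchami
spec:
  secretName: smd-tls
  duration: 2160h       # 90 days
  renewBefore: 360h     # rotate 15 days before expiry
  dnsNames:
    - smd.openchami.svc.cluster.local   # placeholder in-cluster name
  issuerRef:
    name: acme-issuer                    # placeholder ClusterIssuer
    kind: ClusterIssuer
```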
Do you have a proposal for mTLS within k8s for SMD that doesn't preclude the current operations?
I don't think there's any way to handle dynamic population of the server_id or router config or that dynamic handling would even be desirable. AFAIK these will need to be config parameters that we just trust you've set to the correct value. The V4 ID needs to match the spec.loadBalancerIP. I don't think any other in-cluster config cares about the V6 ID.
You're driving at the right stuff here. We may need to explore options outside of the standard k8s networking in order to get this to work reliably. I've never understood how networking would work to bring DHCP properly into a remote k8s cluster without complex and unpleasant VLAN incantations. The solution in CSM only works because of direct connections to the worker nodes and plenty of VLAN tagging.
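Whatever the peering story ends up being, the cluster-side piece reduces to a LoadBalancer Service for CoreDHCP. A sketch with placeholder names and addresses; externalTrafficPolicy: Local is included because a DHCP server generally needs to see the relay's real source address:

```yaml
# Hypothetical Service fronting CoreDHCP. The loadBalancerIP must match the
# server_id configured in CoreDHCP, per the earlier discussion.
apiVersion: v1
kind: Service
metadata:
  name: coredhcp
spec:
  type: LoadBalancer
  loadBalancerIP: 10.1.0.10      # placeholder; must equal the server_id
  externalTrafficPolicy: Local   # preserve the DHCP relay's source address
  selector:
    app: coredhcp
  ports:
    - name: dhcp
      protocol: UDP
      port: 67
      targetPort: 67
```

Note that spec.loadBalancerIP is the field the thread references; newer Kubernetes versions steer toward implementation-specific annotations for pinning the address.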
The coresmd image is currently busted after some possibly incomplete file rearrangement upstream. The /coredhcp path in the image is a directory with a README.md; it's apparently supposed to get replaced with a binary built from some templated Go. I hulk smashed the (working) release binary from the repo into a local image build instead.
Does the latest version v0.0.5 work for you? I examined it, and /coredhcp is a binary in this version.
@rainest Ah, I found the issue. We were originally pushing coresmd as the container name and then started pushing coredhcp. This led to the former not working while the latter did. We have deleted the coresmd container to eliminate confusion. Going forward, we should use ghcr.io/openchami/coredhcp as the CoreDHCP container that has the coresmd plugins built in.
Thanks for reporting the issue!
I will update the quickstart docker-compose recipe to use the correct container.
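For what it's worth, the compose side of that rename might look roughly like this; the tag and paths are illustrative, not the recipe's actual values:

```yaml
# Hypothetical docker-compose stanza using the renamed container image.
services:
  coredhcp:
    image: ghcr.io/openchami/coredhcp:v0.0.6   # tag is a placeholder
    network_mode: host                          # DHCP wants direct access to UDP 67
    volumes:
      - ./configs/coredhcp.yaml:/etc/coredhcp/config.yaml:ro
```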
The above PR also fixes an issue where 'permission denied' would be seen when binding to port 67. Fixed in coresmd v0.0.6.
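Binding to port 67 as a non-root user requires CAP_NET_BIND_SERVICE, which is one common way to address that class of 'permission denied' error (how v0.0.6 actually fixed it may differ). A sketch of the relevant container securityContext for the chart:

```yaml
# Hypothetical pod securityContext granting only the capability needed to
# bind a privileged port (<1024) without running as root.
securityContext:
  runAsNonRoot: true
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]
```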
Short Description: Remove the existing dnsmasq Deployment from the chart and replace it with CoreDHCP, for https://github.com/OpenCHAMI/roadmap/issues/50
Definition of Done
Additional context: Ref https://github.com/OpenCHAMI/deployment-recipes/pull/78 and https://github.com/OpenCHAMI/deployment-recipes/pull/84 for equivalent work on the Podman side.