coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

DHCP IP changes on every boot when using PXE #1432

Closed jgunthorpe closed 8 years ago

jgunthorpe commented 8 years ago

Using coreos 1010.5.0

I am PXE booting via iPXE and observing that the DHCP-assigned IP address changes on every boot. This is undesirable.

I tracked this down to the DHCP client identifier changing on every boot. This is because networkd now defaults to ClientIdentifier=duid (see http://man7.org/linux/man-pages/man5/systemd.network.5.html)

This might make sense when booting from disk, but when PXE booting is detected CoreOS should change that parameter to ClientIdentifier=mac before starting the network. This would follow the PXE RFC for IPv4 client ID generation. I am not sure whether networkd can do it, but for DHCPv6 the client identifier should be a type 3 DUID-LL for PXE.
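
For readers unfamiliar with the two identifier formats mentioned here, the following is a minimal Python sketch (not anything networkd ships) of how they are laid out: the MAC-based DHCPv4 client identifier is hardware type 1 (Ethernet) followed by the MAC, and a DHCPv6 DUID-LL is a 2-byte type 3, a 2-byte hardware type 1, then the MAC. The helper names are made up for illustration; the MAC is the one from the trace in this report.

```python
def parse_mac(mac: str) -> bytes:
    """Convert a colon-separated MAC such as 'a0:48:1c:a1:f7:fe' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError(f"expected a 6-byte Ethernet MAC, got {mac!r}")
    return raw


def fmt(data: bytes) -> str:
    """Format bytes as colon-separated lowercase hex, like the DHCP trace does."""
    return ":".join(f"{b:02x}" for b in data)


def dhcpv4_client_id_from_mac(mac: str) -> bytes:
    """DHCPv4 option 61 value: hardware type 1 (Ethernet) followed by the MAC."""
    return bytes([0x01]) + parse_mac(mac)


def dhcpv6_duid_ll(mac: str) -> bytes:
    """DUID-LL: 2-byte DUID type 3, 2-byte hardware type 1 (Ethernet), then the MAC."""
    return (3).to_bytes(2, "big") + (1).to_bytes(2, "big") + parse_mac(mac)


if __name__ == "__main__":
    mac = "a0:48:1c:a1:f7:fe"
    print(fmt(dhcpv4_client_id_from_mac(mac)))  # 01:a0:48:1c:a1:f7:fe
    print(fmt(dhcpv6_duid_ll(mac)))             # 00:03:00:01:a0:48:1c:a1:f7:fe
```

The first value is the same 7-byte client identifier that iPXE sends in the trace in this report, which is why a DHCP server treats iPXE and a MAC-mode networkd as the same client.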

This is similar to #360, but the solution 'use ignition' seems unworkable since the only way to get the ignition settings into a PXE environment is after the network has been started, which is too late to change the dhcp settings.

For reference, here is a DHCP trace of what is happening now:

System boot rom fetching iPXE:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION:  97 ( 17) UUID/GUID                 00805439cf14bae3 ..T9....
                        1197a8a0481ca1f7 ....H...
                        fe               .
OPTION:  94 (  3) Client NDI                010310           ...
OPTION:  93 (  2) Client System             0007             ..
OPTION:  60 ( 32) Vendor class identifier   PXEClient:Arch:00007:UNDI:003016

iPXE fetching CoreOS:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION: 175 ( 27) ???                       b105018086153aeb ......:.
                        0301000017010124 .......$
                        0101130101270101 .....'..
                        150101           ...
OPTION:  61 (  7) Client-identifier         01:a0:48:1c:a1:f7:fe
OPTION:  97 ( 17) UUID/GUID                 00805439cf14bae3 ..T9....
                        1197a8a0481ca1f7 ....H...
                        fe               .

CoreOS Booted:

    IP: 0.0.0.0 (a0:48:1c:a1:f7:fe) > 255.255.255.255 (ff:ff:ff:ff:ff:ff)
    OP: 1 (BOOTPREQUEST)
OPTION:  53 (  1) DHCP message type         1 (DHCPDISCOVER)
OPTION:  61 ( 19) Client-identifier         ff:b6:22:0f:eb:00:02:00:00:ab:11:e7:91:1f:ab:08:35:0f:50
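
To make the difference between the last two requests easier to see, here is a rough Python sketch that splits an option 61 value into its fields. It assumes only two layouts: the 7-byte "type 01 + MAC" form that iPXE sends, and the "type ff + 4-byte IAID + DUID" form that the booted system sends. The function name and the returned dictionary keys are invented for this example.

```python
def parse_client_id(text: str) -> dict:
    """Split a colon-separated DHCPv4 client identifier into its fields."""
    raw = bytes.fromhex(text.replace(":", ""))

    # Type 0x01 with 6 more bytes: Ethernet hardware type followed by the MAC.
    if len(raw) == 7 and raw[0] == 0x01:
        return {"kind": "mac", "mac": ":".join(f"{b:02x}" for b in raw[1:])}

    # Type 0xff: 4-byte IAID, then a DUID that starts with a 2-byte DUID type.
    if len(raw) >= 7 and raw[0] == 0xFF:
        result = {
            "kind": "iaid+duid",
            "iaid": raw[1:5].hex(),
            "duid_type": int.from_bytes(raw[5:7], "big"),
        }
        # DUID type 2 (DUID-EN): 4-byte enterprise number, then an opaque identifier.
        if result["duid_type"] == 2 and len(raw) >= 11:
            result["enterprise_number"] = int.from_bytes(raw[7:11], "big")
            result["identifier"] = raw[11:].hex()
        return result

    return {"kind": "unknown"}


if __name__ == "__main__":
    print(parse_client_id("01:a0:48:1c:a1:f7:fe"))
    print(parse_client_id(
        "ff:b6:22:0f:eb:00:02:00:00:ab:11:e7:91:1f:ab:08:35:0f:50"))
```

Applied to the trace, the booted system's value decodes as a DUID-EN with enterprise number 43793 (as far as I know, the number registered for systemd) and an 8-byte identifier. That identifier is presumably the part that differs between boots when no machine ID persists, so the server sees a new client each time.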

mischief commented 8 years ago

looks like your client mac address is consistent - why don't you use that instead?

jgunthorpe commented 8 years ago

How do you mean?

A new version of ISC dhcp (that I am not running) has an option to totally ignore the client-id, but it is not really a good idea.

This is a CoreOS bug because the various RFCs specify that the client ID should be the MAC when PXE booting, and they frown on randomizing the DUID on every boot.

mischief commented 8 years ago

when you pxe boot, you could set systemd.machine_id= on the kernel command line. that should give you a stable client identifier, but will require you to pre-generate a unique machine id for each pxe host (rather than have it be generated on-system).
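
One possible way to pre-generate those IDs, sketched in Python: derive a stable 32-character hex machine ID from each host's MAC address and emit the matching kernel argument. Deriving the ID from the MAC with a UUIDv5 is an assumption of this sketch, not something systemd or CoreOS prescribes, and the namespace string below is hypothetical.

```python
import uuid

# Hypothetical fixed namespace for this sketch; any UUID works as long as it never changes.
PXE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "pxe.example.invalid")


def machine_id_for(mac: str) -> str:
    """Return a stable 32-character lowercase hex ID derived from the MAC."""
    return uuid.uuid5(PXE_NAMESPACE, mac.lower()).hex


def kernel_arg_for(mac: str) -> str:
    """Build the kernel command-line argument for one PXE host."""
    return f"systemd.machine_id={machine_id_for(mac)}"


if __name__ == "__main__":
    # The same MAC always yields the same argument, so the ID survives reboots.
    print(kernel_arg_for("a0:48:1c:a1:f7:fe"))
```

The PXE server would then append the generated argument to that host's kernel command line, for example from a per-MAC iPXE script.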

jgunthorpe commented 8 years ago

I could do that, but every PXE boot would still consume two IP addresses from the pool (suboptimal if I have a lot of machines), and it doesn't really help make PXE booting a fully functional option in CoreOS.

Better would be to add a coreos.pxe=1 kernel command line that causes the ClientIdentifier to be switched to MAC as I described. Then all PXE users will have a solid fix.

Why the reluctance to see this as a coreos bug? This is probably a regression as things would have worked fine for PXE before networkd changed the default to duid.

crawford commented 8 years ago

It should be easy enough to put together a generator that adjusts the network config depending on the OEM ID ("pxe" in this case). We'll probably need it in the future for other providers (cough, DigitalOcean, cough).

jgunthorpe commented 8 years ago

Just to clarify, with the two patches dm0 just made, should coreos.oem=pxe be set on the kernel command line when PXE booting?

Does https://coreos.com/os/docs/latest/booting-with-pxe.html need an update too?

jgunthorpe commented 8 years ago

.. and for others stumbling across this, the systemd.machine_id kernel parameter isn't supported until systemd v229, which is newer than what coreos stable has today.

crawford commented 8 years ago

> Just to clarify, with the two patches dm0 just made, should coreos.oem=pxe be set on the kernel command line when PXE booting?

Given the current implementation, yes, it would have to be set. This is not desirable though, so we will be sure to fix it up so that it's not necessary.

dm0- commented 8 years ago

I've updated the pull requests to follow Ignition's behavior of using PXE when no OEM is given on the kernel command-line.

crawford commented 8 years ago

LGTM

jgunthorpe commented 8 years ago

@dm0- this causes a behaviour change on my non-PXE bare metal machines. They don't have a coreos.oem.id kernel command line parameter, so they switch from machine ID mode to MAC mode.

It would be nice if this stopped changing. Honestly, I'd just permanently set it to mac for everything. I think the target environment for the DUID mode is something like a laptop with docking stations or other variable hardware that just doesn't seem to be the focus for coreos.

crawford commented 8 years ago

@jgunthorpe sorry about that. We are going to change it one more time (back to the original behavior). Instead of checking the OEM ID, it's more accurate to check if the root kernel parameter is specified. In your case, it is, so the MAC address will not be used as the DHCP client identifier.
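
The decision described here can be sketched roughly as follows. This is only an illustration of the rule in Python, not the actual generator in the pull requests; the function names and the sample command lines are hypothetical.

```python
def has_root_param(cmdline: str) -> bool:
    """True if the kernel command line contains a root= argument."""
    return any(arg.startswith("root=") for arg in cmdline.split())


def client_identifier_mode(cmdline: str) -> str:
    """Pick the DHCP client identifier mode from the kernel command line.

    No root= argument is taken to mean a PXE boot running from the
    initramfs, so the MAC is used; otherwise the default DUID is kept.
    """
    return "duid" if has_root_param(cmdline) else "mac"


if __name__ == "__main__":
    # Hypothetical command lines, for illustration only.
    print(client_identifier_mode("console=ttyS0 coreos.autologin"))     # mac
    print(client_identifier_mode("root=LABEL=ROOT console=ttyS0"))      # duid
```

On a real system the command line would be read from /proc/cmdline before the network configuration is generated.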

jgunthorpe commented 8 years ago

Thanks, I still think you should seriously consider not using DUID mode at all for CoreOS.

crawford commented 8 years ago

This has been cleaned up in the latest PRs.

redbaron commented 6 years ago

I stumbled upon this, and it is not clear to me what the recommended way is to make it work in conjunction with matchbox.

The current situation in stable 1520.8.0 is as follows:

yy-pxe.network:

...
[DHCP]
ClientIdentifier=mac
UseMTU=true
UseDomains=true

zz-default.network:

...
[DHCP]
UseMTU=true
UseDomains=true

What happens in my case: PXE starts, downloads the Ignition config from matchbox, and calls coreos-install. If the Ignition config is templated with the value of {net0/ip:ipv4}, then it becomes invalid on a subsequent boot :( If Container Linux continues to use DUID, then this quirk should at least be documented in the PXE booting docs.

bgilbert commented 6 years ago

@redbaron It sounds as though you're having a slightly different issue: one client ID is used by the PXE-booted system that installs Container Linux, and a different one is used by the installed system after rebooting (since the latter won't use the PXE config). If so, could you open a new bug?