Closed gregular closed 10 months ago
@gregular Thanks for opening! We are looking at this issue.
Not sure if this is helpful but on a system on the same network that already has a hostname set (from a prior boot) if I run this via sheltie:
bash-5.1# netdog generate-hostname
Reverse DNS lookup failed: failed to lookup address information: Temporary failure in name resolution
"fe80::5c:a1ff:feab:1e00"
I have a valid ipv4 on the main interface but I'm getting the ipv6 link local from netdog here too.
Thanks for the issue report @gregular !
The hostname is generated on first boot by (as you've found) the setting generator netdog generate-hostname. The way this should work is as follows:
- We read the IP from /var/lib/netdog/current_ip. This IP gets written to file earlier in boot.
- When using wicked as the networking backend, we query the lease.
- When using systemd-networkd, we query networkctl for this information for the primary interface. From that information, we use the first IPv4 address we find, otherwise we return the first IPv6 address found.

I'm wondering if the link doesn't have a DHCP-vended address at the time we query it, which is why we end up with the link-local address.
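To make the selection rule above concrete, here is a minimal sketch of "first IPv4, otherwise first IPv6". The function name pick_primary_ip is hypothetical and this is not the actual netdog code; it only illustrates why a link that has no IPv4 lease yet ends up yielding the IPv6 link-local address.

```rust
use std::net::IpAddr;

/// Hypothetical helper (not netdog's real implementation): return the first
/// IPv4 address in `addrs`, or fall back to the first IPv6 address.
fn pick_primary_ip(addrs: &[IpAddr]) -> Option<IpAddr> {
    addrs
        .iter()
        // Prefer any IPv4 address, regardless of its position in the list.
        .find(|a| a.is_ipv4())
        // Otherwise take the first IPv6 address, which may be link-local.
        .or_else(|| addrs.iter().find(|a| a.is_ipv6()))
        .copied()
}
```

If the interface only holds fe80::5c:a1ff:feab:1e00 at query time, that address is what gets returned, which matches the behavior reported in this thread.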
You mentioned you're using metal-dev. Can you share the net.toml you're using and how you're running the image (qemu/metal, etc)? Depending on how that is set up, the system may not be waiting for the link to get an address before moving on.
The networking scenario I am using doesn't have a net.toml and is just using the default eth0 interface as defined on the kernel command line. However, what I speculate I'm running into here is a scenario where eth0 doesn't "plug" for an extended period of time (an example would be a USB device that is plugged in later), so systemd-networkd actually times out and the rest of the system attempts to come up. In that scenario it looks like /var/lib/netdog/current_ip is getting an IPv6 link-local address.
So perhaps this is an issue specific to me. I'm still curious, though: if an interface never acquires an IPv4 address at all (say I have an IPv6-only network segment), shouldn't Bottlerocket still handle it? Whether the address is link-local or a valid IPv6 address, it still looks like the regex for ValidLinuxHostname doesn't allow IPv6 addresses. The hostname generation algorithm should be able to fall back to generating something for an IPv6 address like it does for an IPv4 address.
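As a rough illustration of why the IPv6 address is rejected, the sketch below checks a string against the rule quoted in the sundog error message (only [0-9a-z.-], 1-253 chars long). The function is_valid_hostname is a hypothetical stand-in, not the real ValidLinuxHostname type, which may enforce more than this.

```rust
/// Hypothetical stand-in for the ValidLinuxHostname check, based only on the
/// rule quoted in the error message: characters in [0-9a-z.-], length 1-253.
fn is_valid_hostname(s: &str) -> bool {
    (1..=253).contains(&s.len())
        && s
            .chars()
            .all(|c| matches!(c, '0'..='9' | 'a'..='z' | '.' | '-'))
}
```

Under this rule any string containing ":" fails, so a raw IPv6 address can never pass, while a dotted IPv4 address does.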
OK, I think I have chased this down to a timeout issue, as stated before. The easiest way to reproduce this is to allow a link to come up but turn off dhcpd on the network until systemd-networkd times out (I think the other way would be to let dhcp6 complete but dhcp4 fail). In that case netdog will grab the IPv6 link-local address and drop it in /var/lib/netdog/current_ip, and then the system won't boot past hostname generation even though DHCP might succeed later.
I am going to work around this issue with this patch:
diff --git a/sources/api/netdog/src/cli/generate_hostname.rs b/sources/api/netdog/src/cli/generate_hostname.rs
index ddbd8f6c..91a0d77f 100644
--- a/sources/api/netdog/src/cli/generate_hostname.rs
+++ b/sources/api/netdog/src/cli/generate_hostname.rs
@@ -58,7 +58,7 @@ pub(crate) async fn run() -> Result<()> {
hostname
}
// If no hostname has been determined we return the IP address of the host.
- .unwrap_or(ip_string);
+ .unwrap_or(ip_string.replace(".","-").replace("::","-").replace(":","-"));
// sundog expects JSON-serialized output
print_json(hostname)
This seems like a bug in netdog to me. As a nice side effect, my hostnames now go from all 192 to something better like 192-168-100-42.
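For reference, the replacement chain from the patch can be pulled out into a standalone function to see what it produces. The name sanitize_ip_string is hypothetical; the three replace calls are the same ones used in the patch.

```rust
/// Hypothetical standalone version of the replace chain from the patch:
/// turn an IP address string into something made only of [0-9a-f-].
fn sanitize_ip_string(ip_string: &str) -> String {
    ip_string
        .replace(".", "-") // IPv4 dots
        .replace("::", "-") // collapse the IPv6 zero-run marker first
        .replace(":", "-") // remaining IPv6 group separators
}
```

With this, fe80::dea6:32ff:fea9:513c becomes fe80-dea6-32ff-fea9-513c and 192.168.100.42 becomes 192-168-100-42. One edge case to keep in mind: an address that starts or ends with "::" would produce a leading or trailing dash.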
@gregular I agree with you, and the patch seems reasonable, though I might argue for leaving the dots "." rather than replacing them with dashes "-". Replacing the colons ":" is the right thing to do, however.
Would you be interested in contributing this fix? If not, I'm happy to integrate something similar.
Sure, I'll spin up a pull request with a test case in the next bit and see how things go. The reason I like replacing the "." in the IP address is an earlier change that truncates the hostname from the full IP to just the prefix if it is not resolvable. So, as mentioned, I have a bunch of machines in the network that all come up with the hostname 192 because my IPv4 prefix is 192.
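A small sketch of the truncation effect described above. It assumes the truncation keeps everything before the first dot, which is a guess at the mechanism and not confirmed in this thread; first_label is a hypothetical name.

```rust
/// Hypothetical illustration: keep only the part of a name before the first
/// dot. ASSUMPTION: this is how the truncation mentioned above behaves.
fn first_label(name: &str) -> &str {
    name.split('.').next().unwrap_or(name)
}
```

Under that assumption, 192.168.100.42 collapses to 192 on every machine in the subnet, while 192-168-100-42 has no dot to split on and stays unique per host.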
Image I'm using: I'm building a custom 1.15.1 metal-dev variant with additional drivers turned on, but I speculate this is a core issue and not specific to my build. I've seen it with builds of previous versions as well, but I believe it has only ever happened on networkd-based builds, not wicked.
What I expected to happen: When booting up I'm expecting to generate initial settings correctly, specifically for the hostname.
What actually happened: sundog[1606]: Error deserializing hashMap to Settings: Error deserializing scaler value: Unable to deserialize into ValidLinuxHostname: 'fe80::dea6:32ff:fea9:513c' must only be [0-9a-z.-], and 1-253 chars long
How to reproduce the problem: It doesn't happen on every initial build/boot, but fairly regularly I can't generate initial settings on a clean build. I just boot and get this error; forward progress halts, my config settings seem to be corrupted from then on, and I can't boot. Occasionally I can reflash/purge the config directory, beat the race condition, and get a hostname based on a valid IPv4 address. I don't currently have IPv6 turned on for this LAN segment, so I've never seen it fail with a non-link-local IPv6 address.