NixOS / nixpkgs

Requiring `hostname` to be a single domain label is fairly heavy handed for some networks #94011

Open grahamc opened 4 years ago

grahamc commented 4 years ago

PR #76542 changed the hostname to be a short domain in https://github.com/NixOS/nixpkgs/commit/993baa587c4b82e791686f6ce711bcd4ee8ef933 and enforces this by validation. This has broken large corporate users whose networks by convention use the FQDN as their hostname, and who also have decades of history and infrastructure built around this.

I think this is a case where "one size fits all" doesn't work so well, and I'm not sure this particular point is something we want to risk breaking / losing users over.

The PR references man 5 hostname, which says:

The hostname may be a free-form string up to 64 characters in length; however, it is recommended that it consists only of 7-bit ASCII lower-case characters and no spaces or dots, and limits itself to the format allowed for DNS domain name labels, even though this is not a strict requirement.

This point about not being a strict requirement, I think, should not be made into a strict requirement at our level.

grahamc commented 4 years ago

cc @primeos, @flokli, @zimbatm, @vcunat

andir commented 4 years ago

Just allow users to set the hostname as an FQDN.

I think hostname vs fqdn is a relic that should not be used most of the time. Especially with the combination of search names (which usually provide for nice privacy leaks…). I for my part just pretend there is only the hostname and that is always a FQDN. There is no domain name a server belongs to. It might be the wrong thing to do on paper but in reality I do not care what a server thinks its name is (except for things like SMTP handshakes where another party wants an in-band confirmation).

grahamc commented 4 years ago

I've reverted the relevant commit in https://github.com/NixOS/nixpkgs/pull/94022 -- take a look?

primeos commented 4 years ago

This has broken large corporate users whose networks by convention use the FQDN as their hostname, and who also have decades of history and infrastructure built around this.

This is of course a problem that we'd like to avoid (as with any breaking changes) but tbh I don't really understand that argument. Couldn't they just easily revert the relevant commit in their fork?

This point about not being a strict requirement, I think, should not be made into a strict requirement at our level.

Anyway, that's certainly a valid argument. And in #76542 it was only ever made into a strict requirement since NixOS also provides networking.domain which makes our case a bit different. But since only networking.hostName affects the kernel's node name I'm ok if we wouldn't want to enforce it for that reason (and to allow additional characters).

But we also need to consider that allowing dots in networking.hostName makes some NixOS implementations and checks more difficult and can lead to non-obvious configuration issues that can be hard to find (I think there was a comment about Postfix in the PR but I couldn't find it anymore).

primeos commented 4 years ago

This has broken large corporate users whose networks by convention use the FQDN as their hostname, and who also have decades of history and infrastructure built around this.

@grahamc just to better understand this (if you have time): What's the main problem here? Is this about the Linux kernel hostname or networking.hostName (and in that case, why is reverting the commit in a fork or updating the code not an option)?

primeos commented 4 years ago

@grahamc @arianvp and anyone else who wants relaxed hostname checks: I get that you are busy (as we all are) but we really need quicker and more active responses if we want to resolve this discussion before the 20.09 release.

IIRC we still don't know any technical problems apart from the comments that this might be inconvenient for existing users. Before the final release we should also look at https://github.com/NixOS/nixpkgs/pull/94022#issuecomment-674385613 (NixOps), check/finalize the release-notes, and determine if we want a read-only fqdn option.

grahamc commented 4 years ago

Sorry, I had a baby and fell off the internet for a while. I can’t get back to this soon.

primeos commented 4 years ago

@grahamc congratulations then ;) I'm happy for you :)

Regarding this issue: From the comments here and especially in #94022 it seems to me like the current "trend" is to keep the strict checks and not change anything (IIRC), though we haven't really reached any consensus yet. So my idea would be to simply leave this issue open for further comments and see if we get any feedback/complaints regarding this during the beta release cycle.

martinetd commented 4 years ago

Just to add a data point as a nobody user, I'm also one of these weird users who use the FQDN as their hostname, and got "surprised" when installing a new system as 20.09pre to test -- upgrades will also all require adjustments. I don't have a lot that depends on that but it's more than just adjusting hostnames, and I don't have much, so I can relate to whatever org stumbled into this with whatever history they have. For example, I have attrsets with hostnames and bags of data for wireguard autosetup and things like that which will need amending. It could quickly get messy at larger scales.

OTOH, I understand "full hostnames" can cause problems, and the error is clear enough, but if I want to shoot myself in the foot I don't see what's wrong with that? :)

Well, either way 20.09 is out soon -- I'll wait this long to decide if I want to update my scripts or not :D Keep up the good work everyone and congratulations @grahamc!

EDIT: after reading the comments in #94022 I can understand it's difficult -- places with explicit checks in nixpkgs are annoying for everyone. Well. What will happen will happen, but a transition step with a warning, as suggested there, would probably be appreciated by a few people.

jonringer commented 4 years ago

Just as a reminder, the 20.09 release is scheduled to happen this Monday, the 28th.

If this is still relevant to blocking the release, then there should be some forward movement.

A blocker meeting has yet to be scheduled. But, if you consider this item to still warrant blocking the entirety of the nixos-20.09 release, then please post on the Feature freeze discussion issue. A template for proposing an item can be found at https://github.com/NixOS/nixpkgs/issues/95475#issuecomment-699218336

0x4A6F commented 4 years ago

man 7 hostname states:

Each element of the hostname must be from 1 to 63 characters long and the entire hostname, including the dots, can be at most 253 characters long. Valid characters for hostnames are ASCII(7) letters from a to z, the digits from 0 to 9, and the hyphen (-). A hostname may not start with a hyphen.

And references some rather old RFCs:

RFC1123:

   2.1  Host Names and Numbers

      The syntax of a legal Internet host name was specified in RFC-952
      [DNS:4].  One aspect of host name syntax is hereby changed: the
      restriction on the first character is relaxed to allow either a
      letter or a digit.  Host software MUST support this more liberal
      syntax.

      Host software MUST handle host names of up to 63 characters and
      SHOULD handle host names of up to 255 characters.

The current implementation violates this:

"^$|^[[:alpha:]]([[:alnum:]_-]{0,61}[[:alnum:]])?$";

Are there reasons for this implementation?
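To make the effect of the quoted pattern concrete, here is a minimal sketch that applies it with `builtins.match`, which matches against the whole string. The pattern is copied from the comment above; the helper name `accepts` and the sample names are hypothetical, and this is only an illustration of the regex, not the actual option type definition in nixpkgs.

```nix
let
  # Pattern quoted in the comment above.
  pattern = "^$|^[[:alpha:]]([[:alnum:]_-]{0,61}[[:alnum:]])?$";

  # Hypothetical helper: true when the whole name matches the pattern.
  accepts = name: builtins.match pattern name != null;
in
{
  short = accepts "myhost";               # true
  fqdn = accepts "myhost.example.com";    # false: dots are not in the character class
  digitFirst = accepts "3com";            # false: the first character must be a letter
  empty = accepts "";                     # the `^$` branch is meant to allow the empty string
}
```

Something like `nix-instantiate --eval --strict` on a file with this content should print the resulting attribute set.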

vcunat commented 4 years ago

I'd rather restrict this particular thread just to the question whether it should/can contain dots. What exact characters to allow... doesn't seem to be a real problem right now.

primeos commented 4 years ago

The current implementation violates this:

Yes, this is known and the main reason why this issue exists. Though man 5 hostname (from systemd) is a better reference as Linux only supports up to 64 characters for the entire hostname (including the terminating newline).

Are there reasons for this implementation?

The main discussion was in #76542 (but also #94022 and this issue).

I'd rather restrict this particular thread just to the question whether it should/can contain dots.

Yeah, I completely agree. The Linux kernel network node hostname can contain dots and this issue is about whether we want to allow this using networking.hostName or not. The reason why it currently isn't allowed is because we have networking.domain for this (and because it isn't recommended to use a FQDN, etc.).

Personally I feel like a grace period with a warning might've been a safer choice but this also comes with its own downsides.

Anyway, basically this issue lacks feedback (e.g. from beta testers) for why this is a real problem (i.e. not "I used a FQDN for networking.hostName and now this doesn't work anymore / sucks"; instead we're interested in why the combination of networking.hostName and networking.domain doesn't work as a replacement [e.g. sysctl kernel.hostname should still be overridable via boot.kernel.sysctl."kernel.hostname"]).

0x4A6F commented 4 years ago

Sorry, the length of hostname is limited to 64, but that is not my point. This implementation introduces too strict type requirements, if dots are disallowed.

Specifically the limitation to alphabetical characters at the start, which must be relaxed as stated in RFC 1123. RFC 1123 updates RFC 952 and was published as an Internet Standard exactly 31 years ago. Limiting the start of a hostname to an alphabetic character is stated in man 5 hosts, but it is utterly outdated and not a reference on this topic (no meaningful changes as far as 2004-11-03, only referencing RFC 952).

grahamc commented 4 years ago

My inability to provide details is mostly due to time (baby) and client confidentiality. There are Perl libraries with bug reports a decade old having to do with not handling the correct approach properly.

primeos commented 4 years ago

This implementation introduces too strict type requirements, if dots are disallowed.

Again, this is known and not ideal but it was accepted as a compromise for its advantages. AFAIK it would be way more relevant to know why this is a problem and which effect of networking.hostName (which is only an abstraction) causes this problem (and if this breaks anything that it shouldn't).

But I also want to point out that I'm only trying to moderate this issue (though I'll try to reduce my participation here as we don't seem to make much progress / reach any consensus). Also I'm basically fine with any outcome (but a bit biased as #94022 was IIRC mostly rejected). Would it maybe help to do another vote here (e.g. keep the strict requirement, only make it a warning, or relax the requirement)?

My inability to provide details is mostly due to time (baby) and client confidentiality.

Yeah, that's unfortunate (but obviously not your fault).

There are Perl libraries with bug reports a decade old having to do with not handling the correct approach properly.

Not sure what this means. Do they need the FQDN and cannot get it if the Linux kernel hostname doesn't contain the domain (in which case boot.kernel.sysctl."kernel.hostname" might be a good workaround)?

flokli commented 4 years ago

@grahamc do you think setting boot.kernel.sysctl."kernel.hostname", or setting a transient hostname via hostnamectl, would be sufficient to work around this?

I'd assume this mostly breaks "enterprise tooling" outside the NixOS ecosystem reading the hostname directly, not from the module system.

grahamc commented 4 years ago

This is a great question, let me get that tested.

grahamc commented 4 years ago

Okay, I've confirmed this works and fixes the concerns from the Kerberos / Perl side:

  boot.kernel.sysctl."kernel.hostname" = "${config.networking.hostName}.${config.networking.domain}";

I wonder if this snippet should either be in the release notes, or a networking.hostnameIncludesDomain option?
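For context, a minimal sketch of how the snippet above might sit in a complete module. The values `myhost` and `example.com` are placeholders, not taken from the thread; only the sysctl line comes from the comment.

```nix
{ config, ... }:
{
  # Placeholder values for illustration only.
  networking.hostName = "myhost";
  networking.domain = "example.com";

  # Set the kernel node name to hostName plus domain, as in the snippet above.
  boot.kernel.sysctl."kernel.hostname" =
    "${config.networking.hostName}.${config.networking.domain}";
}
```

With this in place, networking.hostName stays a single label and passes validation, while the kernel hostname is set to the combined name through the sysctl.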

flokli commented 4 years ago

Let's add this to the release notes. Setting the sysctl is already exposed as an option - hostnameIncludesDomain could be misunderstood...

jonringer commented 4 years ago

Seems like there's three action items:

If this seems acceptable, then I think we can remove this as a blocker

primeos commented 4 years ago

@jonringer I just drafted #100151. Could you take a look?

primeos commented 4 years ago

Maybe #100155 will also be helpful for some (but it's only indirectly related to this PR in that it helps to obtain the FQDN via a read-only NixOS option).

flokli commented 4 years ago

I feel like the initial issue has been addressed sufficiently, there's workarounds that were found, documented and added to the release notes.

There's some ongoing discussion on https://github.com/NixOS/nixpkgs/pull/100155, but that's about adding a new convenience option, which is only loosely related to this issue, and certainly not blocking 20.09.

Let's close this one.

grahamc commented 4 years ago

Another case where this has bitten me is provisioning machines in Packet where we only get "hostname" from the Packet API, and customers can only specify "hostname". However, the API-provided hostname will often include dots without intending to actually specify the domain. This is particularly true in the case of a default name. This means I can't do any "best" thing and have to manipulate the user input and potentially set the hostname to something they did not ask for.

arianvp commented 4 years ago

I already brought up the Packet issue before (I don't know where though; maybe it was during the go/no-go meeting). Because Packet was behind our release process anyway we decided to not make that a blocker if I recall correctly. (Though of course that's a bit of a chicken-and-egg problem, given you are the one maintaining those images and I suppose this issue is blocking you from creating newer ones :P )

grahamc commented 4 years ago

Thanks. I sorted it by just replacing .s with -s, but since Packet's validation may be more or less strict than our validation, it is essentially unsafe for me to use the user-provided hostname in the system configuration.
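The dot-to-dash replacement described above could look roughly like this in Nix. This is only a sketch: the function name `sanitize` and the sample input are hypothetical, and it handles dots only, so a name can still fail validation for other reasons (length, other characters).

```nix
let
  # Hypothetical helper: replace every "." in an API-provided name with "-".
  sanitize = name: builtins.replaceStrings [ "." ] [ "-" ] name;
in
sanitize "web01.ewr1.example"   # → "web01-ewr1-example"
```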

flokli commented 3 years ago

Also note NixOS/systemd properly picks up hostnames (including dots in the hostname) if networking.hostName is set to an empty string.

This should be achievable by setting systemd.hostname= on the kernel cmdline or by receiving a hostname from DHCP (if networkd is enabled, due to UseHostname= defaulting to true).

This should also work with Packet nodes.
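A minimal sketch of the configuration described above, assuming the kernel command line route. The hostname value is a placeholder, and whether this fits a given deployment is untested here.

```nix
{
  # Leave the static hostname unset so NixOS does not write one.
  networking.hostName = "";

  # Hand the (possibly dotted) name to systemd via the kernel command line.
  # "myhost.example.com" is a placeholder value.
  boot.kernelParams = [ "systemd.hostname=myhost.example.com" ];
}
```

On networkd setups with DHCP, the second setting may be unnecessary, since the comment above notes that the DHCP-provided hostname is picked up by default.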

nixos-discourse commented 2 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/hostname-is-not-of-type-string-matching-the-pattern/17666/1

arianvp commented 11 months ago

Note that flakes are still broken out of the box on most hosting providers due to this, as most hosting providers push a transient hostname that is an FQDN, and then your first nixos-rebuild switch --flake will just break.

I completely disagree with the take that . in hostnames is bad. I think the opposite is true. NIS domain names are bad and should not be used; e.g. macOS doesn't even support setdomainname() anymore.

I would really prefer this regex to be removed and dots to be allowed back in networking.hostName. It's a hill to die on that is extremely annoying for day-to-day users.

arianvp commented 11 months ago

Also note that boot.kernel.sysctl."kernel.hostname" is not a workaround at all, as it means you'll have a transient hostname, which will be overridden immediately by DHCP once the network is up.

NixOS should allow setting a static hostname with FQDN.

arianvp commented 11 months ago

One more data point: given we're a systemd-based distro and hostnamed is responsible for handling transient (and static) hostnames for users using Networkd, DHCPCD, or NetworkManager, systemd has the following to say, and I think we should use it as our authoritative source:

The static and transient hostnames must each be either a single DNS label (a string composed of 7-bit ASCII lower-case characters and no spaces or dots, limited to the format allowed for DNS domain name labels), or a sequence of such labels separated by single dots that forms a valid DNS FQDN.
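As a rough approximation of the quoted rule (not systemd's actual validation code), the check could be written as a Nix predicate. The names `label`, `pattern`, and `isValid` are hypothetical, and the 64-character cap is the Linux limit mentioned earlier in this thread.

```nix
let
  # One DNS label: lower-case letters, digits, and hyphens,
  # with no leading or trailing hyphen, at most 63 characters.
  label = "[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?";

  # Either a single label, or several labels joined by single dots.
  pattern = "${label}(\\.${label})*";

  # builtins.match only succeeds if the whole string matches.
  isValid = name:
    builtins.stringLength name <= 64 && builtins.match pattern name != null;
in
{
  single = isValid "hello";                # true
  fqdn = isValid "hello.my-domain.com";    # true
  doubleDot = isValid "hello..com";        # false: empty label between the dots
}
```

Unlike a single-label check, a predicate along these lines accepts an FQDN while still rejecting empty labels and stray characters.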

arianvp commented 11 months ago

Finally, the suggestion of just setting both hostName and domain and relying on networking.fqdn doesn't work either:

networking.domain sets the NIS domain through setdomainname() and the NIS domain is transient only. So it can change any time due to DHCP.

So if you have a DHCP server that pushes a NIS domain name, it will change underneath your feet and your networking.fqdn will no longer be the same as your real FQDN, leading to really confusing bugs.

The docs of networking.domain are also a bit misleading in this regard, as DHCP will override the domain name regardless of whether the option is set:

    The domain.  It can be left empty if it is auto-detected through DHCP.

For example on EC2 you can have the scenario:

networking.hostName = "hello";
networking.domain = "my-domain.com";

Then fqdn evals to hello.my-domain.com

You'd expect hostname -f to return hello.my-domain.com but it actually returns hello.my.configured.domain.in.dhcp.option-set.vpc (if one configures EC2's DHCP server to broadcast the domain name over DHCP).

zimbatm commented 11 months ago

Good points, we should follow systemd's lead here.

mossholderm commented 10 months ago

Another point... Kerberos really wants the hostname to be an FQDN. It is baked into the entire authentication model, down to the system level. If you want to support enterprises, you'll need to allow Kerberos to function correctly.

zimbatm commented 10 months ago

Summoning @flokli

flokli commented 10 months ago

I agree with @arianvp's assessment and the references provided. Unfortunately there's a lot of stuff in the NixOS module system using the fqdn, and I'm not sure it'd all work, so a PR changing this would need to trace these usages.

zimbatm commented 10 months ago

Does any volunteer want to drive this?

arianvp commented 6 months ago

Luckily fqdn seems to be used in only a few modules. I think it's doable to fix them.