Open aral opened 2 years ago
Thanks for filing this. Note that we'll need these answers from the OS perspective, not the user perspective. E.g., how does the OS fetch userdata? How does it learn its own hostname?
Some previous discussion in:
Apparently the platform is sometimes called hcloud.
We discussed this in today's community meeting:
13:10:54 < jlebon> #agreed we would like to add support for Hetzner. we are looking
for volunteers to pick it up and push it forward.
Happy to hear it. Would you like me to ask around to see if I can find some contacts there or do you already have folks you can talk to?
Yes, that would be helpful, thanks!
Quick update: I’ve been in touch with an engineer at Hetzner:
“I passed it to the responsible people but they are on vacation for the next two weeks … I will force it to be answered then :)”
At least for resetting passwords, a "QEMU Guest Agent" is running on the OS.
@der-On That might be the standard QEMU one, discussed in #74. We generally avoid shipping third-party agents (and reimplement pieces of them when necessary to avoid it), and don't currently ship the QEMU one.
So, how could we help to get CoreOS running on Hetzner Cloud?
@asciiprod Thanks for joining in! We could use some help answering the questions at the top of this issue. We have some answers already in the old Container Linux PRs, but it'd be good to make sure our understanding is up to date.
Sure, I'll try to answer them as well as I can:
What is the official name of the platform? Is there a short name that's commonly used in client API implementations?
That's already a tricky one. It is the Cloud product of the company Hetzner Online GmbH. It is usually just called Hetzner. That is also the name used for the cloud-init datasource. Other implementations like terraform or ansible use hcloud. Since ignition is more like cloud-init, I guess it should be hetzner.
How can the OS retrieve instance userdata? What happens if no userdata is provided?
We provide a meta/userdata endpoint at http://169.254.169.254/hetzner/v1. Userdata is optional, metadata is not. If the endpoint were absent, cloud-init would still run to a certain degree using DMI information, but would fail to retrieve essential data like SSH keys.
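To make the answer above concrete, here is a minimal Python sketch of how a guest might read userdata from the link-local endpoint. The base URL is the one quoted in this thread; the `/userdata` path suffix, the treatment of HTTP 404 as "no userdata", and the helper name `fetch_userdata` are assumptions for illustration, not confirmed details of the Hetzner API. The opener is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request

# Base URL quoted in the thread.
BASE_URL = "http://169.254.169.254/hetzner/v1"


def fetch_userdata(opener=urllib.request.urlopen, timeout=5.0):
    """Return userdata bytes, or None when no userdata was provided.

    Assumptions (not confirmed by the thread): the document lives at
    BASE_URL + "/userdata", and a missing document yields HTTP 404.
    """
    url = BASE_URL + "/userdata"
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
    # An empty body is treated the same as absent userdata.
    return body or None
```

On a real instance this would be called as `fetch_userdata()`; any error other than the assumed 404 is deliberately re-raised, because metadata being unreachable is a different failure from userdata being optional.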
Does the platform provide a way to configure SSH keys for the instance? How can the OS retrieve them? What happens if none are provided?
Yes, via metadata endpoint. If no SSH key is selected at instance creation, a password hash is provided. If neither is retrieved/configured, the instance has no fallback login password.
How can the OS retrieve network configuration? Is DHCP sufficient, or is there some other network-accessible metadata service?
IPv4 configuration is provided via DHCP, IPv6 currently via metadata service only.
In particular, how can the OS retrieve the system hostname?
Via metadata service.
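As a hedged sketch of the two answers above, the following Python collects the hostname and SSH public keys from the metadata service. The per-key paths (`/metadata/hostname`, `/metadata/public-keys`) and the accepted key-list encodings (a JSON array, or one key per line) are assumptions; only the existence of hostname and public-keys entries is stated in this thread. The `fetch` callable is supplied by the caller, so no network access is needed to try it.

```python
import json

# Assumed location of the metadata tree under the base URL from the thread.
METADATA_URL = "http://169.254.169.254/hetzner/v1/metadata"


def read_identity(fetch):
    """Collect hostname and SSH public keys via a caller-supplied fetch.

    `fetch(url)` must return the response body as text. The per-key
    paths and the key-list encodings handled here are assumptions.
    """
    hostname = fetch(METADATA_URL + "/hostname").strip()
    raw_keys = fetch(METADATA_URL + "/public-keys").strip()
    try:
        parsed = json.loads(raw_keys) if raw_keys else []
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list):
        keys = [str(key) for key in parsed]
    else:
        # Fall back to one key per line.
        keys = raw_keys.splitlines()
    keys = [key.strip() for key in keys if key.strip()]
    return {"hostname": hostname, "ssh_keys": keys}
```

An empty key list is returned rather than raised as an error, since the thread notes that an instance may be created without selecting any SSH key.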
Is there a mechanism for the OS to report to the platform that it has successfully booted? Is the mechanism required?
No, there isn't one, so it is not required.
Does the platform have an agent that runs inside the instance? Is it required? What does it do? What language is it implemented in, and where is the source code repository?
There is the qemu guest agent, which is used to reset passwords. There is also a package called hc-utils (https://github.com/hetznercloud/hc-utils). This contains bash scripts/systemd services/udev rules to automatically mount additional blockstorage volumes or start a DHCP client for new or unconfigured network interfaces. These are purely for ease of use and not required.
How are VM images uploaded to the platform and published to other users? Is there an API? What disk image format is expected?
Currently there is no direct upload option. Standard images are provided and updated by Hetzner on a regular basis. For customers to deploy their own images, there are two options: ISO or via rescue system. The latter is a Debian-based live Linux and allows writing anything to the virtual disk. Using snapshots, this image can be used by a customer to create new instances. If CoreOS worked out of the box, we could add it to the list of standard images, as we already have Fedora there.
Are there any other platform quirks we should know about?
Intel-based instances are currently using i440fx and AMD-based are Q35. Both legacy. UEFI possible, but not exposed via API (yet). No secure boot.
@asciiprod thanks for the detailed feedback! Some additional thoughts from my side:
aarch64 instances too? Do they work the same way?
The canonical documentation is available at: https://docs.hetzner.cloud
If we want to include the image on the platform, it must support passing a password hash as instance creation does not force selecting an SSH key.
Using UEFI-only for a given image is something that is currently not implemented. I'd have to check internally if we could do that.
The internal workflow for images does not import any external disk images. Hetzner Cloud images are generated by automated installations (e.g. kickstart/subiquity) from distribution ISOs using packer & ansible. This leads to a compressed (zstd) raw disk image, which is uploaded as an image snapshot and used to test and validate the new build on the platform. That is the point where it could be possible to import a pre-built external disk image. However, it would have to be discussed internally whether it is acceptable to open this process up for 3rd party generated images.
From a release and support point of view, I think we could only support the stable version.
Currently no aarch64 Cloud instances, but as we offer Ampere dedicated servers, that's something I would keep on the list and I'd say they work the same way (probably UEFI-only)
Ah great, thanks. The page I was looking for is https://docs.hetzner.cloud/#server-metadata (though it doesn't currently cover the userdata part).
Thanks for the detailed info, this is very helpful!
If we want to include the image on the platform, it must support passing a password hash as instance creation does not force selecting an SSH key.
I don't think we should support this. Fedora CoreOS tries to encourage the use of best practices, and passwords aren't that. On other platforms, Fedora CoreOS instances are usually configured with an SSH key passed in the Ignition config.
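For readers unfamiliar with that flow, here is a minimal sketch of an Ignition config that provisions an SSH key, generated with Python. The `sshAuthorizedKeys` field and spec version 3.4.0 come from the Ignition config specification; the helper name and the default `core` user are illustrative choices, and in practice such configs are usually written in Butane and transpiled rather than built by hand.

```python
import json


def ignition_with_ssh_key(public_key, user="core", spec_version="3.4.0"):
    """Build a minimal Ignition config that authorizes one SSH key.

    The helper name and defaults are illustrative, not part of any tool.
    """
    config = {
        "ignition": {"version": spec_version},
        "passwd": {
            "users": [{"name": user, "sshAuthorizedKeys": [public_key]}]
        },
    }
    return json.dumps(config, indent=2)
```

The resulting JSON would be passed as the instance's userdata at creation time.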
From a release and support point of view, I think we could only support the stable version.
We always recommend that users run some testing and next instances alongside their stable instances to help us catch regressions before they're promoted to a stable release. Thus, those streams are an important part of any Fedora CoreOS deployment strategy. It's entirely reasonable for Hetzner not to provide customer support for those streams, but it's important that they be available alongside stable. If that isn't possible, I think we shouldn't pursue adding stable either, and either only document the custom deployment flow or not document Hetzner Cloud at all.
I totally agree that SSH keys should be used and we also strongly recommend it during instance creation. But we do offer a password fallback for the existing OS images. So if CoreOS does not support it, we would need to enforce it.
In any case the more CoreOS specific changes we would need to make, the more difficult it becomes to adopt it for Hetzner Cloud.
IIUC from the docs, it seems like the password hash is injected into the user-data, which is assumed to be a cloud-init config. Is that correct? I derived this from the fact that there's no entry for it in the Server Metadata section. (Aside: it seems like that section is missing an entry for public-keys, no?)
If that's the case, that logic would have to learn to support Ignition configs too. Password authentication is disabled by default on FCOS, so it would have to inject a drop-in for it. Also, the default sshd config (at least on Fedora) prohibits password authentication for the root user so it would have to undo that too.
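To illustrate what such injection logic would have to emit, here is a hedged Python sketch that builds an Ignition config setting a root password hash and adding an sshd drop-in. The `passwordHash` and `storage.files` fields are from the Ignition config specification; the drop-in file name, its two directives, and the helper name are assumptions for illustration (sshd honours the first value it reads, so the drop-in would need to sort ahead of whichever file disables passwords). This is not something Hetzner or Fedora CoreOS implements.

```python
import json
from urllib.parse import quote


def ignition_with_root_password(password_hash, spec_version="3.4.0"):
    """Build an Ignition config enabling root password login over SSH.

    The drop-in path and contents are illustrative assumptions.
    """
    sshd_dropin = "PasswordAuthentication yes\nPermitRootLogin yes\n"
    config = {
        "ignition": {"version": spec_version},
        "passwd": {
            "users": [{"name": "root", "passwordHash": password_hash}]
        },
        "storage": {
            "files": [
                {
                    "path": "/etc/ssh/sshd_config.d/20-enable-passwords.conf",
                    "mode": 0o644,  # serialized as decimal 420
                    "contents": {"source": "data:," + quote(sshd_dropin)},
                }
            ]
        },
    }
    return json.dumps(config, indent=2)
```

The sketch also shows why this is awkward for a platform to do generically: it has to rewrite a user-supplied config and override two distribution defaults.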
What happens if no SSH keys are provided and the user-data isn't a cloud-init config? Does the API return an error because it doesn't know how to inject a root password? That seems like acceptable behaviour for the time being and avoids adding anything FCOS-specific.
The metadata API provides either the user-selected SSH key or a randomly generated password hash if no SSH key is selected. So instance creation will always succeed. I have to apologize for the incomplete docs. The metadata service has of course a field/path for the public-keys and network-config.
Please correct me if I am wrong. As far as I understand it, we are currently only missing an afterburn provider to make CoreOS work on our platform. If that is correct, having it would enable us and anyone else to start using/testing it. And it would also allow us to resolve the other questions (password support, UEFI-only, releases) separately and step by step.
Yes, if we want to start making incremental progress on this, then the next immediate things to sort out on the FCOS side are:
Hey everyone (@asciiprod, @lucab), any progress on this?
It would be really amazing to be able to boot up a Fedora CoreOS instance on Hetzner in under a minute (that’s how fast the supported instances boot up; it’s a game-changer for Small Web use) :)
Hey folks, any updates on this? Would still love to see it happen. Has communication between Fedora and Hetzner stalled? If so, how do we get it going again? :)
@aral I think this thread has all of the needed information now, or at least most of it. There are some old Afterburn and Ignition PRs that'll need a rebase and an update based on the information here. I don't think anyone is currently working on that, but feel free to run with it if you'd like!
I took some time this weekend to look into making this a reality. Note that I'm not very familiar with CoreOS and its ecosystem. I'm learning as I go. :)
Since I'm working on this in my free time, I'd like to know my time is well-invested. How could we make sure this actually ends up being implemented, and not just another series of PRs that then go stale for multiple years?
Additionally, I need some technical guidance:
I would love to get some feedback & make this happen! I live in UTC+2, so I might not respond during the US day.
Awesome! Welcome!
While the full checklist is at https://github.com/coreos/fedora-coreos-tracker/blob/main/.github/ISSUE_TEMPLATE/implementing-new-platform.md, I think we can get to something useful with a subset of the steps.
- What does actually need to happen to fully support Hetzner Cloud?
Let's focus on landing support in Ignition and Afterburn. Then you will be able to convert a QEMU image to a Hetzner one via a few guestfish commands.
- How do Ignition/Afterburn "know" what platform they're running on?
Ignition and Afterburn know which platform they are on via the ignition.platform.id=<platform> argument on the kernel command line.
Thus, once we have support in Ignition and Afterburn, you'll be able to replace ignition.platform.id=qemu with ignition.platform.id=hetzner in the bootloader config file of an existing image to get a working Hetzner image.
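The mechanism described above can be sketched in a few lines of Python: one helper reads the platform ID from a kernel command line, and another rewrites it in bootloader config text. The helper names are hypothetical, and the sketch works on plain strings only; locating and editing the actual bootloader config inside a disk image (for example with guestfish) is outside its scope.

```python
import re

PLATFORM_KEY = "ignition.platform.id"


def platform_id(cmdline):
    """Return the value of ignition.platform.id, or None if absent."""
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == PLATFORM_KEY and sep:
            return value
    return None


def retarget_platform(boot_config, old="qemu", new="hetzner"):
    """Rewrite ignition.platform.id=<old> to <new> in config text.

    Only whole, whitespace-delimited arguments are rewritten, so a
    value such as "qemux" is left untouched.
    """
    pattern = re.compile(
        r"(?<!\S)" + re.escape(PLATFORM_KEY + "=" + old) + r"(?!\S)"
    )
    return pattern.sub(lambda _match: PLATFORM_KEY + "=" + new, boot_config)
```

On a running system, `platform_id(open("/proc/cmdline").read())` would report the platform the image was built for.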
- How do Ignition and Afterburn relate to each other? It seems to me that they try to solve similar problems, and I could not find any documentation as to what the difference is between the two?
Afterburn is mainly here to enable booting images on clouds with zero configuration and have SSH keys automatically provisioned.
Ignition is able to fully configure the system to your needs but requires you to provide a configuration file.
I would love to get some feedback & make this happen! I live in UTC+2, so I might not respond during the US day.
For Ignition, existing PRs should help, and you can also take inspiration from https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceHetzner.py.
Hi @travier thanks for answering my questions, now the pieces start falling into place.
Turns out the Ignition part is trivial, so I created a PR there as well.
I've created a PR with the "simplified" steps to add a new platform: https://github.com/coreos/fedora-coreos-tracker/pull/1562
Ignition PR: https://github.com/coreos/ignition/pull/1707 Afterburn PR: https://github.com/coreos/afterburn/pull/996
Folks interested in initial support for this platform in Fedora CoreOS should open an issue with the emerging platform template and follow the steps there. Thanks!
Any updates on this for 2024?
I can’t imagine how launching a CoreOS installation on Hetzner’s cloud in under a minute would be bad for either Fedora or Hetzner. (Not to mention that this would have the Small Web launch on CoreOS instead of Ubuntu, which is really the only other option I see at the moment for an affordable platform with instance creation measured in seconds.)
Anyone know what’s blocking this and how we can try and route around it?
@aral There's a really good guide by @swick that explains how to install Fedora CoreOS on Hetzner servers. It's not as easy as the other operating systems provided by Hetzner, but it's a good enough workaround until they provide official support.
@nachtjasmin Thanks, Jasmin, that is a good guide indeed. Sadly, for my needs (we will eventually have thousands of servers), that isn’t good enough so I’ve decided to go with AlmaLinux on Hetzner instead. It doesn’t automatically update like CoreOS, sadly, which would have been my first choice, but eight years of security updates should give us enough time to either implement a major version update system or transition to a transactional OS later.
Since this doesn’t look like it’s going to be implemented and since I’m moving ahead with using a different OS, I’m closing this. Please feel free to reopen if anything changes.
@aral Why not just leave it open, since it’s not solved yet?
@thomasaull I’ll leave that decision to the Fedora CoreOS folks. They can reopen it if they decide to work on it. It’s been open for over two years, there’s no reason to keep it open longer in my view.
@aral Got it. Just out of curiosity: What exactly is the issue with the snapshot approach? Boot duration too long?
@thomasaull It’s too convoluted and specific to Hetzner. I don’t want to tie Domain so closely to one provider, even if Hetzner is the one we’re initially going to be supporting, or to have a hacky workaround be the core way that servers are deployed for the Small Web.
Also, hopefully, we (Small Technology Foundation) won’t be the only ones running Domain instances – other organisations around the world will so it’s just not feasible to base such a system on a workaround.
(Boot duration isn’t the issue as Domain now uses prewarmed instances.)
In the future, once we have more resources, etc., we can maybe review the decision.
Hope that helps give some insight into my, admittedly rather unconventional, needs :)
@aral Thanks for the insights! I'll read up on Domain/Small Web
@thomasaull If you’re going to, the end-to-end encrypted Kitten chat (https://ar.al/2023/02/20/end-to-end-encrypted-kitten-chat/) and Streaming HTML (https://ar.al/2024/03/08/streaming-html/) posts/videos should give you a good idea of where everything is. It’s a new stack, specifically for a peer-to-peer web (Small Web) 💕
Since this doesn’t look like it’s going to be implemented
One of the biggest roadblocks currently is the UEFI requirement. We would love to have UEFI by default for everyone as well, but it would break the existing customers to roll this out for the current products (as new VMs with existing OS images might not boot in UEFI mode, if the image was created on a BIOS machine).
We also don't really want to offer a server image that doesn't boot in legacy BIOS mode (which then wouldn't boot on the older machine types).
With support in Afterburn and Ignition now in stable, it should be possible to convert a QEMU FCOS image to a Hetzner one using the script in https://github.com/coreos/fedora-coreos-docs/issues/651 and use it to set up FCOS on Hetzner.
Testing welcome! If successful, we should document that in the docs.
While we do not yet provide ready-made images for Hetzner, I've written documentation on how to set up Fedora CoreOS on Hetzner with what we have available right now: https://github.com/coreos/fedora-coreos-docs/pull/654
Testing and feedback welcomed!
While we do not yet provide ready made images for Hetzner
What's preventing this last piece ^^?
Looks like from the docs PR you are just changing the platform ID, is that it?
Yes, that's the only bit missing. If there are no objections then we could start building those and that would definitely make it easier to provision an instance.
We discussed this issue in the FCOS community meeting today and agreed that we will start producing Hetzner images for Fedora CoreOS.
Does this mean that FCOS will become a 1-click install option on Hetzner Cloud?
No, we will produce disk images that you will have to upload to Hetzner yourself. That's the best we can do.
No worries! That still sounds easier than the recovery mode work-arounds I keep seeing
In order to implement support for a new cloud platform in Fedora CoreOS, we need to know several things about the platform. Please try to answer as many questions as you can.
Hetzner: https://www.hetzner.com/cloud
“Hetzner Online GmbH is a company and data center operator based in Gunzenhausen, Germany.” – https://en.wikipedia.org/wiki/Hetzner
According to Enlyft, over 180,000 companies use their services.
In 2021, they apparently had over 200,000 servers in just one of their data centres (https://www.youtube.com/watch?v=5eo8nz_niiM).
Personally, they offer the fastest instance creation times I’ve seen, an excellent API, and their prices are among the lowest available. All of these make them perfect for use for the small web. Unfortunately, since they don’t support CoreOS, I’m going to likely have to build the small web stuff on Ubuntu to start with. Which is less than ideal as I’d love for the instances to be auto-updating with a minimum of maintenance required. (The closest thing to that currently is Ubuntu LTS with automatic security updates enabled but that doesn’t, of course, cover major version updates.)
Hetzner.
Currently uses CloudInit, as far as I know (at least for Ubuntu instances). If no userdata is provided, no customisation occurs.
Yes (through their interface/API). If none are provided, it sets a root password and emails it to the person.
I don’t know, sorry.
All regular hostname commands appear to work. Not sure if that’s what you’re asking though.
I don’t know, sorry.
Not sure.
I don’t believe so. I haven’t encountered it in the instances I’ve set up, at least.
Online interface + API. I haven’t used this personally.
Likely, but I haven’t encountered any in my use of their services :)