coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Platform Request: Hetzner #1324

Open aral opened 2 years ago

aral commented 2 years ago

In order to implement support for a new cloud platform in Fedora CoreOS, we need to know several things about the platform. Please try to answer as many questions as you can.

Hetzner: https://www.hetzner.com/cloud

“Hetzner Online GmbH is a company and data center operator based in Gunzenhausen, Germany.” – https://en.wikipedia.org/wiki/Hetzner

According to Enlyft, over 180,000 companies use their services.

In 2021, they apparently had over 200,000 servers in just one of their data centres (https://www.youtube.com/watch?v=5eo8nz_niiM).

In my experience, they offer the fastest instance creation times I’ve seen, an excellent API, and prices that are among the lowest available. All of these make them perfect for the small web. Unfortunately, since they don’t support CoreOS, I’ll likely have to build the small web stuff on Ubuntu to start with, which is less than ideal, as I’d love for the instances to be auto-updating with a minimum of maintenance required. (The closest thing to that currently is Ubuntu LTS with automatic security updates enabled but that doesn’t, of course, cover major version updates.)

Hetzner.

Currently uses CloudInit, as far as I know (at least for Ubuntu instances). If no userdata is provided, no customisation occurs.

Yes (through their interface/API). If none are provided, it sets a root password and emails it to the person.

I don’t know, sorry.

All regular hostname commands appear to work. Not sure if that’s what you’re asking though.

I don’t know, sorry.

Not sure.

I don’t believe so. I haven’t encountered it in the instances I’ve set up, at least.

Online interface + API. I haven’t used this personally.

Likely, but I haven’t encountered any in my use of their services :)

bgilbert commented 2 years ago

Thanks for filing this. Note that we'll need these answers from the OS perspective, not the user perspective. E.g., how does the OS fetch userdata? How does it learn its own hostname?

bgilbert commented 2 years ago

Some previous discussion in:

Apparently the platform is sometimes called hcloud.

jlebon commented 2 years ago

We discussed this in today's community meeting:

13:10:54 < jlebon> #agreed we would like to add support for Hetzner. we are looking for volunteers to pick it up and push it forward.

aral commented 2 years ago

Happy to hear it. Would you like me to ask around to see if I can find some contacts there or do you already have folks you can talk to?

bgilbert commented 2 years ago

Yes, that would be helpful, thanks!

aral commented 2 years ago

Quick update: I’ve been in touch with an engineer at Hetzner:

“I passed it to the responsible people but they are on vacation for the next two weeks … I will force it to be answered then :)”

der-On commented 2 years ago

At least for resetting passwords a "QEMU Guest Agent" is running on the OS.

bgilbert commented 2 years ago

@der-On That might be the standard QEMU one, discussed in #74. We generally avoid shipping third-party agents (and reimplement pieces of them when necessary to avoid it), and don't currently ship the QEMU one.

asciiprod commented 2 years ago

So, how could we help to get CoreOS running on Hetzner Cloud?

bgilbert commented 2 years ago

@asciiprod Thanks for joining in! We could use some help answering the questions at the top of this issue. We have some answers already in the old Container Linux PRs, but it'd be good to make sure our understanding is up to date.

asciiprod commented 2 years ago

Sure, I'll try to answer them as well as I can:

lucab commented 2 years ago

@asciiprod thanks for the detailed feedback! Some additional thoughts from my side:

asciiprod commented 2 years ago

The canonical documentation is available at: https://docs.hetzner.cloud

If we want to include the image on the platform, it must support passing a password hash as instance creation does not force selecting an SSH key.

Using UEFI-only for a given image is something that is currently not implemented. I'd have to check internally if we could do that.

The internal workflow for images does not import any external disk images. Hetzner Cloud images are generated by automated installations (e.g. kickstart/subiquity) from distribution ISOs using packer & ansible. This leads to a compressed (zstd) raw disk image, which is uploaded as an image snapshot and used to test and validate the new build on the platform. That is the point where it could be possible to import a pre-built external disk image. However, whether it is acceptable to open this process up to third-party generated images would have to be discussed internally.

From a release and support point of view, I think we could only support the stable version.

Currently no aarch64 Cloud instances, but as we offer Ampere dedicated servers, that's something I would keep on the list and I'd say they work the same way (probably UEFI-only).

lucab commented 2 years ago

Ah great, thanks. The page I was looking for is https://docs.hetzner.cloud/#server-metadata (though it doesn't currently cover the userdata part).

bgilbert commented 2 years ago

Thanks for the detailed info, this is very helpful!

If we want to include the image on the platform, it must support passing a password hash as instance creation does not force selecting an SSH key.

I don't think we should support this. Fedora CoreOS tries to encourage the use of best practices, and passwords aren't that. On other platforms, Fedora CoreOS instances are usually configured with an SSH key passed in the Ignition config.
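For readers unfamiliar with that flow, here is a minimal sketch of an Ignition config that authorizes a single SSH key. It assumes the Ignition spec version 3.4.0 and the default `core` user; the helper name `ignition_with_ssh_key` is made up for illustration, and any key passed to it in examples is a placeholder.

```python
import json


def ignition_with_ssh_key(public_key: str, user: str = "core") -> str:
    """Return a minimal Ignition config (JSON text) authorizing one SSH key.

    Assumption: spec version 3.4.0; adjust to whatever your Ignition supports.
    """
    config = {
        "ignition": {"version": "3.4.0"},
        "passwd": {
            "users": [
                # sshAuthorizedKeys is a list, so more keys can be appended.
                {"name": user, "sshAuthorizedKeys": [public_key]},
            ]
        },
    }
    return json.dumps(config, indent=2)
```

The resulting JSON would be passed as userdata at instance creation, so no password is involved at any point.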

From a release and support point of view, I think we could only support the stable version.

We always recommend that users run some testing and next instances alongside their stable instances to help us catch regressions before they're promoted to a stable release. Thus, those streams are an important part of any Fedora CoreOS deployment strategy. It's entirely reasonable for Hetzner not to provide customer support for those streams, but it's important that they be available alongside stable. If that isn't possible, I think we shouldn't pursue adding stable either, and either only document the custom deployment flow or not document Hetzner Cloud at all.

asciiprod commented 2 years ago

I totally agree that SSH keys should be used and we also strongly recommend it during instance creation. But we do offer a password fallback for the existing OS images. So if CoreOS does not support it, we would need to enforce it.

In any case the more CoreOS specific changes we would need to make, the more difficult it becomes to adopt it for Hetzner Cloud.

jlebon commented 2 years ago

I totally agree that SSH keys should be used and we also strongly recommend it during instance creation. But we do offer a password fallback for the existing OS images. So if CoreOS does not support it, we would need to enforce it.

In any case the more CoreOS specific changes we would need to make, the more difficult it becomes to adopt it for Hetzner Cloud.

IIUC from the docs, it seems like the password hash is injected into the user-data, which is assumed to be a cloud-init config. Is that correct? I derived this from the fact that there's no entry for it in the Server Metadata section. (Aside: it seems like that section is missing an entry for public-keys, no?)

If that's the case, that logic would have to learn to support Ignition configs too. Password authentication is disabled by default on FCOS, so it would have to inject a drop-in for it. Also, the default sshd config (at least on Fedora) prohibits password authentication for the root user so it would have to undo that too.

What happens if no SSH keys are provided and the user-data isn't a cloud-init config? Does the API return an error because it doesn't know how to inject a root password? That seems like acceptable behaviour for the time being and avoids adding anything FCOS-specific.
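To make the drop-in idea concrete, here is a rough sketch of what such injection logic would have to generate if it targeted Ignition instead of cloud-init. It is an illustration only, not a recommendation and not something FCOS or Hetzner ships. The drop-in path and its numeric prefix are assumptions (the prefix is chosen so the file sorts before the distribution's own drop-ins), and the helper names are hypothetical.

```python
import json
from urllib.parse import quote

# Hypothetical drop-in path; the "20-" prefix is an assumption.
DROPIN_PATH = "/etc/ssh/sshd_config.d/20-enable-passwords.conf"


def password_login_dropin(allow_root: bool = False) -> dict:
    """Build an Ignition `storage.files` entry holding an sshd drop-in."""
    lines = ["PasswordAuthentication yes"]
    if allow_root:
        # Overrides a default that prohibits password logins for root.
        lines.append("PermitRootLogin yes")
    body = "\n".join(lines) + "\n"
    return {
        "path": DROPIN_PATH,
        "mode": 0o644,  # serialized as decimal 420 in JSON
        "contents": {"source": "data:," + quote(body)},
    }


def ignition_with_dropin(allow_root: bool = False) -> str:
    """Wrap the drop-in in a minimal Ignition config (spec 3.4.0 assumed)."""
    config = {
        "ignition": {"version": "3.4.0"},
        "storage": {"files": [password_login_dropin(allow_root)]},
    }
    return json.dumps(config, indent=2)
```

A password hash for the user would still have to be set separately; the sketch only covers the sshd side discussed here.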

asciiprod commented 2 years ago

The metadata API provides either the user-selected SSH key or a randomly generated password hash if no SSH key is selected. So instance creation will always succeed. I have to apologize for the incomplete docs. The metadata service does of course have a field/path for the public-keys and network-config. Please correct me if I am wrong. As far as I understand it, we are currently only missing an Afterburn provider to make CoreOS work on our platform. If that is correct, having it would enable us and anyone else to start using/testing it. And it would also allow us to resolve the other questions (password support, UEFI-only, releases) separately and step by step.
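The "SSH key, else password hash" behaviour described here could be consumed by a provider roughly as follows. This is a sketch under assumptions: the metadata is taken to be an already-parsed dictionary, `public-keys` is the field mentioned in this comment, and `root-password-hash` is a made-up field name standing in for wherever the hash actually lives.

```python
def pick_credential(metadata: dict) -> tuple:
    """Return ("ssh-keys", [...]) or ("password-hash", "...") from metadata.

    Field names are assumptions: "public-keys" follows the comment above,
    "root-password-hash" is purely hypothetical.
    """
    keys = metadata.get("public-keys") or []
    if keys:
        # Prefer SSH keys whenever the user selected at least one.
        return ("ssh-keys", list(keys))
    password_hash = metadata.get("root-password-hash")
    if password_hash:
        return ("password-hash", password_hash)
    raise ValueError("metadata contains neither SSH keys nor a password hash")
```

A provider that refuses password logins would simply ignore the second branch and provision nothing when no key is present.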

lucab commented 2 years ago

Yes, if we want to start making incremental progress on this then the next immediate things to sort out on the FCOS side are:

  1. pick up and document a platform identifier (see https://coreos.github.io/ignition/supported-platforms/)
  2. make Afterburn and Ignition aware of this new platform (see https://github.com/coreos/afterburn/pull/125 and https://github.com/coreos/ignition/pull/1262)

aral commented 1 year ago

Hey everyone (@asciiprod, @lucab), any progress on this?

It would be really amazing to be able to boot up a Fedora CoreOS instance on Hetzner in under a minute (that’s how fast the supported instances boot up; it’s a game-changer for Small Web use) :)

aral commented 1 year ago

Hey folks, any updates on this? Would still love to see it happen. Has communication between Fedora and Hetzner stalled? If so, how do we get it going again? :)

bgilbert commented 1 year ago

@aral I think this thread has all of the needed information now, or at least most of it. There are some old Afterburn and Ignition PRs that'll need a rebase and an update based on the information here. I don't think anyone is currently working on that, but feel free to run with it if you'd like!

mhutter commented 1 year ago

I took some time this weekend to look into making this a reality. Note that I'm not very familiar with CoreOS and its ecosystem. I'm learning as I go. :)

Since I'm working on this in my free time, I'd like to know my time is well-invested. How could we make sure this actually ends up being implemented, and not just another series of PRs that then go stale for multiple years?

Additionally, I need some technical guidance:

  1. What does actually need to happen to fully support Hetzner Cloud?
    • From what I understand, Afterburn and Ignition need to be extended. All other linked PRs lead to archived repos, so I assume they're not needed anymore?
    • How do Ignition/Afterburn "know" what platform they're running on?
  2. How do Ignition and Afterburn relate to each other? It seems to me that they try to solve similar problems, and I could not find any documentation as to what the difference is between the two?

I would love to get some feedback & make this happen! I live in UTC+2, so I might not respond during the US day.

travier commented 1 year ago

Awesome! Welcome!

While the full checklist is at https://github.com/coreos/fedora-coreos-tracker/blob/main/.github/ISSUE_TEMPLATE/implementing-new-platform.md, I think we can get to something useful with a subset of the steps.

  1. What does actually need to happen to fully support Hetzner Cloud?

Let's focus on landing support in Ignition and Afterburn. Then you will be able to convert a QEMU image to a Hetzner one via a few guestfish commands.

  • How do Ignition/Afterburn "know" what platform they're running on?

Ignition and Afterburn know which platform they are on via the ignition.platform.id=<platform> argument on the kernel command line.

Thus once we have support in Ignition and Afterburn, you'll be able to replace ignition.platform.id=qemu by ignition.platform.id=hetzner in the bootloader config file from an existing image to get a working Hetzner image.
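As a rough illustration of both halves (reading the platform ID from a kernel command line, and swapping it in a bootloader config), here is a small sketch. The function names are made up, and the sample command line in the usage note is illustrative rather than copied from a real FCOS image.

```python
import re
from typing import Optional

PLATFORM_KARG = "ignition.platform.id"


def platform_id(cmdline: str) -> Optional[str]:
    """Return the value of ignition.platform.id in a kernel command line."""
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        if key == PLATFORM_KARG and sep:
            return value
    return None


def set_platform_id(config_text: str, new_id: str) -> str:
    """Replace every ignition.platform.id=<old> in the text with new_id."""
    # (?<!\S) makes sure the match starts at a whitespace boundary.
    pattern = re.compile(r"(?<!\S)" + re.escape(PLATFORM_KARG) + r"=\S+")
    return pattern.sub(lambda _m: f"{PLATFORM_KARG}={new_id}", config_text)
```

For example, given the text `options rw ignition.platform.id=qemu console=ttyS0`, `platform_id` yields `qemu`, and `set_platform_id(text, "hetzner")` rewrites only that one argument. On a running system the first function could be fed the contents of `/proc/cmdline`.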

  2. How do Ignition and Afterburn relate to each other? It seems to me that they try to solve similar problems, and I could not find any documentation as to what the difference is between the two?

Afterburn is mainly here to enable booting images on clouds with zero configuration and have SSH keys automatically provisioned.

Ignition is able to fully configure the system to your needs but requires you to provide a configuration file.

I would love to get some feedback & make this happen! I live in UTC+2, so I might not respond during the US day.

travier commented 1 year ago

For Ignition, existing PRs should help, and you can also take inspiration from https://github.com/canonical/cloud-init/blob/main/cloudinit/sources/DataSourceHetzner.py.

mhutter commented 1 year ago

Hi @travier thanks for answering my questions, now the pieces start falling into place.

Turns out the Ignition part is trivial, so I created a PR there as well.

travier commented 1 year ago

I've created a PR with the "simplified" steps to add a new platform: https://github.com/coreos/fedora-coreos-tracker/pull/1562

travier commented 1 year ago

Ignition PR: https://github.com/coreos/ignition/pull/1707 Afterburn PR: https://github.com/coreos/afterburn/pull/996

travier commented 1 year ago

Folks interested in initial support for this platform in Fedora CoreOS should open an issue with the emerging platform template and follow the steps there. Thanks!

aral commented 9 months ago

Any updates on this for 2024?

I can’t imagine how launching a CoreOS installation on Hetzner’s cloud in under a minute would be bad for either Fedora or Hetzner. (Not to mention that this would have the Small Web launch on CoreOS instead of Ubuntu as that’s really the only option I see at the moment otherwise for an affordable platform with instance creation measured in the seconds.)

Anyone know what’s blocking this and how we can try and route around it?

nachtjasmin commented 6 months ago

@aral There's a really good guide by @swick that explains how to install Fedora CoreOS on Hetzner servers. It's not as easy as the other operating systems provided by Hetzner, but it's a good enough workaround until they provide official support.

aral commented 6 months ago

@nachtjasmin Thanks, Jasmin, that is a good guide indeed. Sadly, for my needs (we will eventually have thousands of servers), that isn’t good enough so I’ve decided to go with AlmaLinux on Hetzner instead. It doesn’t automatically update like CoreOS, sadly, which would have been my first choice, but eight years of security updates should give us enough time to either implement a major version update system or transition to a transactional OS later.

aral commented 6 months ago

Since this doesn’t look like it’s going to be implemented and since I’m moving ahead with using a different OS, I’m closing this. Please feel free to reopen if anything changes.

thomasaull commented 6 months ago

@aral Why not just leave it open, since it’s not solved yet?

aral commented 6 months ago

@thomasaull I’ll leave that decision to the Fedora CoreOS folks. They can reopen it if they decide to work on it. It’s been open for over two years, there’s no reason to keep it open longer in my view.

thomasaull commented 6 months ago

@aral Got it. Just out of curiosity: What exactly is the issue with the snapshot approach? Boot duration too long?

aral commented 6 months ago

@thomasaull It’s too convoluted and specific to Hetzner. I don’t want to tie Domain so closely to one provider, even if Hetzner is the one we’re initially going to be supporting, or to have a hacky workaround be the core way that servers are deployed for the Small Web.

Also, hopefully, we (Small Technology Foundation) won’t be the only ones running Domain instances; other organisations around the world will too, so it’s just not feasible to base such a system on a workaround.

(Boot duration isn’t the issue as Domain now uses prewarmed instances.)

In the future, once we have more resources, etc., we can maybe review the decision.

Hope that helps give some insight into my, admittedly rather unconventional, needs :)

thomasaull commented 6 months ago

@aral Thanks for the insights! I'll read up on Domain/Small Web

aral commented 6 months ago

@thomasaull If you’re going to, the end-to-end encrypted Kitten chat (https://ar.al/2023/02/20/end-to-end-encrypted-kitten-chat/) and Streaming HTML (https://ar.al/2024/03/08/streaming-html/) posts/videos should give you a good idea of where everything is. It’s a new stack, specifically for a peer-to-peer web (Small Web) 💕

Manawyrm commented 6 months ago

Since this doesn’t look like it’s going to be implemented

One of the biggest roadblocks currently is the UEFI requirement. We would love to have UEFI by default for everyone as well, but it would break the existing customers to roll this out for the current products (as new VMs with existing OS images might not boot in UEFI mode, if the image was created on a BIOS machine).

We also don't really want to offer a server image that doesn't boot in legacy BIOS mode (which then wouldn't boot on the older machine types).

travier commented 4 months ago

With support in Afterburn and Ignition now in stable, it should be possible to convert a QEMU FCOS image to a Hetzner one using the script in https://github.com/coreos/fedora-coreos-docs/issues/651 and use it to set up FCOS on Hetzner.

Testing welcomed! If successful, we should document that in the docs.

travier commented 4 months ago

While we do not yet provide ready-made images for Hetzner, I've written documentation on how to set up Fedora CoreOS on Hetzner with what we have available right now: https://github.com/coreos/fedora-coreos-docs/pull/654

Testing and feedback welcomed!

dustymabe commented 4 months ago

While we do not yet provide ready-made images for Hetzner

What's preventing this last piece ^^?

Looks like from the docs PR you are just changing the platform ID, is that it?

travier commented 4 months ago

Looks like from the docs PR you are just changing the platform ID, is that it?

Yes, that's the only bit missing. If there are no objections then we could start building those and that would definitely make it easier to provision an instance.

yasminvalim commented 4 months ago

We discussed this issue in the FCOS community meeting today and agreed that we will start producing Hetzner images for Fedora CoreOS.

gaufde commented 2 months ago

We discussed this issue in the FCOS community meeting today and agreed that we will start producing Hetzner images for Fedora CoreOS.

Does this mean that FCOS will become a 1-click install option on Hetzner Cloud?

jbtrystram commented 2 months ago

Does this mean that FCOS will become a 1-click install option on Hetzner Cloud?

No, we will produce disk images that you will have to upload to Hetzner. That's the best we can do.

gaufde commented 2 months ago

Does this mean that FCOS will become a 1-click install option on Hetzner Cloud?

No, we will produce disk images that you will have to upload to Hetzner. That's the best we can do.

No worries! That still sounds easier than the recovery mode work-arounds I keep seeing