coreos / afterburn

A one-shot cloud provider agent
https://coreos.github.io/afterburn/
Apache License 2.0

providers: add a new "azurestack" platform (client logic for AzureStackHub) #441

Closed darkmuggle closed 1 year ago

darkmuggle commented 4 years ago

Afterburn fails completely due to 500 errors on the metadata source. With https://github.com/coreos/ignition/commit/0c0ec63a86a54f454f80740e99954ce86513f30a I was able to boot on AzureStack.

A completely different issue:

s)...[   32.768570] NetworkManager[568]: <info>  [1593115470.6525] dhcp4 (eth0): option private_245          => 'a8:3f:81:10'

And then:

[   64.908820] afterburn[658]: Jun 25 19:57:52.985 WARN Failed to get fabric address from DHCP: maximum number of retries (60) reached
[   64.988395] afterburn[658]: Jun 25 19:57:52.986 INFO Using fallback address
[   65.033307] afterburn[658]: Jun 25 19:57:52.986 INFO Fetching http://168.63.129.16/?comp=versions: Attempt #1
[     *] A start job is running for Afterburn Hostname (52s / no limit)
[   65.566088] afterburn[658]: Jun 25 19:57:53.643 INFO Fetch successful
[   65.621959] afterburn[658]: Jun 25 19:57:53.643 INFO Fetching http://168.63.129.16/machine/?comp=goalstate: Attempt #1
[   65.698749] afterburn[658]: Jun 25 19:57:53.651 INFO Fetch successful
[   65.747770] afterburn[658]: Jun 25 19:57:53.659 INFO Fetching http://169.254.169.254/metadata/instance/compute/name?api-version=2017-08-01&format=text: Attempt #1
[   65.942651] afterburn[658]: Jun 25 19:57:53.674 INFO Failed to fetch: 500 Internal Server Error

And ending with:

Displaying logs from failed units: afterburn-hostname.service
-- Logs begin at Thu 2020-06-25 20:04:16 UTC, end at Thu 2020-06-25 20:05:59 UTC. --
Jun 25 20:05:51 afterburn[655]: Jun 25 20:05:51.338 INFO Failed to fetch: 500 Internal Server Error
Jun 25 20:05:51 afterburn[655]: Error: failed to run
Jun 25 20:05:51 afterburn[655]: Caused by: writing hostname
Jun 25 20:05:51 afterburn[655]: Caused by: failed to get hostname
Jun 25 20:05:51 afterburn[655]: Caused by: maximum number of retries (10) reached
Jun 25 20:05:51 afterburn[655]: Caused by: failed to fetch: 500 Internal Server Error
Jun 25 20:05:51 systemd[1]: afterburn-hostname.service: Main process exited, code=exited, status=1/FAILURE
Jun 25 20:05:51 systemd[1]: afterburn-hostname.service: Failed with result 'exit-code'.
Jun 25 20:05:51 systemd[1]: Failed to start Afterburn Hostname.

darkmuggle commented 4 years ago

Since this is AzureStack and it's not a supported variant (AFAIK), I'm calling this a feature request and NOT a bug. Unless other ~victims~ are eager to work on this, I'd like to volunteer myself.

Work on this is tentatively scheduled for the 4.7 OCP cycle.

darkmuggle commented 4 years ago

/cc @cfBrianMiller

lucab commented 4 years ago

option private_245 => 'a8:3f:81:10'

This is in fact 168.63.129.16. So, bad that we don't have https://github.com/coreos/afterburn/issues/146 but good that the fallback worked there too.
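
For anyone checking the arithmetic, here is a minimal Python sketch of that decoding. It assumes the option value is logged as four colon-separated hex octets, as in the NetworkManager line above; `decode_option_245` is a hypothetical helper name, not something taken from Afterburn.

```python
import ipaddress


def decode_option_245(raw: str) -> ipaddress.IPv4Address:
    """Turn a colon-separated hex string such as 'a8:3f:81:10' into an IPv4 address.

    Hypothetical helper for illustration only; it is not part of Afterburn.
    """
    # Each colon-separated field is one octet written in hexadecimal.
    octets = bytes(int(part, 16) for part in raw.split(":"))
    if len(octets) != 4:
        raise ValueError(f"expected 4 octets, got {len(octets)}")
    # IPv4Address accepts a 4-byte packed representation.
    return ipaddress.IPv4Address(octets)


if __name__ == "__main__":
    print(decode_option_245("a8:3f:81:10"))
```

The result matches the fallback address that Afterburn fetches from in the log.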

http://169.254.169.254/metadata/instance/compute/name?api-version=2017-08-01&format=text

According to https://github.com/coreos/fedora-coreos-tracker/issues/476#issuecomment-649898246 the problem is with the API version. Which is weird because the (Azure) platform docs at https://docs.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service#versioning explicitly mention the version we are using. See https://docs.microsoft.com/en-us/azure-stack/user/azure-stack-vm-considerations?view=azs-2002#api-versions on API versions for AzureStack.
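
To make the "API version" point concrete, here is a small Python sketch that takes the URL from the log and swaps only its `api-version` query parameter. The helper name `with_api_version` and the replacement version string are placeholders chosen for illustration; nothing here implies that any particular version is accepted on AzureStack.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# URL exactly as it appears in the Afterburn log above.
LOGGED_URL = (
    "http://169.254.169.254/metadata/instance/compute/name"
    "?api-version=2017-08-01&format=text"
)


def with_api_version(url: str, version: str) -> str:
    """Return `url` with its api-version query parameter replaced.

    Hypothetical helper for illustration; not an Afterburn function.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["api-version"] = [version]
    # doseq=True flattens the single-item lists produced by parse_qs.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    # "2019-03-11" is only a placeholder value for the sketch.
    print(with_api_version(LOGGED_URL, "2019-03-11"))
```

Only the query string changes; the endpoint path and the `format=text` parameter stay as logged.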

Going a bit further, the hostname is the simplest logic on Azure, so it's concerning that already this one fails on AzureStack. How do the SSH keys logic and the boot check-in logic behave on such a platform?

darkmuggle commented 4 years ago

Allegedly -- and I've asked for documentation -- AzureStack does not support the instance metadata service.

pekramp commented 4 years ago

Allegedly -- and I've asked for documentation -- AzureStack does not support the instance metadata service.

Got the documentation for you: https://docs.microsoft.com/en-us/azure-stack/user/azure-stack-vm-considerations?view=azs-2002

Azure Instance Metadata Service | The Azure Instance Metadata Service provides info about running VM instances that can be used to manage and set up your VM. | The Azure Instance Metadata Service isn't supported on Azure Stack Hub.

lucab commented 4 years ago

Which then makes me wonder, where does an AzureStack instance get its hostname? Is that in the DHCP options?

darkmuggle commented 4 years ago

Now that Ignition [1] treats Azure Stack as a separate platform, we might have "just enough" to get FCOS/RHCOS booted on Azure{Stack,Hub} [1a, 1b]. The provided OVF from Microsoft looks suspect to me; the XML looks like it's describing a Windows instance. Regardless, the OVF XML given to us for AzureStack deviates substantially from what we know exists on Azure.

I started a stub [3], but after looking at the FCOS [4] packaging and RHCOS's previous failure to boot (caused by Afterburn checking in as if it was on AzureStack), it is really superfluous.

The next steps are:

[1] https://github.com/coreos/ignition/blob/master/internal/providers/azurestack/azurestack.go
[1a] caveat emptor: this has not been tested on Azure Stack
[1b] caveat emptor: "just enough" is assumed to mean basic function, i.e. booting to a console. No Afterburn support. Remote access would be dependent on SSH keys provided by Ignition. Node is likely unusable beyond a POC.
[2] https://github.com/coreos/afterburn/pull/463#issuecomment-663708878
[3] https://github.com/coreos/afterburn/commit/40f00e8b6e546bb0a45f269cc8491aecd54e712e
[4] https://src.fedoraproject.org/rpms/rust-afterburn/blob/master/f/rust-afterburn.spec#_49-53 uses the defaults set in https://github.com/coreos/afterburn/blob/master/systemd/afterburn-checkin.service which would not apply to AzureStack

darkmuggle commented 4 years ago

/cc @ashcrow @miabbott

darkmuggle commented 4 years ago

As it turns out, we need to implement check-in support. FCOS will boot, but it will NOT check in and get a hostname.

prestist commented 1 year ago

Done in #561