flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

[RFE] Azure Stack images/support #1234

Open tla5 opened 8 months ago

tla5 commented 8 months ago

Hello, I'm opening this issue as recommended on the Matrix Flatcar channel.

Current situation

From what I can tell, the Azure Flatcar image cannot reliably be used out of the box on Azure Stack because of how Ignition behaves: with the provider set to azure, Ignition tries to fetch its configuration from the userData property through the IMDS API.

This attribute is not available on Azure Stack yet, but IMDS still replies to Ignition, which causes ignition-fetch to hang (and the VM to stay stuck in a perpetual Creating state).

Impact

Flatcar Azure images cannot be used on Azure Stack, and no off-the-shelf alternative is proposed as far as I know.

Ideal future situation/Implementation options

It would be good to have either official images configured for Azure Stack, or a way to customize the existing Azure image so that Ignition's provider configuration changes from azure to azurestack (e.g. by patching the OEM partition).

Additional information

Here are some relevant logs from an attempt at running an Azure Stack-hosted Flatcar VM using flatcar_production_azure_image.vhd:

[    0.000000] Linux version 5.15.136-flatcar (build@pony-truck.infra.kinvolk.io) (x86_64-cros-linux-gnu-gcc (Gentoo Hardened 12.2.1_p20230304 p13) 12.2.1 20230304, GNU ld (Gentoo 2.39 p6) 2.39.0) #1 SMP Mon Oct 23 16:44:45 -00 2023
---[...]---
[   18.922635] ignition[681]: Ignition 2.15.0
[  OK  ] Finished ignition-fetch-of…ce - Ignition (fetch-offline).
[   18.942985] systemd[1]: Finished ignition-fetch-offline.service - Ignition (fetch-offline).
[   18.978946] ignition[681]: Stage: fetch-offline
[   18.985264] systemd[1]: Starting ignition-fetch.service - Ignition (fetch)...
[   18.985395] ignition[681]: reading system config file "/usr/lib/ignition/base.d/base.ign"
         Starting ignition-fetch.service - Ignition (fetch)...
[   18.989589] ignition[681]: no config dir at "/usr/lib/ignition/base.platform.d/azure"
[   19.06gnition[681]: no config URL provided
[   19.073441] ignition[681]: reading system config file "/usr/lib/ignition/user.ign"
[   19.091342] ignition[681]: no config at "/usr/lib/ignition/user.ign"
[   19.105765] ignition[681]: failed to fetch config: resource requires networking
[   19.1221gnition[681]: nition finished successf[   19.139106] ignition[716]: Ignition 2.
[   19.153332] ignition[716]: Stage: fetch
[   19.163852] ignition[716]: reading system config file "/usr/lib/ignition/base.d/base.ign"
[   19.183148] ignition[716]: no config dir at "/usr/lib/ignition/base.platform.d/azure"
[   19.213420] ignition[716]: no config URL provided
[   19.224599] ignition[716]: reading system config file "/usr/lib/ignition/user.ign"
[   19.241228] ignition[716]: no config at "/usr/lib/ignition/user.ign"
[   19.254925] ignition[716]: GET http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text: attempt #1
[   19.447560] systemd-networkd[692]: eth0: Gained IPv6LL
[*     ] Job ignition-fetch.service/start running (16s / no limit)
[   21.270726] ignition[716]: GET result: OK
[**    ] Job ignition-fetch.service/start running (17s / no limit)
[***   ] Job ignition-fetch.service/start running (17s / no limit)
[ ***  ] Job ignition-fetch.service/start running (18s / no limit)
[  *** ] Job ignition-fetch.service/start running (19s / no limit)
[   ***] Job ignition-fetch.service/start running (19s / no limit)
[    **] Job ignition-fetch.service/start running (20s / no limit)
[     *] Job ignition-fetch.service/start running (20s / no limit)
---[...]---
[***   ] Job ignition-fetch.service/start running (22min 30s / no limit)
[ ***  ] Job ignition-fetch.service/start running (22min 30s / no limit)
[  *** ] Job ignition-fetch.service/start running (22min 31s / no limit)
[   ***] Job ignition-fetch.service/start running (22min 31s / no limit)
[    **] Job ignition-fetch.service/start running (22min 32s / no limit)
[     *] Job ignition-fetch.service/start running (22min 32s / no limit)
---[...]---

Running a query from another VM hosted on that Azure Stack instance to the IMDS service yields the following:

$ curl -v --header "Metadata: true" "http://169.254.169.254/metadata/instance/compute/userData?api-version=2021-01-01&format=text"
*   Trying 169.254.169.254:80...
* TCP_NODELAY set
* Connected to 169.254.169.254 (169.254.169.254) port 80 (#0)
> GET /metadata/instance/compute/userData?api-version=2021-01-01&format=text HTTP/1.1
> Host: 169.254.169.254
> User-Agent: curl
> Accept: */*
> Metadata: true
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< connection:close
< content-length:0
< content-type:text/plain; charset=utf-8
< date:Thu, 26 Oct 2023 16:10:46 GMT
< server:IMDS/--snip--
<
* Closing connection 0

Interestingly, the response is empty but, despite what's commented in the Ignition provider, it doesn't seem to fall back to the OVF device method.
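To make the expected behaviour concrete, here is a minimal Go sketch of "treat an empty 200 reply as no config and fall back to another source". It is not Ignition's actual code: `fetchUserData`, `fetchConfig`, `errEmptyUserData` and `newFakeIMDS` are hypothetical names, the fallback callback merely stands in for the OVF/custom-data device, and a local test server replaces the real IMDS endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// errEmptyUserData signals "IMDS answered, but there is no userData";
// the caller should try the next config source. (Hypothetical name.)
var errEmptyUserData = errors.New("IMDS returned empty userData")

// fetchUserData queries an IMDS-style URL and treats a 200 response with an
// empty body as "no config here" rather than as a valid, empty config.
func fetchUserData(client *http.Client, url string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata", "true") // same header as in the curl call above
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if len(body) == 0 {
		return nil, errEmptyUserData
	}
	return body, nil
}

// fetchConfig tries the IMDS-style source first and calls fallback (standing
// in for the OVF/custom-data device) only when userData is empty.
func fetchConfig(client *http.Client, url string, fallback func() ([]byte, error)) ([]byte, string, error) {
	data, err := fetchUserData(client, url)
	if errors.Is(err, errEmptyUserData) {
		data, err = fallback()
		return data, "fallback", err
	}
	return data, "imds", err
}

// newFakeIMDS starts a local stand-in for IMDS that replies 200 with body.
func newFakeIMDS(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Metadata") != "true" {
			http.Error(w, "missing Metadata header", http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, body)
	}))
}

func main() {
	srv := newFakeIMDS("") // mimics the empty 200 reply from the curl output
	defer srv.Close()
	data, source, err := fetchConfig(srv.Client(), srv.URL, func() ([]byte, error) {
		return []byte(`{"ignition":{"version":"3.3.0"}}`), nil
	})
	fmt.Println(string(data), source, err)
}
```

In this sketch the fallback is reached as soon as the body length is zero, so a hang would have to come from the fallback source itself rather than from the IMDS request.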

tormath1 commented 8 months ago

Hello @tla5, thanks for the complete report. I wrote the Ignition IMDS fetching for Azure, and I do recall that IMDS can return a status code 200 with an empty config; that's why there is a check on the received length, which triggers the fallback to custom data. I have not reproduced it yet, but I suspect it fails earlier: I don't think it manages to get an answer from the IMDS server. If it did, even with an empty config, it would have failed later. Here, we can clearly see it's hanging. I'll try to reproduce it to investigate.

tormath1 commented 7 months ago

I think it fails because azurestack is a bit different regarding the CDROM filesystems: https://github.com/coreos/ignition/blob/528dbbfba1404cda78bcc7628b06dbea9d441388/internal/providers/azurestack/azurestack.go#L31-L34

// These constants are the types of CDROM filesystems that might
// be used to present a custom-data volume. Azure proper uses a
// udf volume, while Azure Stack might use udf or iso9660.

In this case, if we're using azure we might miss the iso9660 fs which makes us fall into this infinite loop: https://github.com/coreos/ignition/blob/528dbbfba1404cda78bcc7628b06dbea9d441388/internal/providers/azure/azure.go#L169

I wonder whether we could merge the azurestack and azure Ignition providers and teach Ignition to detect whether it's running on Azure Stack or on Azure.

dustymabe commented 7 months ago

Some of the historical discussion about azure vs azurestack support in Fedora CoreOS could be useful here.

tla5 commented 7 months ago

Thank you for your investigation!

teach Ignition to discover if it's running in azurestack or azure

This could be done by querying the IMDS compute endpoint, which returns a specific azEnvironment value on Azure Stack.
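A minimal Go sketch of that idea, covering only the classification step (no network call). The names `knownAzureClouds` and `platformFor` are hypothetical. The list of public and sovereign cloud values is what I recall from the IMDS documentation and should be checked against it. Treating every other non-empty value as Azure Stack is an assumption made for illustration, since the exact value returned on Azure Stack is not stated in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// knownAzureClouds lists azEnvironment values believed to identify public and
// sovereign Azure clouds. ASSUMPTION: taken from memory of the IMDS docs.
var knownAzureClouds = map[string]bool{
	"AzurePublicCloud":       true,
	"AzureUSGovernmentCloud": true,
	"AzureChinaCloud":        true,
	"AzureGermanCloud":       true,
}

// platformFor maps an azEnvironment value to an Ignition provider name.
// ASSUMPTION: any non-empty value outside the known list means Azure Stack.
func platformFor(azEnvironment string) string {
	env := strings.TrimSpace(azEnvironment)
	switch {
	case env == "":
		return "unknown" // IMDS gave no answer; the caller must decide
	case knownAzureClouds[env]:
		return "azure"
	default:
		return "azurestack"
	}
}

func main() {
	// "ExampleStackValue" is a placeholder, not a real azEnvironment value.
	for _, env := range []string{"AzurePublicCloud", "ExampleStackValue", ""} {
		fmt.Printf("%q -> %s\n", env, platformFor(env))
	}
}
```

A real implementation would also have to handle the case where the compute endpoint itself is unreachable or slow, so the detection step does not become another place where the boot can hang.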