canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Removing MAAS preseed broke oracular deployments #5685

Closed alexsander-souza closed 2 months ago

alexsander-souza commented 2 months ago

Bug report

Removing MAAS preseed (#5487) broke all MAAS deployments.

Steps to reproduce the problem

  1. install any recent MAAS version
  2. enlist a machine or VM
  3. try to deploy Ubuntu Oracular

Without the preseed code, cloud-init will set datasource_list to [ MAAS ] but won't create /etc/cloud/cloud.cfg.d/90_dpkg_maas.cfg with the required parameters.
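For anyone trying to confirm this failure mode on an affected machine, a minimal sketch of the check might look like the following. The file name comes from the report above. The idea that a usable file mentions `metadata_url` is an assumption based on the MAAS datasource keys, and `check_maas_preseed` is a hypothetical helper, not part of cloud-init.

```python
import tempfile
from pathlib import Path

PRESEED_FILE = "90_dpkg_maas.cfg"  # file name taken from the report above


def check_maas_preseed(cfg_dir="/etc/cloud/cloud.cfg.d"):
    """Return 'missing', 'incomplete', or 'ok' for the MAAS preseed file."""
    path = Path(cfg_dir) / PRESEED_FILE
    if not path.is_file():
        return "missing"
    # Assumption: a usable file names the metadata endpoint somewhere.
    if "metadata_url" not in path.read_text():
        return "incomplete"
    return "ok"


# Demonstration against a throwaway directory rather than the real /etc.
with tempfile.TemporaryDirectory() as tmp:
    before = check_maas_preseed(tmp)
    (Path(tmp) / PRESEED_FILE).write_text(
        "datasource:\n  MAAS:\n"
        "    metadata_url: http://10.20.0.3:5248/MAAS/metadata/\n"
    )
    after = check_maas_preseed(tmp)

print(before, after)
```

On a real system, calling `check_maas_preseed()` with no argument inspects /etc/cloud/cloud.cfg.d directly.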

Environment details

cloud-init logs

status: error
extended_status: error - done
boot_status_code: enabled-by-generator
last_update: Thu, 01 Jan 1970 00:00:12 +0000
detail: Cloud-init enabled by systemd cloud-init-generator
errors:
    - No instance datasource found.
    - Can not apply stage config, no datasource found! Likely bad things to come!
    - Can not apply stage final, no datasource found! Likely bad things to come!
recoverable_errors:
WARNING:
    - No instance datasource found! Likely bad things to come!
    - Can not apply stage config, no datasource found! Likely bad things to come!
    - Can not apply stage final, no datasource found! Likely bad things to come!

blackboxsw commented 2 months ago

We need to reconstitute both handle_preseed_maas and handle_preseed_local_cloud_config, because MAAS provides key config in debconf db settings for maas-metadata-url, maas-metadata-credentials, and local-cloud-config.

holmanb commented 2 months ago

> We need to reconstitute both handle_preseed_maas and handle_preseed_local_cloud_config, because MAAS provides key config in debconf db settings for maas-metadata-url, maas-metadata-credentials, and local-cloud-config.

Yes, we can't break oracular this late.

However, this also reveals an ownership problem. MAAS is broken by cloud-init, yet cloud-init behaves correctly. Long term this code should move into a codebase that MAAS owns and can maintain, rather than relying on image build behaviors that are built into cloud-init's packaging scripts.

I'll submit a PR to fix this and file an issue against MAAS so that we can track eventually removing this from cloud-init.

blackboxsw commented 2 months ago

> > We need to reconstitute both handle_preseed_maas and handle_preseed_local_cloud_config, because MAAS provides key config in debconf db settings for maas-metadata-url, maas-metadata-credentials, and local-cloud-config.
>
> Yes, we can't break oracular this late.
>
> However, this also reveals an ownership problem. MAAS is broken by cloud-init, yet cloud-init behaves correctly. Long term this code should move into a codebase that MAAS owns and can maintain, rather than relying on image build behaviors that are built into cloud-init's packaging scripts.
>
> I'll submit a PR to fix this and file an issue against MAAS so that we can track eventually removing this from cloud-init.

I think this may be a misunderstanding of how MAAS interacts with cloud-init during commissioning/provisioning. I believe MAAS still calls either debconf or dpkg-reconfigure cloud-init, providing debconf selections during commissioning based on preseed config settings, to set up MAAS URLs for communicating all cloud-init logs back to MAAS. This triggers a call to cloud-init.postinst configure, which will invoke the handle_preseed_maas/local_cloud_config functions. This is something that I think needs to be in place prior to system first boot (as triggered by postinst 'configure') to ensure all logging and oauth credentials are configured to actually talk to MAAS and get commissioning scripts etc. from various endpoints. I'm not certain there is a simple way to work through a feature that triggers cloud-init to perform such setup before "provisioning first boot", but that's certainly something we can work toward with MAAS folks.

I haven't looked through the MAAS commissioning/provisioning code in quite a while, so maybe @alexsander-souza can point us to, or confirm, how MAAS is using preseed debconf config in various boot stages.

blackboxsw commented 2 months ago

I believe the debconf setup is invoked early by curtin in curthooks.py but that may be grasping at straws.

alexsander-souza commented 2 months ago

You are correct, Curtin drives this process:

  1. MAAS boots the machine in ephemeral mode

  2. Curtin preseed is created

relevant bits:

{
    "cloudconfig": {
        "maas-cloud-config": {
            "content": "#cloud-config\ndatasource:\n  MAAS:\n    consumer_key: HenhLM3rCEEjWdgWNn\n    metadata_url: http://10.20.0.3:5248/MAAS/metadata/\n    token_key: zkeCCmbzJBZVNR96Eu\n    token_secret: L4yKKW2MuwS9yWhNsr8REUMDvPTTNGtZ\n",
            "path": "/etc/cloud/cloud.cfg.d/90_maas_cloud_config.cfg",
        },
        "maas-datasource": {
            "content": "datasource_list: [ MAAS ]",
            "path": "/etc/cloud/cloud.cfg.d/90_maas_datasource.cfg",
        },
        "maas-reporting": {
            "content": "#cloud-config\nreporting:\n  maas:\n    consumer_key: HenhLM3rCEEjWdgWNn\n    endpoint: http://10.20.0.3:5248/MAAS/metadata/status/mdar4c\n    token_key: zkeCCmbzJBZVNR96Eu\n    token_secret: L4yKKW2MuwS9yWhNsr8REUMDvPTTNGtZ\n    type: webhook\n",
            "path": "/etc/cloud/cloud.cfg.d/90_maas_cloud_init_reporting.cfg",
        },
        "maas-ubuntu-sso": {
            "content": "#cloud-config\nsnap:\n  email: admin@\n",
            "path": "/etc/cloud/cloud.cfg.d/90_maas_ubuntu_sso.cfg",
        },
    },
    "debconf_selections": {
        "grub2": "grub2   grub2/update_nvram  boolean false",
        "maas": "cloud-init   cloud-init/datasources  multiselect MAAS\ncloud-init   cloud-init/maas-metadata-url  string http://10.20.0.3:5248/MAAS/metadata/\ncloud-init   cloud-init/maas-metadata-credentials  string oauth_consumer_key=HenhLM3rCEEjWdgWNn&oauth_token_key=zkeCCmbzJBZVNR96Eu&oauth_token_secret=L4yKKW2MuwS9yWhNsr8REUMDvPTTNGtZ\ncloud-init   cloud-init/local-cloud-config  string manage_etc_hosts: true\\nmanual_cache_clean: true\\nreporting:\\n  maas:\\n    consumer_key: HenhLM3rCEEjWdgWNn\\n    endpoint: http://10.20.0.3:5248/MAAS/metadata/status/mdar4c\\n    token_key: zkeCCmbzJBZVNR96Eu\\n    token_secret: L4yKKW2MuwS9yWhNsr8REUMDvPTTNGtZ\\n    type: webhook\\n\n",
    },
}

(I know, it's suspiciously redundant, I'm going to check this)

  3. Curtin writes the image to the disk, mounts it, chroots into it, and applies the debconf selections

  4. The system reboots, and MAAS tells grub to chainload to the bootloader on disk

alexsander-souza commented 2 months ago

The cloudconfig section is used by RHEL- and SUSE-based images, while debconf_selections is used by Ubuntu/Debian-based deployments.
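To illustrate the redundancy between the two sections, here is a hedged sketch that derives the datasource keys found in the cloudconfig section from the debconf selections. The key mapping (oauth_consumer_key to consumer_key, and so on) is inferred from the matching values in the preseed above. It is not the actual cloud-init.postinst logic, and both function names are hypothetical.

```python
from urllib.parse import parse_qs

# Abbreviated copy of the "maas" debconf selections from the preseed above.
SELECTIONS = (
    "cloud-init   cloud-init/datasources  multiselect MAAS\n"
    "cloud-init   cloud-init/maas-metadata-url  string "
    "http://10.20.0.3:5248/MAAS/metadata/\n"
    "cloud-init   cloud-init/maas-metadata-credentials  string "
    "oauth_consumer_key=HenhLM3rCEEjWdgWNn"
    "&oauth_token_key=zkeCCmbzJBZVNR96Eu"
    "&oauth_token_secret=L4yKKW2MuwS9yWhNsr8REUMDvPTTNGtZ\n"
)


def parse_selections(text):
    """Split 'owner question type value' lines into {question: value}."""
    answers = {}
    for line in text.splitlines():
        # maxsplit=3 keeps any whitespace inside the value intact.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue  # blank or malformed line
        question = parts[1]
        answers[question] = parts[3] if len(parts) == 4 else ""
    return answers


def maas_datasource_config(answers):
    """Hypothetical mapping from debconf answers to MAAS datasource keys."""
    creds = parse_qs(answers["cloud-init/maas-metadata-credentials"])
    return {
        "metadata_url": answers["cloud-init/maas-metadata-url"],
        "consumer_key": creds["oauth_consumer_key"][0],
        "token_key": creds["oauth_token_key"][0],
        "token_secret": creds["oauth_token_secret"][0],
    }


config = maas_datasource_config(parse_selections(SELECTIONS))
print(config)
```

The resulting dictionary carries the same four values as the datasource block in the maas-cloud-config entry, which is what makes the two sections look redundant.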

alexsander-souza commented 2 months ago

We need to talk to the Curtin folks to understand why this is done differently for Ubuntu. At first glance, it looks like we could use the same mechanism.
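As a rough sketch of what "the same mechanism" could mean, the cloudconfig entries can be treated as plain files to write under the mounted target, with no debconf involved. The function name write_cloudconfig and the target-root handling are assumptions for illustration, not Curtin's API.

```python
import tempfile
from pathlib import Path

# Two entries copied from the preseed above (the credentialed ones are omitted).
CLOUDCONFIG = {
    "maas-datasource": {
        "content": "datasource_list: [ MAAS ]",
        "path": "/etc/cloud/cloud.cfg.d/90_maas_datasource.cfg",
    },
    "maas-ubuntu-sso": {
        "content": "#cloud-config\nsnap:\n  email: admin@\n",
        "path": "/etc/cloud/cloud.cfg.d/90_maas_ubuntu_sso.cfg",
    },
}


def write_cloudconfig(cloudconfig, target_root):
    """Write each entry's content to its path, relative to target_root."""
    written = {}
    for name, entry in cloudconfig.items():
        # Preseed paths are absolute; re-root them under the mounted target.
        dest = Path(target_root) / entry["path"].lstrip("/")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(entry["content"])
        written[name] = dest
    return written


# Demonstration against a temporary directory standing in for the target disk.
with tempfile.TemporaryDirectory() as target:
    written = write_cloudconfig(CLOUDCONFIG, target)
    datasource_cfg = written["maas-datasource"].read_text()
    relative_paths = sorted(
        p.relative_to(target).as_posix() for p in written.values()
    )

print(datasource_cfg)
print(relative_paths)
```

Because nothing here depends on a package manager, the same step would apply to any distro image.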

blackboxsw commented 2 months ago

Added a follow-up issue, https://github.com/canonical/cloud-init/issues/5688, to at least get integration test coverage for cloud-init MAAS preseed file creation by postinst, so that the functionality is better documented and protected against regression in cloud-init tests.

holmanb commented 2 months ago

> > However, this also reveals an ownership problem. MAAS is broken by cloud-init, yet cloud-init behaves correctly. Long term this code should move into a codebase that MAAS owns and can maintain, rather than relying on image build behaviors that are built into cloud-init's packaging scripts.
> >
> > I'll submit a PR to fix this and file an issue against MAAS so that we can track eventually removing this from cloud-init.
>
> I think this may be a misunderstanding of how MAAS interacts with cloud-init during commissioning/provisioning.

What makes you say that? Please clarify what you think the misunderstanding is.

> I believe MAAS still calls either debconf or dpkg-reconfigure cloud-init, providing debconf selections during commissioning based on preseed config settings, to set up MAAS URLs for communicating all cloud-init logs back to MAAS. This triggers a call to cloud-init.postinst configure, which will invoke the handle_preseed_maas/local_cloud_config functions.

This is what I expected.

> This is something that I think needs to be in place prior to system first boot (as triggered by postinst 'configure') to ensure all logging and oauth credentials are configured to actually talk to MAAS and get commissioning scripts etc. from various endpoints.

This functionality needs to exist somewhere, and I don't object to that. I just don't think that it has to exist in cloud-init's Ubuntu packaging. The current implementation makes MAAS's implementation distro-specific without a specific benefit to Ubuntu. Using debconf was an implementation choice, one that ties this implementation tightly to Debian-based distros.

> I'm not certain there is a simple way to work through a feature that triggers cloud-init to perform such setup before "provisioning first boot", but that's certainly something we can work toward with MAAS folks.

MAAS needs this behavior for other distros as well, so why should preseed be funneled through debconf? Just because one can use debconf to configure an Ubuntu package doesn't mean that one should. In this case, since this is functionality that is owned by MAAS and needs to be distro-agnostic, I think there is sufficient reason to say that it shouldn't be implemented using Debian packaging scripts, and that it shouldn't be implemented in cloud-init's postinst. MAAS seems like the rightful owner of this functionality, whether that is implemented with Curtin or some other method.

holmanb commented 2 months ago

> (I know, it's suspiciously redundant, I'm going to check this)

@alexsander-souza thanks for digging that up. I'm happy to assist with crafting a more distro-agnostic solution, which I think would benefit both MAAS and cloud-init.

frenchwr commented 2 months ago

Hello! Would there happen to be any workarounds for this issue until a fix is deployed?

alexsander-souza commented 2 months ago

> Hello! Would there happen to be any workarounds for this issue until a fix is deployed?

The fix is already available in the Oracular archive, so I expect that it will be included in the next image build, probably tomorrow.

blackboxsw commented 2 months ago

Fixed per #5686 and published to oracular yesterday as cloud-init version 24.4~3+really24.3.1-0ubuntu2. The fix will be in cloud-init Oracular server image builds with /etc/cloud/build.info: serial: 20240911 or later.
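A small sketch for checking whether a given image is new enough, assuming build.info contains a line of the form "serial: YYYYMMDD", possibly followed by a suffix. The helper name image_has_fix is hypothetical, and only the serial threshold comes from the comment above.

```python
import re

FIXED_SERIAL = 20240911  # first serial stated to contain the fix


def image_has_fix(build_info_text):
    """Return True/False from the serial line, or None if no serial is found."""
    for line in build_info_text.splitlines():
        # Assumption: the serial starts with an 8-digit date.
        match = re.match(r"\s*serial:\s*(\d{8})", line)
        if match:
            return int(match.group(1)) >= FIXED_SERIAL
    return None


# Made-up sample inputs; on a real image, read /etc/cloud/build.info instead.
print(image_has_fix("build_name: server\nserial: 20240911\n"))
print(image_has_fix("build_name: server\nserial: 20240905\n"))
```

A result of None means the file had no recognizable serial line, which should be treated as unknown rather than as fixed.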