canonical / packer-maas

Packer templates to create MAAS deployable images

When deploying custom-ubuntu.tar.gz, the machine stays with "Deploying" even after "Installation complete" #232

Closed kojiwell closed 1 month ago

kojiwell commented 4 months ago

When I build custom-ubuntu.tar.gz and deploy it to a bare-metal machine on our MAAS, the machine stays in "Deploying" even after "Installation complete - Node disabled netboot" is logged in the Event log. The machine doesn't respond to ping, and 30 minutes later it is marked as "Failed deployment."

I checked the machine's status via a remote console; the OS was up and I could log in. However, /etc/netplan/50-cloud-init.yaml, which is supposed to be generated during the deployment, doesn't exist. So there seems to be an issue with finalizing the deployment after the installed OS boots.
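For anyone reproducing this from the console, the check described above can be sketched as a small script. The helper name `check_generated_file` is made up for illustration; the netplan path is the one from this report, and `cloud-init status --long` is only run if cloud-init happens to be installed.

```shell
#!/bin/sh
# Report whether a file that cloud-init is expected to generate exists.
# Prints "present: <path>" or "missing: <path>"; returns 1 when missing.
# (check_generated_file is a hypothetical helper, not part of MAAS or cloud-init.)
check_generated_file() {
    if [ -f "$1" ]; then
        echo "present: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Path taken from the report above.
check_generated_file /etc/netplan/50-cloud-init.yaml || true

# If cloud-init is installed, ask it for its own view of the boot.
if command -v cloud-init >/dev/null 2>&1; then
    cloud-init status --long || true
fi
```

A "missing" result together with a cloud-init error or "not run" status would be consistent with the first-boot stage never completing, though it does not by itself say why.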

Here's the snippet of the Event log around the end of deployment.


Mon, 20 May. 2024 22:36:56 | Node changed status - From 'Deploying' to 'Failed deployment'
Mon, 20 May. 2024 22:36:56 | Marking node failed - Node operation 'Deploying' timed out after 30 minutes.
Mon, 20 May. 2024 22:06:12 | Script result - /tmp/install.log changed status from 'Running' to 'Passed'
Mon, 20 May. 2024 22:06:12 | Rebooting
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-power_state_change with frequency once-per-instance
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-final_message with frequency always
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-ssh_authkey_fingerprints with frequency once-per-instance
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-install_hotplug with frequency once-per-instance
Mon, 20 May. 2024 22:06:12 | Node installation - 'cloudinit' running config-keys_to_console with frequency once-per-instance
Mon, 20 May. 2024 22:06:10 | Installation complete - Node disabled netboot


Here are the last lines of the Installation output. Curtin seems to have finished the installation without an issue.

Saving to: ‘/dev/null’

     0K                                                       100%  142K=0s

2024-05-14 05:39:23 (142 KB/s) - ‘/dev/null’ saved [2/2]

curtin: Installation finished.

Here are the versions of the related tools.

~# packer --version
Packer v1.10.0

~# snap info maas|grep installed
installed:          3.5.0~rc4-16292-g.18b753d78              (35434) 196MB -

packer-maas# git branch
* main

Here's how I have registered the ubuntu-custom image.

git clone https://github.com/canonical/packer-maas
cd packer-maas/ubuntu
make custom-ubuntu.tar.gz
maas $MAAS_USER boot-resources create name='custom/ubuntu2204' title='custom/ubuntu2204' architecture='amd64/generic' filetype='tgz' base_image='ubuntu/jammy' content@=custom-ubuntu.tar.gz

Thank you for your attention. I would appreciate any comments.

alexsander-souza commented 4 months ago

can you run the following commands in the deployed machine?

Also get the content of /var/log/cloud-init.log and /var/log/cloud-init-output.log
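The two log files requested above can be bundled into one archive for attaching to the issue. This is a minimal sketch: `collect_logs` and the output path `/tmp/cloud-init-logs.tar.gz` are names chosen for illustration, and files that are absent or unreadable are skipped.

```shell
#!/bin/sh
# Copy whichever of the given log files exist into a staging directory
# and pack them into a gzip-compressed tarball.
# Usage: collect_logs <output.tar.gz> <file>...
# (collect_logs is a hypothetical helper written for this thread.)
collect_logs() {
    out=$1
    shift
    stage=$(mktemp -d)
    for f in "$@"; do
        # Skip files that are absent or unreadable instead of failing the run.
        if [ -r "$f" ]; then
            cp "$f" "$stage/"
        fi
    done
    tar -czf "$out" -C "$stage" .
    rm -rf "$stage"
}

# The two paths are the ones asked for in the comment above.
collect_logs /tmp/cloud-init-logs.tar.gz \
    /var/log/cloud-init.log /var/log/cloud-init-output.log
```

Running it as root on the deployed machine avoids silently skipping a log because of permissions.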

kojiwell commented 4 months ago

Thank you, @alexsander-souza

Here are snippets of the outputs you asked for, all of which indicate that the machine is unable to communicate with MAAS.

Please note that the host '172.16.40.251' is actually reachable, because I added the following lines to packer-maas/ubuntu/scripts/install-custom-packages and rebuilt the custom image so that I could ssh to the deployed machine.

cat << EOF > /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    enp7s0f0:
      dhcp4: true
  version: 2
EOF

Here's the ping result.

ubuntu@ubuntu:~$ ping -c 3 172.16.40.251
PING 172.16.40.251 (172.16.40.251) 56(84) bytes of data.
64 bytes from 172.16.40.251: icmp_seq=1 ttl=64 time=0.764 ms
64 bytes from 172.16.40.251: icmp_seq=2 ttl=64 time=0.764 ms
64 bytes from 172.16.40.251: icmp_seq=3 ttl=64 time=0.672 ms

--- 172.16.40.251 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2048ms
rtt min/avg/max/mdev = 0.672/0.733/0.764/0.043 ms

This problem doesn't happen with the official Ubuntu image on MAAS. Are there some differences between the official and custom images (e.g. missing required apt packages)?
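One way to test the missing-package guess above is to dump the installed package names on a machine deployed from each image and compare the two lists. This is only a sketch of that comparison: `missing_packages` and the file names `official.txt` and `custom.txt` are hypothetical, and nothing in this thread confirms that packages were the actual cause.

```shell
#!/bin/sh
# Print packages listed in the first file but not in the second.
# Each file holds one package name per line, e.g. produced on each machine by:
#   dpkg-query -W -f='${Package}\n' > packages.txt
# (missing_packages is a hypothetical helper written for this thread.)
missing_packages() {
    a=$(mktemp)
    b=$(mktemp)
    sort -u "$1" > "$a"
    sort -u "$2" > "$b"
    # comm -23 keeps only the lines unique to the first sorted input.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

For example, `missing_packages official.txt custom.txt` would list what the official image has and the custom one lacks.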

Thank you!

kojiwell commented 4 months ago

By the way, I've also created and registered custom Rocky 8 and 9 tar.gz images in the same way, and they deploy without a problem. So Rocky is fine.

This problem seems unique to the custom Ubuntu image (and maybe Debian as well).

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 30 days with no activity.

alexsander-souza commented 3 months ago

this was probably fixed by #242, please try again

kojiwell commented 1 month ago

@alexsander-souza I just tried it again, and it looks like the problem is resolved. Thank you so much!