csmart / ansible-role-virt-infra

Define and manage guests and networks on a KVM host with Ansible
GNU General Public License v3.0

SSH fails to start on CentOS Stream 8 VMs #66

Closed csmart closed 2 years ago

csmart commented 2 years ago

SSH fails to start on CentOS Stream 8 VMs. This seems to be caused by two things:

The sshd-keygen@.service units (pulled in by sshd-keygen.target) do not run because cloud-init is enabled:

[root@swift-01 ~]# cat /etc/systemd/system/sshd-keygen\@.service.d/disable-sshd-keygen-if-cloud-init-active.conf
# In some cloud-init enabled images the sshd-keygen template service may race
# with cloud-init during boot causing issues with host key generation.  This
# drop-in config adds a condition to sshd-keygen@.service if it exists and
# prevents the sshd-keygen units from running *if* cloud-init is going to run.
#
[Unit]
ConditionPathExists=!/run/systemd/generator.early/multi-user.target.wants/cloud-init.target

But cloud-init isn't actually creating the host keys because of a config change:

[root@swift-01 ~]# diff -Nurd /etc/cloud/cloud.cfg /etc/cloud/cloud.cfg.rpmnew 
--- /etc/cloud/cloud.cfg        2021-06-03 18:10:45.162000000 +1000
+++ /etc/cloud/cloud.cfg.rpmnew 2022-04-30 17:06:20.000000000 +1000
@@ -7,7 +7,7 @@
 mount_default_fields: [~, ~, 'auto', 'defaults,nofail,x-systemd.requires=cloud-init.service', '0', '2']
 resize_rootfs_tmp: /dev
 ssh_deletekeys:   1
-ssh_genkeytypes:  ~
+ssh_genkeytypes:  ['rsa', 'ecdsa', 'ed25519']
 syslog_fix_perms: ~
 disable_vmware_customization: false

@@ -54,7 +54,7 @@

 system_info:
   default_user:
-    name: centos
+    name: cloud-user
     lock_passwd: true
     gecos: Cloud User
     groups: [adm, systemd-journal]
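
For illustration only: the relevant change in the diff above is ssh_genkeytypes going from ~ (null) to an explicit list. Assuming the guest receives cloud-config user-data (how the role delivers user-data is not shown in this issue), a minimal sketch of setting the same key from user-data might look like this:

```yaml
#cloud-config
# Assumption: this is passed to the guest as user-data, which cloud-init
# merges over /etc/cloud/cloud.cfg. The key and values mirror the .rpmnew
# file shown in the diff; nothing else is changed.
ssh_genkeytypes: ['rsa', 'ecdsa', 'ed25519']
```

This leaves the default_user name (centos) from the original cloud.cfg untouched.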

This happened because the disk prep step, which makes sure cloud-init is installed, actually updated it:

https://github.com/csmart/ansible-role-virt-infra/blob/master/tasks/disk-create.yml#L202

Therefore, we need to do one of the following: get smarter about installing cloud-init so that we don't pull in the latest version (although installing the latest version is probably good); make sure the config hasn't changed since the initial RPM was installed (so that the config is overwritten and .rpmnew is never created); or add a post-install task that runs commands in the disk (so that users can handle this themselves, for example with rpmconf)... Something like that.
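
A rough sketch of what the post-install option could look like, assuming the guest disk is customised with virt-customize from libguestfs. The variable names virt_infra_guest_post_commands and virt_infra_disk_path are hypothetical and do not exist in the role; the sed command is just one example of something a user might choose to run.

```yaml
# Hypothetical inventory variable: commands to run inside the guest disk
# after packages have been installed.
virt_infra_guest_post_commands:
  - >-
    sed -i "s/^ssh_genkeytypes:.*/ssh_genkeytypes: ['rsa', 'ecdsa', 'ed25519']/"
    /etc/cloud/cloud.cfg

# Hypothetical task: run each command in the disk image, one at a time.
# virt_infra_disk_path is an assumed variable holding the image path.
- name: Run post-install commands inside the guest disk
  ansible.builtin.command:
    argv:
      - virt-customize
      - --add
      - "{{ virt_infra_disk_path }}"
      - --run-command
      - "{{ item }}"
  loop: "{{ virt_infra_guest_post_commands | default([]) }}"
```

Using argv passes each command to --run-command as a single argument, so no extra shell quoting is needed on the KVM host. With the variable unset, the loop is empty and nothing runs.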

csmart commented 2 years ago

Note that this can be worked around by overriding the virt_infra_guest_deps variable in the inventory and removing cloud-init (as it's already installed).

     virt_infra_guest_deps:
       - qemu-guest-agent

csmart commented 2 years ago

Ahh, it's a bug in the image (because the cloud config was changed), so if you update to this latest image you should be fine: https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20220125.1.x86_64.qcow2

Might still be handy to have a post install task though...

And while we're at it, the install task should only run if the length of virt_infra_guest_deps is not zero. That way, users can actually skip the install step.
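
As a sketch of that guard, assuming the install step calls virt-customize (the real task in tasks/disk-create.yml may look different, and virt_infra_disk_path is a hypothetical variable name used only for illustration):

```yaml
# Hypothetical install task, skipped entirely when the dependency list is empty.
- name: Install guest dependencies into the disk image
  ansible.builtin.command:
    argv:
      - virt-customize
      - --add
      - "{{ virt_infra_disk_path }}"
      - --install
      - "{{ virt_infra_guest_deps | join(',') }}"
  when: virt_infra_guest_deps | length > 0
```

With this in place, setting virt_infra_guest_deps: [] in the inventory would skip the install step.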