shaneholder opened 1 year ago
Adding the fstab in question.
# CLOUD_IMG: This file was created/modified by the Cloud Image build process
UUID=10657b0e-bcdb-4859-aeaf-31f0fb6dbde6 / ext4 defaults,discard 0 1
UUID=CD18-78B8 /boot/efi vfat umask=0077 0 1
/dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2
/dev/disk/azure/scsi1/lun0-part2 /alt_tmp auto defaults,nofail,comment=cloudconfig 0 2
/dev/disk/azure/scsi1/lun0-part3 /alt_var auto defaults,nofail,comment=cloudconfig 0 2
/dev/disk/azure/scsi1/lun0-part4 /alt_var/log auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2
/dev/disk/azure/scsi1/lun0-part5 /alt_var/log/audit auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2
/dev/disk/azure/scsi1/lun0-part6 /alt_var/tmp auto defaults,nofail,x-mount.mkdir,comment=cloudconfig 0 2
/dev/disk/cloud/azure_resource-part1 /mnt auto defaults,nofail,x-systemd.requires=cloud-init.service,_netdev,comment=cloudconfig 0 2
And here is the 100_golden.cfg referenced in the packer instructions; it is only expected to be used when a VM is created from this golden image.
#cloud-config
mounts:
- [ "/dev/disk/azure/scsi1/lun0-part1", "/home", "auto", "defaults,nofail", "0", "2" ]
- [ "/dev/disk/azure/scsi1/lun0-part2", "/tmp", "auto", "defaults,nofail", "0", "2" ]
- [ "/dev/disk/azure/scsi1/lun0-part3", "/var", "auto", "defaults,nofail", "0", "2" ]
- [ "/dev/disk/azure/scsi1/lun0-part4", "/var/log", "auto", "defaults,nofail", "0", "2" ]
- [ "/dev/disk/azure/scsi1/lun0-part5", "/var/log/audit", "auto", "defaults,nofail", "0", "2" ]
- [ "/dev/disk/azure/scsi1/lun0-part6", "/var/tmp", "auto", "defaults,nofail", "0", "2" ]
you could touch /etc/cloud/cloud-init.disabled before the packer reboot, and remove it afterwards
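A minimal sketch of that idea, assuming the commands run from packer shell provisioners around the reboot step:
# before triggering the reboot: keep cloud-init from running on the next boot
sudo touch /etc/cloud/cloud-init.disabled
# after the reboot, once the build is otherwise done: re-enable cloud-init for instances created from the image
sudo rm /etc/cloud/cloud-init.disabled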
Thanks for the idea, but cloud-init still ran after the reboot. Seriously odd.
That's weird. Are we sure the file is being created before rebooting? Could you please attach the logs of your run touching cloud-init.disabled?
I believe the reason cloud-init fully runs after the reboot is that the packer config mounts an alternative var partition, losing the original /var/lib/cloud/, which contains the cloud-init state of the first run.
If we only want cloud-init to run the modules that run on every boot, we would have to somehow transfer /alt_var/lib/cloud/ to /var/lib/cloud.
Another option would be to have two config files: 100_golden.cfg as-is and a supplemental 101_final.cfg containing the mounts config without the alt_ prefix, then remove 100_golden.cfg before the reboot using packer or runcmd. Please note that without restoring /var/lib/cloud, cloud-init would treat this as a first boot.
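A rough sketch of that removal step, assuming 100_golden.cfg was baked into the image under /etc/cloud/cloud.cfg.d/ (the path is an assumption):
# before the reboot: drop the golden-image-only mounts config so only 101_final.cfg remains
sudo rm /etc/cloud/cloud.cfg.d/100_golden.cfg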
In any case, I believe this is not a cloud-init issue.
@aciba90 thanks for the reply. I assume you're referring to my var manipulation in packer. I tried several configurations, even one where I don't manipulate /var, thinking something along the same lines as you, but I still get the same behavior. While working this weekend I also discovered that I needed to wait for cloud-init to complete using cloud-init status --wait. Even with that it seems that cloud-init is re-running. I've attached new logs and updated configuration files here.
Note the line in the cloud-init.log 2023-08-21 13:38:55,813 - cc_mounts.py[DEBUG]: Changes to fstab: ['- /dev/disk/azure/scsi1/lun0-part1 /home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2']
Packer config
locals {
scripts_folder = "${path.cwd}/Linux/scripts"
compliance_folder = "${path.cwd}/compliance/ubuntu2004"
}
source "azure-arm" "linux-image" {
azure_tags = {
OS = var.OS
Version = var.OS_version
}
use_azure_cli_auth = true
image_offer = var.image_offer
image_publisher = var.image_publisher
image_sku = var.image_sku
shared_image_gallery_destination {
subscription = var.subscription_id
resource_group = var.sig_rg_name
gallery_name = var.gallery_name
image_name = var.image_name
image_version = var.image_version
replication_regions = var.replication_regions
storage_account_type = var.storage_account_type
}
build_resource_group_name = var.packer_rg_name
virtual_network_name = var.vnet_name
virtual_network_subnet_name = var.subnet_name
virtual_network_resource_group_name = var.vnet_rg_name
private_virtual_network_with_public_ip = true
managed_image_name = "${var.image_name}-${var.image_version}"
managed_image_resource_group_name = var.image_rg_name
os_type = var.os_type
vm_size = var.vm_size
disk_additional_size = [35] #/home(5) /tmp(5) /var(10) /var/log(5) /var/log/audit(5) /var/tmp(5)
custom_data_file = "${local.scripts_folder}/cloud.cfg"
ssh_private_key_file = "~/.ssh/id_rsa"
}
build {
sources = ["source.azure-arm.linux-image"]
# Wait for cloud init to complete
provisioner "shell" {
execute_command = "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'"
inline = [
"cloud-init status --wait"
]
}
# copy data into the /alt_ mounts (shown here: /home to /alt_home)
provisioner "shell" {
remote_folder = "/home/packer"
execute_command = "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'"
inline = [
"cd /home/; tar --selinux --acls --xattrs -cf - . | (cd /alt_home; tar xf -)"
]
}
# rewrite fstab to use the real mount points (the reboot happens in a later provisioner)
provisioner "shell" {
remote_folder = "/home/packer"
inline = [
"sudo sed -i 's#/alt_#/#' /etc/fstab",
]
}
provisioner "breakpoint" {
disable = false
note = "reboot next"
}
# reboot vm to mount /var
provisioner "shell" {
remote_folder = "/home/packer"
inline = [
"sudo reboot"
]
expect_disconnect = true
pause_after = "60s"
}
provisioner "breakpoint" {
disable = false
note = "post reboot"
}
}
cloud-init config
#cloud-config
disk_setup:
  /dev/disk/azure/scsi1/lun0:
    table_type: gpt
    layout:
      - 15
      - 14
      - 28
      - 14
      - 14
      - 14
    overwrite: True
fs_setup:
  - device: /dev/disk/azure/scsi1/lun0
    partition: 1
    filesystem: ext4
  - device: /dev/disk/azure/scsi1/lun0
    partition: 2
    filesystem: ext4
  - device: /dev/disk/azure/scsi1/lun0
    partition: 3
    filesystem: ext4
  - device: /dev/disk/azure/scsi1/lun0
    partition: 4
    filesystem: ext4
  - device: /dev/disk/azure/scsi1/lun0
    partition: 5
    filesystem: ext4
  - device: /dev/disk/azure/scsi1/lun0
    partition: 6
    filesystem: ext4
mounts:
  - [ "/dev/disk/azure/scsi1/lun0-part1", "/alt_home", "auto", "defaults,nofail", "0", "2" ]
package_update: true
packages:
  - vim
  - less
In your packer script, could you please add "touch /etc/cloud/cloud-init.disabled" [1] after "cloud-init status --wait"?
That will disable cloud-init running on the second boot, and you will see a line with etc_file found disabled in /run/cloud-init/cloud-init-generator.log. (That line is not present in your logs.)
Doing that, the created image will never run cloud-init again, until the disable file is removed, and then your problem will reappear.
If one wants images where cloud-init runs, I think the user-data for the final instance must be modified before reboot.
@aciba90 thanks I will give that a try again, FWIW this is on Ubuntu and I thought that I saw code that only used that file for RHEL but I could be mistaken. While I'm trying that, any thoughts as to why cloud-init is running after the reboot in the first place?
Hi, same problem: cloud-init re-runs on the reboot. I am not expecting the cloud-init execution at 19:52.
root@pkrvmt1dezfw0mb:/var/log# grep lun0-part1 cloud-init.log
2023-08-28 19:50:54,523 - cc_mounts.py[DEBUG]: mounts configuration is [['/dev/disk/azure/scsi1/lun0-part1', '/alt_home', 'auto', 'defaults,nofail', '0', '2']]
2023-08-28 19:50:54,524 - cc_mounts.py[DEBUG]: Attempting to determine the real name of /dev/disk/azure/scsi1/lun0-part1
2023-08-28 19:50:54,524 - cc_mounts.py[DEBUG]: changed /dev/disk/azure/scsi1/lun0-part1 => None
2023-08-28 19:50:54,524 - cc_mounts.py[DEBUG]: Ignoring nonexistent named mount /dev/disk/azure/scsi1/lun0-part1
2023-08-28 19:50:54,526 - cc_mounts.py[DEBUG]: Changes to fstab: ['+ /dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/cloud/azure_resource-part1 /mnt auto defaults,nofail,x-systemd.requires=cloud-init.service,_netdev,comment=cloudconfig 0 2']
2023-08-28 19:52:43,840 - cc_mounts.py[DEBUG]: mounts configuration is [['/dev/disk/azure/scsi1/lun0-part1', '/alt_home', 'auto', 'defaults,nofail', '0', '2']]
2023-08-28 19:52:43,841 - cc_mounts.py[DEBUG]: Attempting to determine the real name of /dev/disk/azure/scsi1/lun0-part1
2023-08-28 19:52:43,841 - cc_mounts.py[DEBUG]: changed /dev/disk/azure/scsi1/lun0-part1 => None
2023-08-28 19:52:43,841 - cc_mounts.py[DEBUG]: Ignoring nonexistent named mount /dev/disk/azure/scsi1/lun0-part1
2023-08-28 19:52:43,842 - cc_mounts.py[DEBUG]: Changes to fstab: ['- /dev/disk/azure/scsi1/lun0-part1 /home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2']
@aciba90 thanks I will give that a try again, FWIW this is on Ubuntu and I thought that I saw code that only used that file for RHEL but I could be mistaken.
The existence of this file disables cloud-init in all of the systemd service files, not just a rhel thing.
While I'm trying that, any thoughts as to why cloud-init is running after the reboot in the first place?
Something is starting cloud-init? I see that the init and init-local stages are re-run, but I'm not sure why. If systemd isn't observing this file in your image, then you have a systemd bug. If something else is starting cloud-init externally, you'll need to figure out what that is. Either way, I agree with @aciba90's analysis - I don't see a cloud-init bug here.
Aside: You are providing invalid cloud-config to cloud-init, and there are other warnings in your logs so cloud-init is likely not doing what you want it to. For example, mount -a fails due to unknown filesystem type ntfs.
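For what it's worth, cloud-init ships a schema check that can catch invalid cloud-config before the build, assuming a recent enough cloud-init on the machine; the file path below is taken from the packer locals and may differ in your layout:
# validate the user-data against cloud-init's schema; --annotate marks the offending lines
cloud-init schema --config-file ./Linux/scripts/cloud.cfg --annotate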
Ok, I'll look into that. It's odd because we're using the Canonical minimal image in Azure.
image_publisher = "Canonical"
image_offer = "0001-com-ubuntu-minimal-focal"
image_sku = "minimal-20_04-lts-gen2"
Looking at the file you pointed out is where I got the idea that cloud-init.disabled was for rhel only: line 29 is a conditional for variant=="rhel", so I assumed that was RedHat.
Indeed it is, the current version of the template only emits that for 'rhel' distros. That's OK. What actually controls whether cloud-init runs or not is the systemd-generator[1] along with the ds-identify[2] tool, which searches for cloud-init policy[3] to determine what to do if any cloud-init datasources are found.
The log file /run/cloud-init/ds-identify.log from your debug attachment shows that the policy inside the Ubuntu image you used was this:
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ Azure ]
...
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
...
single entry in datasource_list (Azure) use that.
In particular, mode=search, found=all in the policy means to search for datasources and run any datasources that are found; in this case it found the Azure datasource, so by policy it enables cloud-init, which means cloud-init.target is wanted by the default system target and cloud-init will always run, independent of what's in /etc/cloud-init/*
To resolve your issue of disabling cloud-init on Azure, you should be able to write out a disabled policy file like so:
cat > /etc/cloud/ds-identify.cfg <<EOF
policy: disabled,found=first,maybe=none,notfound=disabled
EOF
This will prevent cloud-init.target from being added to systemd's default target, and none of the cloud-init service units will be run.
I don't have an Azure instance right now, but testing on an LXD system that booted via cloud-init, adding that file results in the following:
% lxc exec j1 /bin/bash
root@j1:~# lsb_release -rd
Description: Ubuntu 22.04.1 LTS
Release: 22.04
root@j1:~# systemctl status cloud-init.target
○ cloud-init.target - Cloud-init target
Loaded: loaded (/lib/systemd/system/cloud-init.target; static)
Active: inactive (dead)
root@j1:~# cat /run/cloud-init/ds-identify.log
[up 0.25s] ds-identify
policy loaded: mode=disabled report=false found=first maybe=none notfound=disabled
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, UpCloud, VMware, Vultr, LXD, NWCS, None ]
DMI_PRODUCT_NAME=System Product Name
DMI_SYS_VENDOR=ASUS
DMI_PRODUCT_SERIAL=unavailable
DMI_PRODUCT_UUID=unavailable
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=Default string
DMI_BOARD_NAME=PRIME TRX40-PRO
FS_LABELS=unavailable:container
ISO9660_DEVS=unavailable:container
KERNEL_CMDLINE=/sbin/init
VIRT=lxc
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_RELEASE=5.15.0-78-generic
UNAME_KERNEL_VERSION=#85~20.04.1-Ubuntu SMP Mon Jul 17 09:42:39 UTC 2023
UNAME_MACHINE=x86_64
UNAME_NODENAME=j1
UNAME_OPERATING_SYSTEM=GNU/Linux
DSNAME=
DSLIST=NoCloud ConfigDrive OpenNebula DigitalOcean Azure AltCloud OVF MAAS GCE OpenStack CloudSigma SmartOS Bigstep Scaleway AliYun Ec2 CloudStack Hetzner IBMCloud Oracle Exoscale RbxCloud UpCloud VMware Vultr LXD NWCS None
MODE=disabled
ON_FOUND=first
ON_MAYBE=none
ON_NOTFOUND=disabled
pid=45 ppid=24
is_container=true
mode=disabled. returning 1
[up 0.28s] returning 1
root@j1:~# cloud-init status --long
status: disabled
boot_status_code: disabled-by-generator
detail:
Cloud-init disabled by cloud-init-generator
You can simply remove this policy file to re-enable cloud-init in that image.
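For completeness, re-enabling is just removing that file (ds-identify re-evaluates its policy on each boot), for example:
sudo rm /etc/cloud/ds-identify.cfg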
I'm starting to think cloud-init just doesn't like me. :) Still not the results I'm expecting.
ds-identify.cfg file
packer@pkrvm81fdkczhiy:/var/log$ cat /etc/cloud/ds-identify.cfg
policy: disabled,found=first,maybe=none,notfound=disabled
check cloud-init status
packer@pkrvm81fdkczhiy:/var/log$ cloud-init status --long
status: running
boot_status_code: disabled-by-generator
last_update: Wed, 30 Aug 2023 21:22:33 +0000
detail:
DataSourceAzure [seed=/var/lib/waagent]
grep for cloud-init modifying fstab; the change at 21:22:33,556 is unexpected
packer@pkrvm81fdkczhiy:/var/log$ grep alt_home cloud-init.log
2023-08-30 21:20:47,722 - cc_mounts.py[DEBUG]: mounts configuration is [['/dev/disk/azure/scsi1/lun0-part1', '/alt_home', 'auto', 'defaults,nofail', '0', '2']]
2023-08-30 21:20:47,725 - cc_mounts.py[DEBUG]: Changes to fstab: ['+ /dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/cloud/azure_resource-part1 /mnt auto defaults,nofail,x-systemd.requires=cloud-init.service,_netdev,comment=cloudconfig 0 2']
2023-08-30 21:22:33,554 - cc_mounts.py[DEBUG]: mounts configuration is [['/dev/disk/azure/scsi1/lun0-part1', '/alt_home', 'auto', 'defaults,nofail', '0', '2']]
2023-08-30 21:22:33,556 - cc_mounts.py[DEBUG]: Changes to fstab: ['- /dev/disk/azure/scsi1/lun0-part1 /home auto defaults,nofail,comment=cloudconfig 0 2', '+ /dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2']
verify that fstab was modified back to /alt_home
packer@pkrvm81fdkczhiy:/var/log$ grep home /etc/fstab
/dev/disk/azure/scsi1/lun0-part1 /alt_home auto defaults,nofail,comment=cloudconfig 0 2
packer@pkrvm81fdkczhiy:/var/log$
An additional oddity that I noticed is that it seems that cloud-init status --wait never exits after the reboot.
tail the log file
2023-08-30 21:22:33,914 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2023-08-30 21:22:33,915 - util.py[DEBUG]: Read 10 bytes from /proc/uptime
2023-08-30 21:22:33,915 - util.py[DEBUG]: cloud-init mode 'init' took 1.303 seconds (1.30)
2023-08-30 21:22:33,915 - handlers.py[DEBUG]: finish: init-network: SUCCESS: searching for network datasources
2023-08-30 21:22:33,915 - handlers.py[DEBUG]: HyperVReportingHandler flushing remaining events
print current date time
packer@pkrvm81fdkczhiy:/var/log$ date
Wed Aug 30 21:33:18 UTC 2023
wait some more
packer@pkrvm81fdkczhiy:/var/log$ cloud-init status --wait
.........................................................................................................................................................................................................................................................................................................^CFailed to write to /dev/console
Cloud-init 23.2.2-0ubuntu0~20.04.1 received SIGINT, exiting...
Filename: /usr/lib/python3/dist-packages/cloudinit/cmd/status.py
Function: handle_status_args
Line number: 127
Filename: /usr/lib/python3/dist-packages/cloudinit/util.py
Function: log_time
Line number: 2745
Filename: /usr/lib/python3/dist-packages/cloudinit/cmd/main.py
Function: main
Line number: 1080
tail the log again
packer@pkrvm81fdkczhiy:/var/log$ tail -f cloud-init.log
2023-08-30 21:22:33,913 - handlers.py[DEBUG]: start: init-network/config-ssh: running config-ssh with frequency once-per-instance
2023-08-30 21:22:33,913 - helpers.py[DEBUG]: config-ssh already ran (freq=once-per-instance)
2023-08-30 21:22:33,913 - handlers.py[DEBUG]: finish: init-network/config-ssh: SUCCESS: config-ssh previously ran
2023-08-30 21:22:33,913 - main.py[DEBUG]: Ran 10 modules with 0 failures
2023-08-30 21:22:33,914 - atomic_helper.py[DEBUG]: Atomically writing to file /var/lib/cloud/data/status.json (via temporary file /var/lib/cloud/data/tmp4z0dj5lo) - w: [644] 540 bytes/chars
2023-08-30 21:22:33,914 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2023-08-30 21:22:33,915 - util.py[DEBUG]: Read 10 bytes from /proc/uptime
2023-08-30 21:22:33,915 - util.py[DEBUG]: cloud-init mode 'init' took 1.303 seconds (1.30)
2023-08-30 21:22:33,915 - handlers.py[DEBUG]: finish: init-network: SUCCESS: searching for network datasources
2023-08-30 21:22:33,915 - handlers.py[DEBUG]: HyperVReportingHandler flushing remaining events
still no update to log file
packer@pkrvm81fdkczhiy:/var/log$ date
Wed Aug 30 21:34:52 UTC 2023
Thanks for testing.
$ cloud-init status --long
status: running
boot_status_code: disabled-by-generator
last_update: Wed, 30 Aug 2023 21:22:33 +0000
This shows that cloud-init was successfully disabled and was not run by systemd. Whatever else is going on, cloud-init is not misbehaving here.
If cloud-init is running but was disabled by the generator, then something else besides systemd started the service. I believe someone else pointed out that some other tool may be invoking cloud-init commands directly.
If you want to find out what is doing that, you can move /usr/bin/cloud-init out of the way (rename it), and whatever script or service that's invoking cloud-init should break/stack-trace, and you can find the origin of your pain.
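If renaming the binary outright is too disruptive, a hypothetical alternative is a small logging shim; everything below (the .real name and the log path) is an assumption for illustration, not an existing cloud-init mechanism:
# park the real binary and install a wrapper that records the caller, then hands off
sudo mv /usr/bin/cloud-init /usr/bin/cloud-init.real
printf '#!/bin/sh\nps -o pid=,ppid=,comm=,args= -p "$PPID" >> /var/log/cloud-init-callers.log\nexec /usr/bin/cloud-init.real "$@"\n' | sudo tee /usr/bin/cloud-init >/dev/null
sudo chmod 755 /usr/bin/cloud-init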
A complete dump of journalctl -b 0 -o short-precise will be helpful to packer support in diagnosing what is invoking cloud-init directly.
Thanks for the troubleshooting tips. I've done that, and before I submit anything to the packer team I wanted a quick review of the below.
I'm using the same code as before but I added a sudo mv /usr/bin/cloud-init /usr/bin/cloud-not-init right before I reboot the VM. The line at 19:36:11.442136, is that in opposition to the statement above that "something else besides systemd started the service"?
Trying to make sure I'm understanding the expectations of the process.
packer@pkrvmgvzux857cz:~$ journalctl -b 0 -o short-precise | grep -i cloud-init
Sep 05 19:36:11.442095 pkrvmgvzux857cz systemd[452]: cloud-init-local.service: Failed to execute command: No such file or directory
Sep 05 19:36:11.442136 pkrvmgvzux857cz systemd[452]: cloud-init-local.service: Failed at step EXEC spawning /usr/bin/cloud-init: No such file or directory
Sep 05 19:36:11.442607 pkrvmgvzux857cz systemd[1]: Starting Initial cloud-init job (pre-networking)...
Sep 05 19:36:11.461169 pkrvmgvzux857cz systemd[1]: cloud-init-local.service: Main process exited, code=exited, status=203/EXEC
Sep 05 19:36:11.461321 pkrvmgvzux857cz systemd[1]: cloud-init-local.service: Failed with result 'exit-code'.
Sep 05 19:36:11.462046 pkrvmgvzux857cz systemd[1]: Failed to start Initial cloud-init job (pre-networking).
Sep 05 19:36:14.241921 pkrvmgvzux857cz systemd[1]: Starting Initial cloud-init job (metadata service crawler)...
Sep 05 19:36:14.245319 pkrvmgvzux857cz systemd[475]: cloud-init.service: Failed to execute command: No such file or directory
Sep 05 19:36:14.245363 pkrvmgvzux857cz systemd[475]: cloud-init.service: Failed at step EXEC spawning /usr/bin/cloud-init: No such file or directory
Sep 05 19:36:14.245828 pkrvmgvzux857cz systemd[1]: cloud-init.service: Main process exited, code=exited, status=203/EXEC
Sep 05 19:36:14.245959 pkrvmgvzux857cz systemd[1]: cloud-init.service: Failed with result 'exit-code'.
Sep 05 19:36:14.246326 pkrvmgvzux857cz systemd[1]: Failed to start Initial cloud-init job (metadata service crawler).
Sep 05 19:36:14.890478 pkrvmgvzux857cz python3[501]: 2023-09-05T19:36:14.889224Z INFO Daemon cloud-init is enabled: True
Sep 05 19:36:14.892794 pkrvmgvzux857cz python3[501]: 2023-09-05T19:36:14.891726Z INFO Daemon Using cloud-init for provisioning
I've not used packer, so I'm not sure what's happening during a packer VM execution, but from what I understand there are at least these 3 things happening (please correct me where I'm wrong):
1) first boot of some cloud image on azure (is this unmodified Ubuntu or has packer already made changes?)
2) (1) reads the cloud-init configuration from the metadata service in azure and runs that code
3) at some point during (2) a reboot is issued
Before the reboot I would (a consolidated sketch follows below):
1) invoke cloud-init collect-logs to capture the current state of cloud-init, saving the tarball in some place you can get it after the reboot
2) inject the ds-identify policy to disable cloud-init
3) move /usr/bin/cloud-init out of the way
4) optionally enable verbose systemd debugging (adding "systemd.log_level=debug systemd.log_target=console systemd.journald.forward_to_console=1" to the kernel cmdline)
After the reboot:
1) invoke cloud-init collect-logs to capture the new state of the system
Comparing the collected logs from before and after reboot should shed some light on what's going on.
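A consolidated sketch of those steps, assuming they run from a packer shell provisioner; the tarball names under /root are arbitrary choices of mine, not required paths:
# before the reboot
sudo cloud-init collect-logs --tarfile /root/cloud-init-pre-reboot.tar.gz
printf 'policy: disabled,found=first,maybe=none,notfound=disabled\n' | sudo tee /etc/cloud/ds-identify.cfg
sudo mv /usr/bin/cloud-init /usr/bin/cloud-not-init
# (reboot)
# after the reboot
sudo mv /usr/bin/cloud-not-init /usr/bin/cloud-init
sudo cloud-init collect-logs --tarfile /root/cloud-init-post-reboot.tar.gz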
Also worth checking:
systemctl status <unit> on the units cloud-init.target, cloud-init-local.service and cloud-init.service. This hopefully will show them as inactive, though it seems in your case they've been added to some systemd target.
systemd-analyze dot cloud-init.target to show what services/targets "want" cloud-init.target
find /run/systemd/generator* -name 'cloud-init.target' to see if a generator enabled cloud-init
Trying to make sure I'm understanding the expectations of the process.
The process is mostly capturing enough information about the state the system is in currently to understand why systemd starts cloud-init even if cloud-init does not generate a link telling systemd to start it up.
The ds-identify activation from cloud-init results in creating a symlink here:
root@j1:/run/systemd/generator.early/multi-user.target.wants# ls -al
total 0
drwxr-xr-x 2 root root 60 Sep 5 21:47 .
drwxr-xr-x 3 root root 60 Sep 5 21:47 ..
lrwxrwxrwx 1 root root 37 Sep 5 21:47 cloud-init.target -> /lib/systemd/system/cloud-init.target
Lastly, if you cannot find out what is enabling cloud-init units, then the big hammer is to mask the units.
$ systemctl mask cloud-init.target cloud-init-local.service cloud-init.service cloud-config.service cloud-final.service
Ideally just .target is enough, but masking all of them will ensure systemd will not execute them.
root@j1:~# systemctl mask cloud-init.target
Created symlink /etc/systemd/system/cloud-init.target → /dev/null.
root@j1:~# reboot
root@j1:~#
..:(╯°□°)╯彡┻━━┻
% lxc exec j1 /bin/bash
root@j1:~# cloud-init status --long
status: not run
boot_status_code: enabled-by-generator
detail:
Cloud-init enabled by systemd cloud-init-generator
root@j1:~# systemctl status cloud-init.target
○ cloud-init.target
Loaded: masked (Reason: Unit cloud-init.target is masked.)
Active: inactive (dead)
To undo, just run systemctl unmask on each unit you masked.
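For example, mirroring the mask command above:
sudo systemctl unmask cloud-init.target cloud-init-local.service cloud-init.service cloud-config.service cloud-final.service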
Bug report
I am attempting to use Packer to create a golden image that mounts /var as a partition. In the Packer build, data is copied from several folders to the mounted alt_ folders, and then a sed command is executed against fstab to replace the /alt_ prefix with a blank to make the real mount points. The VM is then rebooted before continuing with the rest of the golden image configuration; however, after the boot the changes made to fstab are replaced by cloud-init. Log files are attached.
Packer build section
Cloud Config
Steps to reproduce the problem
Environment details
cloud-init logs
cloud-init.tar.gz