canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Cloud init fails to invoke puppet during runcmd if command fails to run #4383

Open thatsk opened 1 year ago

thatsk commented 1 year ago

I am trying to run puppet on the node via runcmd. Of course, the first puppet run fails, so we tried running puppet agent -tdv twice, but it only ran the first time and then the node went into a reboot state.

Cloud-init 22.1-6.el8_7.2 received SIGTERM, exiting...
Filename: /usr/lib64/python3.6/subprocess.py Function: _try_wait Line number: 1424
Filename: /usr/lib64/python3.6/subprocess.py Function: wait Line number: 1477
Filename: /usr/lib64/python3.6/subprocess.py Function: communicate Line number: 855

CalvoM commented 1 year ago

Thank you @thatsk for raising this issue. Please use this format to report the bug so that we can reproduce the issue and know your environment. I will mark this as incomplete because of the missing details, and I will change it to new once you update them.

blackboxsw commented 1 year ago

Additionally, /etc/cloud/cloud.cfg defines a list of modules and the order in which those modules run. The puppet module runs in the cloud-init final boot stage, which comes after runcmd, which lives in the cloud config boot stage. More info on boot stages. You can see the order in which modules have run on your system with cloud-init analyze show. The output breaks them down into the 4 different boot stages, so you get a better picture of when puppet runs relative to runcmd. So invoking puppet commands during runcmd, at that point in boot, is not currently possible with cloud-init. If there

Here's a snippet of /etc/cloud/cloud.cfg that shows that puppet runs later than runcmd due to config.

cloud_config_modules:
  - wireguard
  - snap
  - ubuntu_autoinstall
  - ssh_import_id
  - keyboard
  - locale
  - set_passwords
  - grub_dpkg
  - apt_pipelining
  - apt_configure
  - ubuntu_advantage
  - ntp
  - timezone
  - disable_ec2_metadata
  - runcmd                                <---- runcmd is executed in modules:config stage before any cloud_final_modules
  - byobu

# The modules that run in the 'final' stage
cloud_final_modules:
  - package_update_upgrade_install
  - fan
  - landscape
  - lxd
  - ubuntu_drivers
  - write_files_deferred
  - puppet                  <---- puppet config and setup only runs here
  - chef
  - ansible
  - mcollective
  - salt_minion
...
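
To check where a module sits on a particular system, the listing above can be queried directly. The following is a minimal shell sketch, not part of cloud-init: module_stage is a hypothetical helper name, and it assumes the simple one-module-per-line layout shown above (entries written as bracketed lists would not match).

```shell
#!/bin/sh
# module_stage MODULE CFG: print each cloud_*_modules list in CFG that names MODULE.
# Hypothetical helper; assumes list entries of the form "  - name".
module_stage() {
  awk -v mod="$1" '
    /^cloud_(init|config|final)_modules:/ { stage = $1; sub(/:$/, "", stage); next }
    /^[^ #-]/ { stage = "" }   # any other top-level key ends the current list
    stage != "" && $1 == "-" && $2 == mod { print stage }
  ' "$2"
}

# Only query the real config when it is readable on this machine.
if [ -r /etc/cloud/cloud.cfg ]; then
  module_stage runcmd /etc/cloud/cloud.cfg
  module_stage puppet /etc/cloud/cloud.cfg
fi

# Per-stage report from the command named in the comment above.
if command -v cloud-init >/dev/null 2>&1; then
  cloud-init analyze show || echo "cloud-init analyze show failed" >&2
fi
```

With the default config quoted above, the first call would be expected to report cloud_config_modules for runcmd and the second cloud_final_modules for puppet.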

While under-documented, it is possible to override the default config module ordering in the user-data #cloud-config that you provide at install time.

One possibility is to provide the full list of cloud_final_modules and add a 'runcmd' item after '- puppet', so that the runcmd module runs in both cloud_config_modules and cloud_final_modules. Your runcmd script could be smart enough to check whether puppet exists yet, so that it is a no-op during the cloud_config_modules stage. Alternatively, you could provide a full list of cloud_config_modules in user-data that excludes only the '- runcmd' item, so that runcmd runs only during cloud_final_modules.

Your #cloud-config could look something like this:

#cloud-config
cloud_config_modules:
 - wireguard
 - snap
 - ubuntu_autoinstall
 - ssh_import_id
 - ....    # Make sure to exclude `runcmd` from this full list
 - keyboard
 - locale
 - set_passwords
 - ...
cloud_final_modules:
 - package_update_upgrade_install
 - fan
 - landscape
 - lxd
 - ubuntu_drivers
 - write_files_deferred
 - puppet
 - runcmd                      # included runcmd after puppet in cloud_final 
 - scripts_per_instance
 - scripts_per_boot

puppet:
 <YOUR_PUPPET_CFG>

runcmd: 
  - <YOUR_RUNCMD_SCRIPT>
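
The existence check suggested above for the runcmd script could be sketched in shell as follows. This is an illustration under assumptions, not a tested recipe: run_puppet_if_present is a hypothetical helper name, and it assumes the puppet binary is on PATH once the puppet module has installed it (some packages install it elsewhere, in which case the lookup would need adjusting).

```shell
#!/bin/sh
# Hypothetical helper for <YOUR_RUNCMD_SCRIPT>: do nothing until puppet is installed.
run_puppet_if_present() {
  if ! command -v puppet >/dev/null 2>&1; then
    # Early stage: puppet has not been set up yet, so skip quietly.
    echo "puppet not installed yet; skipping"
    return 0
  fi

  puppet agent -tdv
  rc=$?

  # -t (--test) enables detailed exit codes: 0 = no changes, 2 = changes applied.
  case "$rc" in
    0|2) return 0 ;;
    *)   echo "puppet agent exited with $rc" >&2
         return "$rc" ;;
  esac
}
```

Treating exit code 2 as success matters here because a first run that applies changes would otherwise look like a failure to whatever wraps the command.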

blackboxsw commented 1 year ago

Additionally, as @CalvoM mentioned, getting hold of your logs from cloud-init collect-logs would help us better understand the context of the failure. Note, though, that you may want to check collect-logs.tar.gz:var/run/cloud-init/instance-data-sensitive.json to make sure your user-data does not include sensitive information or passwords that need to be manually redacted.
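
Before attaching the tarball, a quick listing can confirm whether the sensitive file mentioned above is in it. The sketch below is only an illustration: find_sensitive_entries is a hypothetical helper name, and the tarball path is passed in as an argument because the exact file name and location depend on how collect-logs was run.

```shell
#!/bin/sh
# Create the tarball first (typically as root):  cloud-init collect-logs
# Hypothetical helper: list tarball entries named instance-data-sensitive.json.
find_sensitive_entries() {
  tar -tzf "$1" | grep 'instance-data-sensitive\.json'
}

# Usage: pass the tarball path as the first argument to this script.
if [ -f "${1:-}" ]; then
  find_sensitive_entries "$1" || echo "no instance-data-sensitive.json entry found"
fi
```

Any entry it prints is worth extracting and reading through for passwords or tokens before the archive is shared.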