ComputeCanada / puppet-magic_castle

Puppet Environment repo for Magic Castle - https://github.com/ComputeCanada/magic_castle
MIT License

portal-12 error #250

Closed poquirion closed 1 year ago

poquirion commented 1 year ago

On mgmt node:

cd /etc/puppetlabs/code/environments/production/site/
git log
commit e849e100c99a033e86e033836406d69e69b4972c (HEAD -> portal-12, origin/portal-12)
Author: Félix-Antoine Fortin <felix-antoine.fortin@calculquebec.ca>
Date:   Thu Apr 6 11:56:54 2023 -0400

    Comment podman commands

I deployed and I get:

journalctl -u puppet
-- Logs begin at Wed 2023-07-05 19:39:35 UTC, end at Thu 2023-07-06 21:40:29 UTC. --
Jul 05 19:40:24 mgmt1 systemd[1]: Started Puppet agent.
Jul 05 19:40:28 mgmt1 puppet-agent[1104]: Starting Puppet client version 6.23.0
Jul 05 19:40:28 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/facts.d/cpu_ext.sh]/ensure) defined content as '{md5}072cec4f039d0e567b2ee0ee5d89b4cc'
Jul 05 19:40:28 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/facts.d/dev_disk.sh]/ensure) defined content as '{md5}5979e6f9bbeca7617394a2887e40527d'
Jul 05 19:40:28 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/facts.d/letsencrypt.sh]/ensure) defined content as '{md5}1ccd1d6b8d7062f328d843b200476b84'
Jul 05 19:40:28 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/facts.d/nvidia_gpu_count.sh]/ensure) defined content as '{md5}16a2f199f0bc7b1a91926fd20feafa2e'
Jul 05 19:40:28 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/facts.d/nvidia_grid_vgpu.sh]/ensure) defined content as '{md5}636b799d5416a4f50f9ce27aa08cdf2f'
Jul 05 19:40:29 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/facts.d/terraform_facts.yaml]/ensure) defined content as '{md5}ee722643610b434e04a1ea4830d7f3b5'
Jul 05 19:40:29 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/facts.d/uid_max.sh]/ensure) defined content as '{md5}94777725b8bbc1ee687b1d733dec33af'
Jul 05 19:40:29 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/lib/facter]/ensure) created
Jul 05 19:40:29 mgmt1 puppet-agent[1109]: (/File[/opt/puppetlabs/puppet/cache/lib/facter/nameservers.rb]/ensure) defined content as '{md5}85483e98c715fd90b2ecb2686bbe526a'
Jul 05 19:40:31 mgmt1 puppet-agent[1109]:  Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Could not find declared class consul (file: /etc/puppetlabs/code/environments/production/site/profile/manifests/consul.pp, line: 6, column: 3) on node mgmt1
Jul 05 19:40:31 mgmt1 puppet-agent[1109]: Could not retrieve catalog; skipping run
[the last two lines repeat forever]

It seems that it did not like the rebase.

poquirion commented 1 year ago

It is not happy about line six of site/profile/manifests/consul.pp...

1  class profile::consul::server {
2    $interface = profile::getlocalinterface()
3    $ipaddress = $facts['networking']['interfaces'][$interface]['ip']
4    $consul_servers = lookup('profile::consul::client::servers', undef, undef, [$ipaddress])
5
6    class { 'consul':
7      config_mode   => '0640',
8      acl_api_token => lookup('profile::consul::acl_api_token'),
9      config_hash   => {
10       'bootstrap_expect' => length($consul_servers),

cmd-ntrf commented 1 year ago

"Could not find declared class consul": typically, this means librarian-puppet was unable to install the consul module under /etc/puppetlabs/code/environments/production/modules.

The installation of the module is unrelated to portal-12 branch and the log of the installation can be found in /var/log/cloud-init-output.log.

Can you verify that the consul module is missing from the modules folder and, if that is the case, check in the log what happened?
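
One way to run that check on the mgmt node is sketched below. `check_module` is a hypothetical helper, not part of Magic Castle, and the paths in the usage comment are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report whether a Puppet module directory exists and,
# if it does not, print ERROR lines (with 5 lines of context) from a log file.
check_module() {
    modules_dir=$1
    module=$2
    log_file=$3
    if [ -d "$modules_dir/$module" ]; then
        echo "$module: installed"
        return 0
    fi
    echo "$module: missing from $modules_dir"
    # grep exits non-zero when nothing matches; that is not a failure here.
    grep -A 5 ERROR "$log_file" || true
    return 1
}

# Usage on the mgmt node (paths taken from this thread):
# check_module /etc/puppetlabs/code/environments/production/modules consul /var/log/cloud-init-output.log
```

The function returns a non-zero status when the module directory is absent, so it can also be used in a conditional.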

poquirion commented 1 year ago

The /etc/puppetlabs/code/environments/production/modules folder has not even been created. While debugging, I just assumed that it was a deployment without a modules folder! So I was wrong.

Here is the problem:

[root@mgmt1 production]# grep -A 5 ERROR /var/log/cloud-init-output.log
ERROR:  Error installing librarian-puppet:
    The last version of faraday-net_http (< 3.1, >= 2.0) to support your Ruby & RubyGems was 2.1.0. Try installing it with `gem install faraday-net_http -v 2.1.0` and then running the current command again
    faraday-net_http requires Ruby version >= 2.6.0. The current ruby version is 2.5.0.

cmd-ntrf commented 1 year ago

This should have been fixed in release 12.5.0. Which version are you using?

poquirion commented 1 year ago

The rebased portal-12 branch. It must be a regression.

cmd-ntrf commented 1 year ago

The change is not in the Puppet code, but in the Terraform / cloud-init code. So you need to update the ref in your main.tf.
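
For illustration, the release currently pinned in main.tf can be read as sketched below. This assumes the module source uses Terraform's generic git syntax with a `?ref=` query string, which may not match every deployment; `current_ref` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every "ref=" query argument
# found in a Terraform file (assumes a git module source with ?ref=...).
current_ref() {
    sed -n 's/.*[?&]ref=\([^"&]*\).*/\1/p' "$1"
}

# Usage, from the directory containing main.tf:
# current_ref main.tf
```

If the printed ref is older than 12.5.0, the release mentioned above, the cloud-init fix is not included.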

poquirion commented 1 year ago

I did not realize that terraform init was not enough. I ran terraform get -update and it solved my problem.
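
As a sketch of the difference: by default, terraform init reuses modules already installed under .terraform/modules, whereas terraform get -update (or terraform init -upgrade) checks the downloaded modules for updates. `refresh_modules` below is a hypothetical wrapper name, not something Magic Castle provides.

```shell
#!/bin/sh
# Hypothetical wrapper: re-check already-downloaded modules for updates,
# then preview what would change before applying anything.
refresh_modules() {
    terraform get -update && terraform plan
}

# Usage, from the directory containing main.tf:
# refresh_modules
```

Running the plan right after the update makes it easy to confirm that the new module code is actually picked up before applying anything.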