ComputeCanada / magic_castle

Terraform modules to replicate the HPC user experience in the cloud
MIT License

Puppet Configuration Issues with Magic Castle on Azure using AlmaLinux 9.4 #317

Closed odiezg closed 1 month ago

odiezg commented 1 month ago

Environment

Issue Description

When deploying Magic Castle on Azure with AlmaLinux 9.4, Puppet fails to configure services correctly. Running the Puppet agent manually with puppet agent --test fails in two ways: with sudo, the command is not found; without sudo, the agent cannot connect to the Puppet master server.

Symptoms

  1. Puppet Agent Command Not Found Error:

    [centos@login1 ~]$ sudo puppet agent --test
    sudo: puppet: command not found

  2. Connection Failure to Puppet Master:

    [centos@login1 ~]$ puppet agent --test
    Error: Connection to https://puppet:8140/puppet-ca/v1 failed, trying next route: Request to https://puppet:8140/puppet-ca/v1 failed after 0.018 seconds: Failed to open TCP connection to puppet:8140 (getaddrinfo: Name or service not known)
    Wrapped exception:
    Failed to open TCP connection to puppet:8140 (getaddrinfo: Name or service not known)
    Error: No more routes to ca

Configuration File Snippet

    module "azure" {
      source         = "./azure"
      config_git_url = "https://github.com/ComputeCanada/puppet-magic_castle.git"
      config_version = "13.5.0"
      cluster_name   = "hpcie"
      domain         = "labs.faculty.ie.edu"
      image          = {
        publisher = "almalinux",
        offer     = "almalinux-x86_64",
        sku       = "9-gen2",
        version   = "9.4.2024050902"
      }
      instances = {
        mgmt  = { type = "Standard_DS2_v2", count = 1, tags = ["mgmt", "puppet", "nfs"] },
        login = { type = "Standard_DS1_v2", count = 1, tags = ["login", "public", "proxy"] },
        node  = { type = "Standard_DS1_v2", count = 2, tags = ["node"] }
      }
      volumes = {
        nfs = {
          home    = { size = 10 }
          project = { size = 50 }
          scratch = { size = 50 }
        }
      }
      public_keys = [file("~/.ssh/id_rsa.pub")]
    }

cmd-ntrf commented 1 month ago

Based on the information you provided, Puppet was not installed at all. The error most likely happens during the cloud-init phase.

Could you provide the cloud-init log, available at /var/log/cloud-init-output.log?
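
As a quick way to triage that log before sharing it, the following is a minimal sketch. It assumes the default log path mentioned above; the helper name cloud_init_errors is hypothetical and is not part of Magic Castle or cloud-init. It only lists lines that look like errors or failures, with line numbers, and does not replace sharing the full log.

```shell
#!/bin/sh
# Hypothetical helper (not part of Magic Castle or cloud-init):
# print lines from a cloud-init output log that mention "error" or "fail",
# prefixed with their line number. The path defaults to the location
# mentioned in this thread and can be overridden with the first argument.
cloud_init_errors() {
    grep -n -i -E 'error|fail' "${1:-/var/log/cloud-init-output.log}"
}

# Usage on the mgmt1 node (reading the log may require root):
#   sudo sh -c "grep -n -i -E 'error|fail' /var/log/cloud-init-output.log"
# or, with the function above loaded in a root shell:
#   cloud_init_errors
```

A match is not necessarily fatal to the deployment, so the surrounding lines of the log are still needed to judge whether the run actually stopped there.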

odiezg commented 1 month ago

Dear Félix, many thanks for your email. I have attached the log files from the mgmt1 server and the main.tf config file I used for the installation.

I see this error:

    Created symlink /etc/systemd/system/multi-user.target.wants/puppetserver.service → /usr/lib/systemd/system/puppetserver.service.
    ERROR: Error installing puppet_forge:

It seems there is an issue with the Ruby installation. I could try to install it manually, but I am not sure how to avoid this in future installations, or what else I would need to do after a manual install to launch the rest of the installation process.

    sudo yum install ruby          # install Ruby and the gem command
    gem install minitar -v 0.12    # install minitar
    gem install puppet_forge       # reinstall puppet_forge

Please let me know if you need any information.

Many thanks in advance, Oscar


cmd-ntrf commented 1 month ago

Hi Oscar,

I was able to reproduce the error you quoted. While unfortunate, it is not fatal to the installation and configuration of the Puppet server. Could you provide the rest of the cloud-init log from mgmt1? Ideally as a gist on gist.github.com, to help with formatting.

I also realized that I misread your first message.

  1. The puppet command is not found with sudo because /opt/puppetlabs/puppet/bin is not included in the sudo PATH, not because Puppet is not installed. Given that you were able to run the command as the user centos, we will assume Puppet was properly installed.
  2. Have you looked at the Puppet log on login1 with journalctl -u puppet? The error you mention happens at the beginning of the agent run because the server is not yet up; once the server is configured, the agent can connect. There is typically no reason to run the Puppet agent manually from the command line in Magic Castle.
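
The two points above can be sketched as shell commands. This is only an illustration under stated assumptions: the directory /opt/puppetlabs/puppet/bin is the one named above, while the helper find_puppet and the PUPPET_BIN_DIR variable are hypothetical names introduced here. The helper resolves the absolute path of the puppet binary so it can be handed to sudo, which typically uses a restricted PATH.

```shell
#!/bin/sh
# Directory named in this thread; override it if Puppet lives elsewhere.
PUPPET_BIN_DIR="${PUPPET_BIN_DIR:-/opt/puppetlabs/puppet/bin}"

# Hypothetical helper: print the absolute path of the puppet binary.
# It first checks the caller's PATH, then falls back to PUPPET_BIN_DIR.
# Returns non-zero if puppet cannot be found in either place.
find_puppet() {
    if command -v puppet >/dev/null 2>&1; then
        command -v puppet
    elif [ -x "${PUPPET_BIN_DIR}/puppet" ]; then
        printf '%s\n' "${PUPPET_BIN_DIR}/puppet"
    else
        return 1
    fi
}

# Usage on login1 (not executed by this snippet):
#   sudo "$(find_puppet)" agent --test              # run the agent via its full path
#   sudo journalctl -u puppet --no-pager -n 50      # last 50 lines of the agent log
```

Passing the full path avoids changing the sudo configuration. As noted above, a manual agent run is normally unnecessary in Magic Castle, so the journalctl command is usually the more useful of the two.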
odiezg commented 1 month ago

Dear Félix, many thanks for your swift response. I have included the files you requested here: https://gist.github.com/odiezg/edf627a63d28cc52de6bd44b4702a256

Regarding the Puppet command, you are right, it is a PATH issue. But the main problem is that the installation does not finish properly and I cannot connect to the cluster, especially Jupyter.

Many thanks again for your help and please let me know if you need anything else.

Regards, Oscar
