canonical / maas-ansible-playbook

An Ansible playbook for installing and configuring MAAS
Apache License 2.0

HA Postgresql fails to initiate #121

Closed ben-ihelputech closed 1 year ago

ben-ihelputech commented 1 year ago

Problem

When multiple PostgreSQL hosts are defined, the database setup fails to complete. This seems to be because Corosync and Pacemaker are not installed on all of the database hosts.

Setup

I have three physical LXD hosts, one amd64 (lxd-01) and two arm64 (lxd-02, lxd-03), clustered together. A custom script deploys identical LXD containers based on a YAML file. The containers are connected to an unmanaged bridge and use cloud-init to set up networking. Connectivity between the containers has been confirmed.

containers:
  - name: maas
    ip: 10.30.102.10
    target: lxd-01
    hostname: maas.example.com
  - name: maas-db01
    ip: 10.30.102.11
    target: lxd-01
    privileged: true
    hostname: maas-db01.example.com
  - name: maas-db02
    ip: 10.30.102.12
    target: lxd-02
    privileged: true
    hostname: maas-db02.example.com
  - name: maas-db03
    ip: 10.30.102.13
    target: lxd-03
    privileged: true
    hostname: maas-db03.example.com

Here is the inventory file I'm using:

---
all:
  vars:
    ansible_user: ben
    # maas_pacemaker_fencing_driver: "ipmilan"
    maas_version: "3.3"
    maas_postgres_password: "MyPassword"
    # maas_postgres_floating_ip:
    maas_installation_type: "snap"
    maas_url: "http://maas.example.com:5240/MAAS"
  children:
    maas_postgres:
      hosts:
        maas-db01.example.com:
        maas-db02.example.com:
        maas-db03.example.com:
    maas_pacemaker:
      hosts:
        maas-db03.example.com:
        maas-db02.example.com:
        maas-db01.example.com:
    maas_corosync:
      hosts:
        maas-db03.example.com:
        maas-db02.example.com:
        maas-db01.example.com:
    maas_postgres_proxy:
      hosts:
        maas.example.com:
    maas_proxy:
      # hosts:
      #   maas.example.com:
    maas_region_controller:
      hosts:
        maas.example.com:
        # maas-region01.example.com:
        # maas-region02.example.com:
        # maas-region03.example.com:
    maas_rack_controller:
      hosts:
        maas.example.com:
        # maas-rack01.example.com:
        # maas-rack02.example.com:
        # maas-rack03.example.com:

Note: I am specifying only a single region and rack controller while I test the HA PostgreSQL setup, which is why the other hosts in the inventory are commented out.

Worth noting

This is where I would expect Corosync and Pacemaker to be installed on the other database hosts. As the output shows, they are only being installed on maas-db01.

PLAY [maas_corosync] ***************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas-db01.example.com]

TASK [maas_corosync : Install Corosync] ********************************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_corosync : Write Corosync config] ***************************************************************************************************
changed: [maas-db01.example.com]

RUNNING HANDLER [maas_corosync : Restart Corosync] *********************************************************************************************
changed: [maas-db01.example.com]

PLAY [maas_pacemaker] **************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas-db01.example.com]

TASK [maas_pacemaker : Install Pacemaker packages] *********************************************************************************************
changed: [maas-db01.example.com] => (item=pacemaker)
changed: [maas-db01.example.com] => (item=pcs)
changed: [maas-db01.example.com] => (item=fence-agents)
changed: [maas-db01.example.com] => (item=resource-agents-paf)

TASK [maas_pacemaker : Add temp file config for pacemaker-managed postgres] ********************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Generate Pacemaker user Password] ***************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Save Pacemaker user Password] *******************************************************************************************
ok: [maas-db01.example.com -> maas-db03.example.com] => (item=maas-db03.example.com)
ok: [maas-db01.example.com -> maas-db02.example.com] => (item=maas-db02.example.com)
ok: [maas-db01.example.com] => (item=maas-db01.example.com)

TASK [maas_pacemaker : Set pacemaker user password] ********************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Configure ssh for pacemaker] ********************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Flush handlers] *********************************************************************************************************

RUNNING HANDLER [maas_pacemaker : Ensure Pacemaker is started] *********************************************************************************
changed: [maas-db01.example.com]

RUNNING HANDLER [maas_pacemaker : Restart sshd] ************************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Override /etc/hosts for External Address] *******************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Auth cluster] ***********************************************************************************************************
FAILED - RETRYING: [maas-db01.example.com]: Auth cluster (3 retries left).
FAILED - RETRYING: [maas-db01.example.com]: Auth cluster (2 retries left).
FAILED - RETRYING: [maas-db01.example.com]: Auth cluster (1 retries left).
fatal: [maas-db01.example.com]: FAILED! => {"attempts": 3, "changed": true, "cmd": "pcs host auth maas-db03.example.com maas-db02.example.com maas-db01.example.com -u hacluster -p 'YgtGe+Ton7n+5OFLGmM=' && touch /tmp/pacemaker_auth", "delta": "0:00:01.944274", "end": "2023-04-05 17:17:56.203499", "msg": "non-zero return code", "rc": 1, "start": "2023-04-05 17:17:54.259225", "stderr": "Error: Unable to communicate with maas-db03.example.com\nError: Unable to communicate with maas-db02.example.com", "stderr_lines": ["Error: Unable to communicate with maas-db03.example.com", "Error: Unable to communicate with maas-db02.example.com"], "stdout": "maas-db01.example.com: Authorized", "stdout_lines": ["maas-db01.example.com: Authorized"]}
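
For anyone triaging a long run like this one, it can help to list the first fatal task per host, since by default Ansible stops running later plays on a host once it has hit a fatal error. In the full output below, maas-db02 and maas-db03 both report a fatal error in the maas_postgres play ("Create Base Backup") before the maas_corosync play starts. The following is only a rough sketch and not part of the playbook; the function name `first_failures` and the shortened sample log are made up for illustration.

```python
import re

# Matches banners such as "TASK [maas_postgres : Create Base Backup] ****"
BANNER = re.compile(r"^(PLAY|TASK|RUNNING HANDLER) \[(.*?)\] \*+\s*$")
# Matches failure lines such as "fatal: [maas-db02.example.com]: FAILED! => {...}"
# (also tolerates delegated hosts written as "[host -> other]")
FATAL = re.compile(r"^fatal: \[([^\]\s]+)[^\]]*\]: (?:FAILED|UNREACHABLE)!")


def first_failures(log_text):
    """Return {host: (play, task)} for the first fatal line seen per host."""
    play, task = None, None
    failures = {}
    for line in log_text.splitlines():
        banner = BANNER.match(line)
        if banner:
            kind, name = banner.groups()
            if kind == "PLAY":
                play, task = name, None  # a new play resets the current task
            else:
                task = name
            continue
        fatal = FATAL.match(line)
        if fatal:
            # setdefault keeps only the earliest failure for each host
            failures.setdefault(fatal.group(1), (play, task))
    return failures


# Hypothetical, shortened sample in the same shape as the output in this issue.
sample = """\
PLAY [maas_postgres] ****
TASK [maas_postgres : Create Base Backup] ****
skipping: [maas-db01.example.com]
fatal: [maas-db02.example.com]: FAILED! => {"msg": "'maas_postgres_floating_ip' is undefined"}
fatal: [maas-db03.example.com]: FAILED! => {"msg": "'maas_postgres_floating_ip' is undefined"}
PLAY [maas_pacemaker] ****
TASK [maas_pacemaker : Auth cluster] ****
fatal: [maas-db01.example.com]: FAILED! => {"rc": 1}
"""

for host, (play, task) in first_failures(sample).items():
    print(f"{host}: play={play!r} task={task!r}")
```

To run it against a real log, the output of `ansible-playbook` could be saved to a file and its contents passed to `first_failures` in place of `sample`.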

Full Output

$ ansible-playbook -i inventory.yaml ./site.yaml

PLAY [all] *************************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas-db02.example.com]
ok: [maas-db03.example.com]
ok: [maas-db01.example.com]
ok: [maas.example.com]

TASK [Ensure required variables have been defined] *********************************************************************************************
ok: [maas-db01.example.com]
ok: [maas-db02.example.com]
ok: [maas-db03.example.com]
ok: [maas.example.com]

TASK [Ensure maas_version is a version string] *************************************************************************************************
ok: [maas-db01.example.com] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [maas-db02.example.com] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [maas-db03.example.com] => {
    "changed": false,
    "msg": "All assertions passed"
}
ok: [maas.example.com] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [Ensure o11y can be enabled] **************************************************************************************************************
skipping: [maas-db01.example.com]
skipping: [maas-db02.example.com]
skipping: [maas-db03.example.com]
skipping: [maas.example.com]

TASK [Define proxy environment if proxies given] ***********************************************************************************************
ok: [maas-db01.example.com]
ok: [maas-db02.example.com]
ok: [maas-db03.example.com]
ok: [maas.example.com]

TASK [Show proxy environment if in use] ********************************************************************************************************
skipping: [maas-db01.example.com]
skipping: [maas-db02.example.com]
skipping: [maas-db03.example.com]
skipping: [maas.example.com]

TASK [Ensure dist is up to date] ***************************************************************************************************************
ok: [maas-db03.example.com]
ok: [maas-db02.example.com]
ok: [maas.example.com]
ok: [maas-db01.example.com]

TASK [Discover host architecture] **************************************************************************************************************
ok: [maas-db03.example.com]
ok: [maas-db02.example.com]
ok: [maas-db01.example.com]
ok: [maas.example.com]

TASK [Set architechture facts] *****************************************************************************************************************
ok: [maas-db01.example.com]
ok: [maas-db02.example.com]
ok: [maas-db03.example.com]
ok: [maas.example.com]

PLAY [maas_postgres] ***************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas-db02.example.com]
ok: [maas-db01.example.com]
ok: [maas-db03.example.com]

TASK [maas_postgres : Install Postgres] ********************************************************************************************************
included: /Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/install_postgres.yaml for maas-db01.example.com, maas-db02.example.com, maas-db03.example.com

TASK [maas_postgres : Install PostgreSQL and configuration dependencies] ***********************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Install xinetd] **********************************************************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Generate Replication Password] *******************************************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Save Replication Password] ***********************************************************************************************
ok: [maas-db01.example.com] => (item=maas-db01.example.com)
ok: [maas-db01.example.com -> maas-db02.example.com] => (item=maas-db02.example.com)
ok: [maas-db01.example.com -> maas-db03.example.com] => (item=maas-db03.example.com)

TASK [maas_postgres : Write pg_hba.conf] *******************************************************************************************************
changed: [maas-db02.example.com]
changed: [maas-db03.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Write postgresql.conf] ***************************************************************************************************
included: /Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/write_postgres_config.yaml for maas-db01.example.com, maas-db02.example.com, maas-db03.example.com

TASK [maas_postgres : Write postgresql.conf] ***************************************************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Flush Handlers] **********************************************************************************************************

TASK [maas_postgres : Flush Handlers] **********************************************************************************************************

TASK [maas_postgres : Flush Handlers] **********************************************************************************************************

RUNNING HANDLER [maas_postgres : Stop Postgres Service To Load New Configuration] **************************************************************
changed: [maas-db02.example.com]
changed: [maas-db03.example.com]
changed: [maas-db01.example.com]

RUNNING HANDLER [maas_postgres : Start Postgres Service To Load New Configuration] *************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Create MAAS Postgres User] ***********************************************************************************************
[WARNING]: Module remote_tmp /var/lib/postgresql/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when
running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually
changed: [maas-db02.example.com]
changed: [maas-db03.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Create MAAS Postgres Database] *******************************************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Write pgsql_check script] ************************************************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Add pgsql_check_v4 service entry] ****************************************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Add pgsql_check_v6 service entry] ****************************************************************************************
changed: [maas-db02.example.com]
changed: [maas-db03.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Write pgsql_check xinetd config file] ************************************************************************************
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]
changed: [maas-db01.example.com]

TASK [maas_postgres : Create Replication User] *************************************************************************************************
skipping: [maas-db02.example.com]
skipping: [maas-db03.example.com]
included: /Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/create_replication_user.yaml for maas-db01.example.com

TASK [maas_postgres : Create Replication User] *************************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_postgres : Create Replication Slots] ************************************************************************************************
changed: [maas-db01.example.com] => (item=maas-db01.example.com)
changed: [maas-db01.example.com] => (item=maas-db02.example.com)
changed: [maas-db01.example.com] => (item=maas-db03.example.com)

TASK [maas_postgres : Configure Postgres as a secondary] ***************************************************************************************
included: /Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/configure_postgres_secondary.yaml for maas-db01.example.com, maas-db02.example.com, maas-db03.example.com

TASK [maas_postgres : Add temporary primary IP] ************************************************************************************************
skipping: [maas-db01.example.com]
skipping: [maas-db02.example.com]
skipping: [maas-db03.example.com]

TASK [maas_postgres : Add floating IP to postgres' bind addresses] *****************************************************************************
skipping: [maas-db01.example.com]
skipping: [maas-db02.example.com]
skipping: [maas-db03.example.com]

TASK [maas_postgres : Ensure application_name Preserved] ***************************************************************************************
skipping: [maas-db01.example.com]
ok: [maas-db03.example.com]
ok: [maas-db02.example.com]

TASK [maas_postgres : Stop Postgres To Clear Out Data] *****************************************************************************************
skipping: [maas-db01.example.com]
changed: [maas-db02.example.com]
changed: [maas-db03.example.com]

TASK [maas_postgres : Remove Previous Data Directory] ******************************************************************************************
skipping: [maas-db01.example.com]
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]

TASK [maas_postgres : Create a New Data Directory] *********************************************************************************************
skipping: [maas-db01.example.com]
changed: [maas-db03.example.com]
changed: [maas-db02.example.com]

TASK [maas_postgres : Create Base Backup] ******************************************************************************************************
skipping: [maas-db01.example.com]
fatal: [maas-db02.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'maas_postgres_floating_ip' is undefined. 'maas_postgres_floating_ip' is undefined\n\nThe error appears to be in '/Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/configure_postgres_secondary.yaml': line 50, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: \"Create Base Backup\"\n  ^ here\n"}
fatal: [maas-db03.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'maas_postgres_floating_ip' is undefined. 'maas_postgres_floating_ip' is undefined\n\nThe error appears to be in '/Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/configure_postgres_secondary.yaml': line 50, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: \"Create Base Backup\"\n  ^ here\n"}

TASK [maas_postgres : Create standby.signal] ***************************************************************************************************
skipping: [maas-db01.example.com]

TASK [maas_postgres : Flush Handlers] **********************************************************************************************************

RUNNING HANDLER [maas_postgres : Restart xinetd] ***********************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_postgres : Disable Postgres in systemd for pacemaker management] ********************************************************************
changed: [maas-db01.example.com]

TASK [maas_postgres : Remove temporary primary IP] *********************************************************************************************
skipping: [maas-db01.example.com]

TASK [maas_postgres : Disable auto-start] ******************************************************************************************************
changed: [maas-db01.example.com]

TASK [Setup firewall] **************************************************************************************************************************
skipping: [maas-db01.example.com]

PLAY [maas_corosync] ***************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas-db01.example.com]

TASK [maas_corosync : Install Corosync] ********************************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_corosync : Write Corosync config] ***************************************************************************************************
changed: [maas-db01.example.com]

RUNNING HANDLER [maas_corosync : Restart Corosync] *********************************************************************************************
changed: [maas-db01.example.com]

PLAY [maas_pacemaker] **************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas-db01.example.com]

TASK [maas_pacemaker : Install Pacemaker packages] *********************************************************************************************
changed: [maas-db01.example.com] => (item=pacemaker)
changed: [maas-db01.example.com] => (item=pcs)
changed: [maas-db01.example.com] => (item=fence-agents)
changed: [maas-db01.example.com] => (item=resource-agents-paf)

TASK [maas_pacemaker : Add temp file config for pacemaker-managed postgres] ********************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Generate Pacemaker user Password] ***************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Save Pacemaker user Password] *******************************************************************************************
ok: [maas-db01.example.com -> maas-db03.example.com] => (item=maas-db03.example.com)
ok: [maas-db01.example.com -> maas-db02.example.com] => (item=maas-db02.example.com)
ok: [maas-db01.example.com] => (item=maas-db01.example.com)

TASK [maas_pacemaker : Set pacemaker user password] ********************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Configure ssh for pacemaker] ********************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Flush handlers] *********************************************************************************************************

RUNNING HANDLER [maas_pacemaker : Ensure Pacemaker is started] *********************************************************************************
changed: [maas-db01.example.com]

RUNNING HANDLER [maas_pacemaker : Restart sshd] ************************************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Override /etc/hosts for External Address] *******************************************************************************
changed: [maas-db01.example.com]

TASK [maas_pacemaker : Auth cluster] ***********************************************************************************************************
FAILED - RETRYING: [maas-db01.example.com]: Auth cluster (3 retries left).
FAILED - RETRYING: [maas-db01.example.com]: Auth cluster (2 retries left).
FAILED - RETRYING: [maas-db01.example.com]: Auth cluster (1 retries left).
fatal: [maas-db01.example.com]: FAILED! => {"attempts": 3, "changed": true, "cmd": "pcs host auth maas-db03.example.com maas-db02.example.com maas-db01.example.com -u hacluster -p 'YgtGe+Ton7n+5OFLGmM=' && touch /tmp/pacemaker_auth", "delta": "0:00:01.944274", "end": "2023-04-05 17:17:56.203499", "msg": "non-zero return code", "rc": 1, "start": "2023-04-05 17:17:54.259225", "stderr": "Error: Unable to communicate with maas-db03.example.com\nError: Unable to communicate with maas-db02.example.com", "stderr_lines": ["Error: Unable to communicate with maas-db03.example.com", "Error: Unable to communicate with maas-db02.example.com"], "stdout": "maas-db01.example.com: Authorized", "stdout_lines": ["maas-db01.example.com: Authorized"]}

PLAY [maas_postgres_proxy] *********************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas.example.com]

TASK [maas_postgres_proxy : Install HAProxy] ***************************************************************************************************
changed: [maas.example.com]

TASK [maas_postgres_proxy : Write HAProxy Config] **********************************************************************************************
changed: [maas.example.com]

TASK [maas_postgres_proxy : Restart HAProxy] ***************************************************************************************************
changed: [maas.example.com]

TASK [Setup firewall] **************************************************************************************************************************
skipping: [maas.example.com]

PLAY [maas_proxy] ******************************************************************************************************************************
skipping: no hosts matched

PLAY [maas_region_controller] ******************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************
ok: [maas.example.com]

TASK [Verify MAAS Version supported on the host OS] ********************************************************************************************
skipping: [maas.example.com]

TASK [maas_region_controller : Check installed packages] ***************************************************************************************
ok: [maas.example.com]

TASK [maas_region_controller : Check installed snaps] ******************************************************************************************
ok: [maas.example.com]

TASK [maas_region_controller : Determine MAAS installation status] *****************************************************************************
ok: [maas.example.com]

TASK [maas_region_controller : Install MAAS - Snap] ********************************************************************************************
[DEPRECATION WARNING]: The CmdMixin used in classes CmdModuleHelper and CmdStateModuleHelper is being deprecated. Modules should use 
community.general.plugins.module_utils.cmd_runner.CmdRunner instead. This feature will be removed from community.general in version 8.0.0. 
Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
changed: [maas.example.com]

TASK [maas_region_controller : Add MAAS apt Repository] ****************************************************************************************
skipping: [maas.example.com]

TASK [maas_region_controller : Install Chrony] *************************************************************************************************
changed: [maas.example.com]

TASK [maas_region_controller : Install MAAS Region Controller - Deb] ***************************************************************************
skipping: [maas.example.com]

TASK [maas_region_controller : Update regiond.conf] ********************************************************************************************
skipping: [maas.example.com]

TASK [maas_region_controller : Initialise MAAS Controller - Snap] ******************************************************************************
fatal: [maas.example.com]: FAILED! => {"changed": true, "cmd": ["maas", "init", "region+rack", "--maas-url=http://maas.example.com:5240/MAAS", "--database-uri", "postgres://maas:MyPassword@10.30.102.10:5432/maasdb"], "delta": "0:01:06.284116", "end": "2023-04-05 17:20:48.229348", "msg": "non-zero return code", "rc": 1, "start": "2023-04-05 17:19:41.945232", "stderr": "Traceback (most recent call last):\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 219, in ensure_connection\n    self.connect()\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner\n    return func(*args, **kwargs)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 200, in connect\n    self.connection = self.get_new_connection(conn_params)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner\n    return func(*args, **kwargs)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py\", line 187, in get_new_connection\n    connection = Database.connect(**conn_params)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/psycopg2/__init__.py\", line 122, in connect\n    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)\npsycopg2.OperationalError: connection to server at \"10.30.102.10\", port 5432 failed: Connection refused\n\tIs the server running on that host and accepting TCP/IP connections?\n\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/snap/maas/26658/bin/maas-region\", line 8, in <module>\n    sys.exit(run())\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maasserver/region_script.py\", line 78, in run\n    run_django(is_snap, is_devenv)\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maasserver/region_script.py\", line 67, in run_django\n    
management.execute_from_command_line()\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/__init__.py\", line 419, in execute_from_command_line\n    utility.execute()\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/__init__.py\", line 413, in execute\n    self.fetch_command(subcommand).run_from_argv(self.argv)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/base.py\", line 354, in run_from_argv\n    self.execute(*args, **cmd_options)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/base.py\", line 398, in execute\n    output = self.handle(*args, **options)\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maasserver/management/commands/dbupgrade.py\", line 107, in handle\n    conn.ensure_connection()\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner\n    return func(*args, **kwargs)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 218, in ensure_connection\n    with self.wrap_database_errors:\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/utils.py\", line 90, in __exit__\n    raise dj_exc_value.with_traceback(traceback) from exc_value\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 219, in ensure_connection\n    self.connect()\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner\n    return func(*args, **kwargs)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 200, in connect\n    self.connection = self.get_new_connection(conn_params)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner\n    return func(*args, **kwargs)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py\", line 187, in 
get_new_connection\n    connection = Database.connect(**conn_params)\n  File \"/snap/maas/26658/usr/lib/python3/dist-packages/psycopg2/__init__.py\", line 122, in connect\n    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)\ndjango.db.utils.OperationalError: connection to server at \"10.30.102.10\", port 5432 failed: Connection refused\n\tIs the server running on that host and accepting TCP/IP connections?\n\nTraceback (most recent call last):\n  File \"/snap/maas/26658/bin/maas\", line 8, in <module>\n    sys.exit(main())\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/__init__.py\", line 39, in main\n    options.execute(options)\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 482, in __call__\n    raise exc\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 479, in __call__\n    self.handle(options)\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 739, in handle\n    self._finalize_init(mode)\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 752, in _finalize_init\n    perform_work(\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 412, in perform_work\n    return cmd(*args, **kwargs)\n  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 387, in migrate_db\n    subprocess.check_call(\n  File \"/usr/lib/python3.10/subprocess.py\", line 369, in check_call\n    raise CalledProcessError(retcode, cmd)\nsubprocess.CalledProcessError: Command '['/snap/maas/26658/bin/maas-region', 'dbupgrade']' returned non-zero exit status 1.", "stderr_lines": ["Traceback (most recent call last):", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 219, in ensure_connection", "    self.connect()", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner", "    return func(*args, 
**kwargs)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 200, in connect", "    self.connection = self.get_new_connection(conn_params)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner", "    return func(*args, **kwargs)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py\", line 187, in get_new_connection", "    connection = Database.connect(**conn_params)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/psycopg2/__init__.py\", line 122, in connect", "    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)", "psycopg2.OperationalError: connection to server at \"10.30.102.10\", port 5432 failed: Connection refused", "\tIs the server running on that host and accepting TCP/IP connections?", "", "", "The above exception was the direct cause of the following exception:", "", "Traceback (most recent call last):", "  File \"/snap/maas/26658/bin/maas-region\", line 8, in <module>", "    sys.exit(run())", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maasserver/region_script.py\", line 78, in run", "    run_django(is_snap, is_devenv)", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maasserver/region_script.py\", line 67, in run_django", "    management.execute_from_command_line()", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/__init__.py\", line 419, in execute_from_command_line", "    utility.execute()", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/__init__.py\", line 413, in execute", "    self.fetch_command(subcommand).run_from_argv(self.argv)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/base.py\", line 354, in run_from_argv", "    self.execute(*args, **cmd_options)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/core/management/base.py\", line 398, in 
execute", "    output = self.handle(*args, **options)", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maasserver/management/commands/dbupgrade.py\", line 107, in handle", "    conn.ensure_connection()", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner", "    return func(*args, **kwargs)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 218, in ensure_connection", "    with self.wrap_database_errors:", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/utils.py\", line 90, in __exit__", "    raise dj_exc_value.with_traceback(traceback) from exc_value", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 219, in ensure_connection", "    self.connect()", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner", "    return func(*args, **kwargs)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/base/base.py\", line 200, in connect", "    self.connection = self.get_new_connection(conn_params)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/utils/asyncio.py\", line 33, in inner", "    return func(*args, **kwargs)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/django/db/backends/postgresql/base.py\", line 187, in get_new_connection", "    connection = Database.connect(**conn_params)", "  File \"/snap/maas/26658/usr/lib/python3/dist-packages/psycopg2/__init__.py\", line 122, in connect", "    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)", "django.db.utils.OperationalError: connection to server at \"10.30.102.10\", port 5432 failed: Connection refused", "\tIs the server running on that host and accepting TCP/IP connections?", "", "Traceback (most recent call last):", "  File \"/snap/maas/26658/bin/maas\", line 8, in <module>", "    sys.exit(main())", "  File 
\"/snap/maas/26658/lib/python3.10/site-packages/maascli/__init__.py\", line 39, in main", "    options.execute(options)", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 482, in __call__", "    raise exc", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 479, in __call__", "    self.handle(options)", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 739, in handle", "    self._finalize_init(mode)", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 752, in _finalize_init", "    perform_work(", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 412, in perform_work", "    return cmd(*args, **kwargs)", "  File \"/snap/maas/26658/lib/python3.10/site-packages/maascli/snap.py\", line 387, in migrate_db", "    subprocess.check_call(", "  File \"/usr/lib/python3.10/subprocess.py\", line 369, in check_call", "    raise CalledProcessError(retcode, cmd)", "subprocess.CalledProcessError: Command '['/snap/maas/26658/bin/maas-region', 'dbupgrade']' returned non-zero exit status 1."], "stdout": "Performing database migrations", "stdout_lines": ["Performing database migrations"]}

PLAY RECAP *************************************************************************************************************************************
maas-db01.example.com   : ok=45   changed=29   unreachable=0    failed=1    skipped=12   rescued=0    ignored=0   
maas-db02.example.com   : ok=28   changed=16   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
maas-db03.example.com   : ok=28   changed=16   unreachable=0    failed=1    skipped=5    rescued=0    ignored=0   
maas.example.com        : ok=17   changed=5    unreachable=0    failed=1    skipped=7    rescued=0    ignored=0

ben-ihelputech commented 1 year ago

Also worth noting: when Corosync runs in an LXD container, the container needs to be privileged, or else the service will fail.

alexsander-souza commented 1 year ago

You must define maas_postgres_floating_ip and maas_postgres_floating_ip_prefix_len when deploying HA postgres.

Pacemaker and Corosync are not being installed because processing on those nodes was aborted earlier:

fatal: [maas-db02.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'maas_postgres_floating_ip' is undefined. 'maas_postgres_floating_ip' is undefined\n\nThe error appears to be in '/Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/configure_postgres_secondary.yaml': line 50, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: \"Create Base Backup\"\n  ^ here\n"}
fatal: [maas-db03.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'maas_postgres_floating_ip' is undefined. 'maas_postgres_floating_ip' is undefined\n\nThe error appears to be in '/Users/ben/git/maas-ansible-playbook/roles/maas_postgres/tasks/configure_postgres_secondary.yaml': line 50, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: \"Create Base Backup\"\n  ^ here\n"}

ben-ihelputech commented 1 year ago

Turns out I was referencing an outdated version of the README. Setting those variables resolved the issue.
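
For anyone hitting the same error, a minimal sketch of the change in the inventory's `vars` section might look like the following. The two variable names come from the comment above; the address and prefix length are assumed placeholders (an unused IP on the same 10.30.102.x network as the database containers, with a /24 prefix) and are not values reported in this thread.

```yaml
all:
  vars:
    # Placeholder values, assumed for illustration only:
    # choose an unused address on the database subnet and its matching prefix length.
    maas_postgres_floating_ip: "10.30.102.20"
    maas_postgres_floating_ip_prefix_len: 24
```

The rest of the inventory (the `maas_postgres`, `maas_pacemaker`, `maas_corosync`, and `maas_postgres_proxy` groups) stays as posted in the original report.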