CanonicalLtd / maas-docs

Documentation for MAAS
https://docs.maas.io
Creative Commons Attribution Share Alike 4.0 International

HA scenario does not cover fail-over #386

Open nobuto-m opened 7 years ago

nobuto-m commented 7 years ago

The second API server connects to $PRIMARY_PG_SERVER. https://github.com/CanonicalLtd/maas-docs/blob/617a7d2e8f46e9fab4a45fb581562ae66b41dbf8/en/manage-ha.md#secondary-api-server

sudo maas-region local_config_set --database-host $PRIMARY_PG_SERVER

I'm not familiar with PostgreSQL, but if the primary server dies, the second API server takes over the VIP yet has no database to connect to. So it looks like both API servers should connect to PostgreSQL through a VIP instead of the real primary IP, and keepalived (or an equivalent) should include some PostgreSQL status check.

In that case, all real IPs of the API servers, as well as the VIP, have to be listed in pg_hba.conf to allow connections to maasdb as the maas user.
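
For illustration, a minimal pg_hba.conf sketch along those lines (the addresses are placeholders for the VIP and the two API servers' real IPs; adjust to the actual deployment):

# VIP shared by the API servers (placeholder address)
host maasdb maas 10.0.0.10/32 md5
# real IPs of both API servers (placeholder addresses)
host maasdb maas 10.0.0.11/32 md5
host maasdb maas 10.0.0.12/32 md5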

pmatulis commented 7 years ago

@nobuto-m Did you end up changing to a working configuration? If so, please share it here.

nobuto-m commented 7 years ago

@pmatulis I'm currently testing it. I will share the configuration as soon as I verify it.

nobuto-m commented 7 years ago

Here is how I tested the MAAS HA deployment on top of LXD with corosync/pacemaker. It is not mature enough to become a pull request, but I hope it serves as a starting point for a new HA doc. I believe it covers #385 as well.

A few notes:

On my laptop, ha-test-on-lxd.sh completes in around 15 minutes.
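
The script pipes lxc list output through a col2 helper to extract the container IPv4 address; that is not a standard command, so define something equivalent first, for example:

# assumption: col2 prints the second whitespace-separated field,
# which is the IPv4 address column of "lxc list -c 4" table output
col2() { awk '{print $2}'; }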

[Status]

# crm_mon -fAr -1
Last updated: Mon Apr 10 04:28:51 2017      Last change: Mon Apr 10 04:27:18 2017 by root via cibadmin on maas-ha-test-gh-386-1
Stack: corosync
Current DC: maas-ha-test-gh-386-2 (version 1.1.14-70404b0) - partition with quorum
3 nodes and 24 resources configured

Online: [ maas-ha-test-gh-386-1 maas-ha-test-gh-386-2 maas-ha-test-gh-386-3 ]

Full list of resources:

 Resource Group: grp_pgsql_vip
     res_pgsql_vip  (ocf::heartbeat:IPaddr2):   Started maas-ha-test-gh-386-1
 Master/Slave Set: ms_pgsql [res_pgsql]
     Masters: [ maas-ha-test-gh-386-1 ]
     Slaves: [ maas-ha-test-gh-386-2 maas-ha-test-gh-386-3 ]
 Resource Group: grp_regiond_vip
     res_regiond_vip    (ocf::heartbeat:IPaddr2):   Started maas-ha-test-gh-386-2
     res_regiond_vip_ext    (ocf::heartbeat:IPaddr2):   Started maas-ha-test-gh-386-2
 Clone Set: cl_apache2 [res_apache2]
     Started: [ maas-ha-test-gh-386-1 maas-ha-test-gh-386-2 maas-ha-test-gh-386-3 ]
 Clone Set: cl_bind9 [res_bind9]
     Started: [ maas-ha-test-gh-386-1 maas-ha-test-gh-386-2 maas-ha-test-gh-386-3 ]
 Clone Set: cl_maas-dhcpd [res_maas-dhcpd]
     Stopped: [ maas-ha-test-gh-386-1 maas-ha-test-gh-386-2 maas-ha-test-gh-386-3 ]
 Clone Set: cl_maas-proxy [res_maas-proxy]
     Started: [ maas-ha-test-gh-386-1 maas-ha-test-gh-386-2 maas-ha-test-gh-386-3 ]
 Clone Set: cl_maas-rackd [res_maas-rackd]
     res_maas-rackd (systemd:maas-rackd):   Started maas-ha-test-gh-386-1 (unmanaged)
     res_maas-rackd (systemd:maas-rackd):   Started maas-ha-test-gh-386-3 (unmanaged)
     res_maas-rackd (systemd:maas-rackd):   Started maas-ha-test-gh-386-2 (unmanaged)
 Clone Set: cl_maas-regiond [res_maas-regiond]
     Started: [ maas-ha-test-gh-386-1 maas-ha-test-gh-386-2 maas-ha-test-gh-386-3 ]

Node Attributes:
* Node maas-ha-test-gh-386-1:
    + master-res_pgsql                  : 1000      
    + res_pgsql-data-status             : LATEST    
    + res_pgsql-master-baseline         : 0000000004000098
    + res_pgsql-receiver-status         : normal (master)
    + res_pgsql-status                  : PRI       
    + res_pgsql-xlog-loc                : 0000000004000098
* Node maas-ha-test-gh-386-2:
    + master-res_pgsql                  : 100       
    + res_pgsql-data-status             : STREAMING|SYNC
    + res_pgsql-receiver-status         : normal    
    + res_pgsql-status                  : HS:sync   
    + res_pgsql-xlog-loc                : 0000000003000000
* Node maas-ha-test-gh-386-3:
    + master-res_pgsql                  : -INFINITY 
    + res_pgsql-data-status             : STREAMING|ASYNC
    + res_pgsql-receiver-status         : normal    
    + res_pgsql-status                  : HS:async  
    + res_pgsql-xlog-loc                : 0000000004000000

Migration Summary:
* Node maas-ha-test-gh-386-1:
* Node maas-ha-test-gh-386-3:
* Node maas-ha-test-gh-386-2:

[base.crm]

property stonith-enabled=false
rsc_defaults \
    resource-stickiness=INFINITY \
    migration-threshold=1

[pgsql.crm]

primitive res_pgsql_vip ocf:heartbeat:IPaddr2 \
    params ip=10.0.8.201 cidr_netmask=32 \
    op monitor interval=10s \
    meta migration-threshold=0
group grp_pgsql_vip \
    res_pgsql_vip

master ms_pgsql res_pgsql \
    master-max=1 master-node-max=1 \
    clone-max=3 clone-node-max=1 \
    notify=true
primitive res_pgsql ocf:heartbeat:pgsql \
    params \
        pgctl=/usr/lib/postgresql/9.5/bin/pg_ctl \
        config=/etc/postgresql/9.5/main/postgresql.conf \
        socketdir=/var/run/postgresql \
        pgdata=/var/lib/postgresql/9.5/main \
        tmpdir=/var/lib/postgresql/9.5/tmp \
        logfile=/var/log/postgresql/postgresql-9.5-main.log \
        rep_mode=sync \
        node_list="maas-ha-test-gh-386-1 maas-ha-test-gh-386-2 maas-ha-test-gh-386-3" \
        restore_command="cp /var/lib/postgresql/9.5/main/pg_archive/%f %p" \
        master_ip=10.0.8.201 \
        repuser=repuser \
        primary_conninfo_opt="password=repuser keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
        check_wal_receiver=true \
    op start interval=0 timeout=120s \
    op monitor depth=0 interval=10s timeout=30s \
    op monitor depth=0 interval=9s timeout=30s role=Master \
    op stop interval=0 timeout=120s

colocation col_pgsql_vip inf: grp_pgsql_vip \
    ms_pgsql:Master
order ord_promote inf: ms_pgsql:promote grp_pgsql_vip:start symmetrical=false
order ord_demote 0: ms_pgsql:demote grp_pgsql_vip:stop symmetrical=false
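
Once pgsql.crm is loaded and a master has been promoted, streaming replication can be double-checked on the master node with something like the following (not part of the configuration itself; crm_mon's node attributes show much of the same information):

sudo -u postgres psql -x -c 'SELECT client_addr, state, sync_state FROM pg_stat_replication;'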

[maas.crm]

primitive res_regiond_vip ocf:heartbeat:IPaddr2 \
    params ip=10.0.8.202 cidr_netmask=32 \
    op monitor interval=10s
primitive res_regiond_vip_ext ocf:heartbeat:IPaddr2 \
    params ip=10.0.8.203 cidr_netmask=32 \
    op monitor interval=10s
group grp_regiond_vip \
    res_regiond_vip \
    res_regiond_vip_ext

primitive res_maas-regiond systemd:maas-regiond \
    op start interval=0 timeout=120s \
    op monitor interval=10s timeout=120s \
    op stop interval=0 timeout=120s
clone cl_maas-regiond res_maas-regiond
primitive res_apache2 systemd:apache2 \
    op start interval=0 timeout=120s \
    op monitor interval=10s timeout=120s \
    op stop interval=0 timeout=120s
clone cl_apache2 res_apache2
primitive res_bind9 systemd:bind9 \
    op start interval=0 timeout=120s \
    op monitor interval=10s timeout=120s \
    op stop interval=0 timeout=120s
clone cl_bind9 res_bind9
primitive res_maas-proxy systemd:maas-proxy \
    op start interval=0 timeout=120s \
    op monitor interval=10s timeout=120s \
    op stop interval=0 timeout=120s
clone cl_maas-proxy res_maas-proxy

colocation col_regiond_vip_regiond inf: grp_regiond_vip cl_maas-regiond
colocation col_regiond_vip_apache2 inf: grp_regiond_vip cl_apache2
colocation col_regiond_vip_bind9 inf: grp_regiond_vip cl_bind9
colocation col_regiond_vip_maas-proxy inf: grp_regiond_vip cl_maas-proxy

primitive res_maas-rackd systemd:maas-rackd \
    op start interval=0 timeout=120s \
    op monitor interval=10s timeout=120s \
    op stop interval=0 timeout=120s \
    meta is-managed=false
clone cl_maas-rackd res_maas-rackd
primitive res_maas-dhcpd systemd:maas-dhcpd \
    op start interval=0 timeout=120s \
    op monitor interval=10s timeout=120s \
    op stop interval=0 timeout=120s \
    meta is-managed=false
clone cl_maas-dhcpd res_maas-dhcpd
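
After maas.crm is loaded, a quick sanity check (using the test VIP 10.0.8.202 from ha-test-on-lxd.sh below) is to confirm that one node holds the regiond VIP and that MAAS answers on it, for example:

crm_mon -fAr -1 | grep res_regiond_vip
curl -sI http://10.0.8.202/MAAS/ | head -n 1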

[ha-test-on-lxd.sh]

#!/bin/bash

set -e
set -u
set -x

# pick VIPs from outside of LXD_IPV4_DHCP_RANGE
#     $ grep LXD_IPV4_DHCP_RANGE /etc/default/lxd-bridge
#     LXD_IPV4_DHCP_RANGE="10.0.8.51,10.0.8.200" (in my case)
# and put those VIPs in the *.crm files as well
VIP_PGSQL=10.0.8.201
VIP_MAAS_REGIOND=10.0.8.202
# for simulating administrator access from an external network
# (too lazy to set up a separate subnet for this test)
VIP_MAAS_REGIOND_EXT=10.0.8.203

# Edit those crm configuration files to include VIPs above.
BASE_CRM=./base.crm
PGSQL_CRM=./pgsql.crm
MAAS_CRM=./maas.crm

LXD_PREFIX='maas-ha-test-gh-386'

# cleanup / delete existing containers
for i in {1..3}; do
    lxc delete "${LXD_PREFIX}-${i}" --force || true
done

# launch containers
for i in {1..3}; do
    lxc launch ubuntu:xenial "${LXD_PREFIX}-${i}"
done

# install PostgreSQL, corosync and pacemaker
for i in {1..3}; do
    lxc exec "${LXD_PREFIX}-${i}" -- bash -e -c '
        apt update
        apt install -y postgresql corosync pacemaker
    '
done

# push corosync.conf with udpu (unicast) transport
for i in {1..3}; do
    cat <<EOF | lxc file push - "${LXD_PREFIX}-${i}"/etc/corosync/corosync.conf
totem {
    version: 2
    crypto_cipher: none
    crypto_hash: none
    transport: udpu
}

quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: $(lxc list -c 4 "${LXD_PREFIX}-1" | grep eth0 | col2)
        nodeid: 1000
    }
    node {
        ring0_addr: $(lxc list -c 4 "${LXD_PREFIX}-2" | grep eth0 | col2)
        nodeid: 1001
    }
    node {
        ring0_addr: $(lxc list -c 4 "${LXD_PREFIX}-3" | grep eth0 | col2)
        nodeid: 1002
    }
}
EOF
done

### restart corosync/pacemaker
for i in {1..3}; do
    lxc exec "${LXD_PREFIX}-${i}" -- bash -e -c '
        service corosync restart
        service pacemaker restart
    '
done

# stop PostgreSQL for now and disable auto start on boot
# to prevent unnecessary cluster disruptions on reboot
for i in {1..3}; do
    lxc exec "${LXD_PREFIX}-${i}" -- bash -e -c '
        service postgresql stop

        echo manual > /etc/postgresql/9.5/main/start.conf
    '
done

### setup replication

# start PostgreSQL on the primary node
lxc exec "${LXD_PREFIX}-1" -- bash -e -c '
    pg_ctlcluster 9.5 main start
'

# create repuser non-interactively, i.e. the equivalent of:
# sudo -u postgres createuser -U postgres \
#     repuser -P -c 10 --replication --no-password
lxc exec "${LXD_PREFIX}-1" -- sudo -u postgres psql -c "
        CREATE ROLE repuser PASSWORD 'md58ab1a75fe519fbd497653a855134aef7' \
            NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN REPLICATION CONNECTION LIMIT 10;
"

# setup ACL
for i in {1..3}; do
    cat <<EOF | lxc exec "${LXD_PREFIX}-${i}" -- tee -a /etc/postgresql/9.5/main/pg_hba.conf

host replication repuser $VIP_PGSQL/32 md5
host replication repuser $VIP_MAAS_REGIOND/32 md5
host replication repuser $VIP_MAAS_REGIOND_EXT/32 md5

host replication repuser $(lxc list -c 4 "${LXD_PREFIX}-1" | grep eth0 | col2)/32 md5
host replication repuser $(lxc list -c 4 "${LXD_PREFIX}-2" | grep eth0 | col2)/32 md5
host replication repuser $(lxc list -c 4 "${LXD_PREFIX}-3" | grep eth0 | col2)/32 md5

host maasdb maas $VIP_PGSQL/32 md5
host maasdb maas $VIP_MAAS_REGIOND/32 md5
host maasdb maas $VIP_MAAS_REGIOND_EXT/32 md5

host maasdb maas $(lxc list -c 4 "${LXD_PREFIX}-1" | grep eth0 | col2)/32 md5
host maasdb maas $(lxc list -c 4 "${LXD_PREFIX}-2" | grep eth0 | col2)/32 md5
host maasdb maas $(lxc list -c 4 "${LXD_PREFIX}-3" | grep eth0 | col2)/32 md5
EOF
done

# create the archive/tmp dirs, then append replication settings to postgresql.conf
for i in {1..3}; do
    lxc exec "${LXD_PREFIX}-${i}" -- bash -e -c '
        install -o postgres -g postgres -m 0700 -d /var/lib/postgresql/9.5/main/pg_archive
        install -o postgres -g postgres -m 0700 -d /var/lib/postgresql/9.5/tmp
        install -o postgres -g postgres -m 0600 /dev/null /var/lib/postgresql/9.5/tmp/rep_mode.conf
    '
done

for i in {1..3}; do
    cat <<EOF | lxc exec "${LXD_PREFIX}-${i}" -- tee -a /etc/postgresql/9.5/main/postgresql.conf

listen_addresses = '*'

wal_level = hot_standby
synchronous_commit = on
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/9.5/main/pg_archive/%f && cp %p /var/lib/postgresql/9.5/main/pg_archive/%f'
max_wal_senders = 10
wal_keep_segments = 256
hot_standby = on
restart_after_crash = off
hot_standby_feedback = on
EOF
done

# Restart the primary PostgreSQL to accept replication connections
lxc exec "${LXD_PREFIX}-1" -- bash -e -c '
    pg_ctlcluster 9.5 main restart

    cat /var/lib/postgresql/9.5/main/postmaster.pid
'

# replicate db
for i in {2..3}; do
    lxc exec "${LXD_PREFIX}-${i}" -- bash -e -c "
        mv -v /var/lib/postgresql/9.5/main{,.bak}

        sudo -u postgres env PGPASSWORD='repuser' pg_basebackup \
            -h $(lxc list -c 4 "${LXD_PREFIX}-1" | grep eth0 | col2) \
            -D /var/lib/postgresql/9.5/main \
            -U repuser \
            -v -P --xlog-method=stream
    "
done

# Stop the primary PostgreSQL so it can be managed by the pgsql resource agent (RA)
lxc exec "${LXD_PREFIX}-1" -- bash -e -c '
    pg_ctlcluster 9.5 main stop
'

# load base crm configuration
lxc exec "${LXD_PREFIX}-1" -- crm configure load update - < "$BASE_CRM"
# load pgsql crm configuration
lxc exec "${LXD_PREFIX}-1" -- crm configure load update - < "$PGSQL_CRM"

echo 'Waiting until the master and one sync node are ready...'
while ! lxc exec "${LXD_PREFIX}-3" -- crm_mon -fAr -1 | grep -q 'STREAMING|SYNC' ; do
    sleep 10
done

# show status
lxc exec "${LXD_PREFIX}-1" -- crm_mon -fAr -1

# install MAAS on the primary node; it will create maasdb
lxc exec "${LXD_PREFIX}-1" -- bash -e -c '
    # the first install attempt may fail; disable rlimit-nproc and retry
    # this is necessary to run multiple avahi daemons under LXD without security.idmap.isolated
    apt install -y avahi-daemon || true
    sed -i -e "s/^rlimit-nproc=/#\0/" /etc/avahi/avahi-daemon.conf
    apt install -y avahi-daemon

    apt-add-repository -y ppa:maas/stable
    apt update
    apt install -y maas
'

# install MAAS on the remaining nodes
for i in {2..3}; do
    lxc exec "${LXD_PREFIX}-${i}" -- bash -e -c '
        # the first install attempt may fail; disable rlimit-nproc and retry
        # this is necessary to run multiple avahi daemons under LXD without security.idmap.isolated
        apt install -y avahi-daemon || true
        sed -i -e "s/^rlimit-nproc=/#\0/" /etc/avahi/avahi-daemon.conf
        apt install -y avahi-daemon

        apt-add-repository -y ppa:maas/stable
        apt update
        # don't know why the maas group is required for dpkg --unpack, but it happens.
        # make sure the maas user/group exist by installing maas-common first.
        #
        # Preparing to unpack .../maas-rack-controller_2.1.5+bzr5596-0ubuntu1~16.04.1_all.deb ...
        # No such group: maas
        # dpkg: error processing archive /var/cache/apt/archives/maas-rack-controller_2.1.5+bzr5596-0ubuntu1~16.04.1_all.deb (--unpack):
        #  subprocess new pre-installation script returned error exit status 1
        apt install -y maas-common
        apt install -y maas-region-api maas-dns maas-rack-controller
    '
done

# get maasdb password and regiond secret
maasdb_password=$(lxc exec "${LXD_PREFIX}-1" -- maas-region local_config_get --plain --database-pass)
maas_secret=$(lxc exec "${LXD_PREFIX}-1" -- cat /var/lib/maas/secret)

# update the MAAS configuration to use the VIPs
for i in {1..3}; do
    lxc exec "${LXD_PREFIX}-${i}" -- bash -e -c "
        maas-region local_config_set \
            --maas-url http://$VIP_MAAS_REGIOND/MAAS \
            --database-host $VIP_PGSQL \
            --database-pass $maasdb_password

        maas-region edit_named_options --migrate-conflicting-options
        service bind9 restart
        service maas-regiond restart

        maas-rack register \
            --url http://$VIP_MAAS_REGIOND/MAAS \
            --secret $maas_secret
        service maas-rackd restart
    "
done

# wait for a while until all dependencies of maas-regiond have started
sleep 10

# load maas crm
lxc exec "${LXD_PREFIX}-1" -- crm configure load update - < "$MAAS_CRM"

# status
sleep 10
lxc exec "${LXD_PREFIX}-1" -- crm_mon -fAr -1

# create admin
lxc exec "${LXD_PREFIX}-1" -- \
    sudo maas createadmin \
        --username admin \
        --password admin \
        --email admin@localhost.localdomain

echo "MAAS HA is ready on http://${VIP_MAAS_REGIOND_EXT}/MAAS"

nobuto-m commented 7 years ago

Hmm, maas-proxy is a bit tricky, since it will be restarted by maas-regiond, outside of Pacemaker, whenever proxy-related configuration is changed by users.
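
One option (not tested here) would be to treat maas-proxy the same way as maas-rackd and maas-dhcpd above, i.e. keep it cloned but unmanaged, so that Pacemaker only reports its state instead of reacting to restarts:

# sketch: mirrors the unmanaged res_maas-rackd/res_maas-dhcpd pattern above
primitive res_maas-proxy systemd:maas-proxy \
    op monitor interval=10s timeout=120s \
    meta is-managed=false
clone cl_maas-proxy res_maas-proxy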

nobuto-m commented 7 years ago

Nah, because of the "proxy — disabled, alternate proxy is configured in settings" case, it looks like maas-proxy should not be a dependency of the VIP.
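
Dropping that dependency would just mean removing the corresponding colocation constraint (and optionally the clone) from maas.crm, roughly:

crm configure delete col_regiond_vip_maas-proxy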

aym-frikha commented 7 years ago

Nobuto, why do we need grp_pgsql_vip if we only have one resource in it?

nobuto-m commented 7 years ago

@fourou It's not necessary for now. It's just there in case more VIPs are added for PostgreSQL, for example when region controllers are on different subnets and PostgreSQL needs to listen on multiple subnets.
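
In that case the group would simply grow another IPaddr2 resource, for example (sketch with a made-up address):

# hypothetical second PostgreSQL VIP on another subnet
primitive res_pgsql_vip2 ocf:heartbeat:IPaddr2 \
    params ip=192.168.100.201 cidr_netmask=32 \
    op monitor interval=10s \
    meta migration-threshold=0
group grp_pgsql_vip \
    res_pgsql_vip \
    res_pgsql_vip2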

aym-frikha commented 7 years ago

@nobuto-m, I see, thanks for the response. Another thing I want to understand concerning the colocation: you put "colocation col_regiond_vip_regiond inf: grp_regiond_vip cl_maas-regiond", which means grp_regiond_vip should be colocated with cl_maas-regiond. So if cl_maas-regiond is not running on any node, grp_regiond_vip stays UP because we use inf ("should"). But if we use +inf ("must"), we are sure that the VIP will no longer exist if regiond is not working. What do you think?

nobuto-m commented 7 years ago

@fourou Is there any difference between inf and +inf? http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch06.html#_infinity_math Did you actually test the behavior?
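
As far as I know, Pacemaker defines INFINITY as the score 1000000, and crmsh accepts inf, +inf, and the numeric value interchangeably, so any of these spellings should result in the same constraint (sketch, not verified on this cluster):

# equivalent spellings of a mandatory colocation score
colocation col_regiond_vip_regiond inf: grp_regiond_vip cl_maas-regiond
colocation col_regiond_vip_regiond +inf: grp_regiond_vip cl_maas-regiond
colocation col_regiond_vip_regiond 1000000: grp_regiond_vip cl_maas-regiond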

aym-frikha commented 7 years ago

@nobuto-m, yes, I tested the behavior, and according to https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/s1-colocationconstraints-HAAR.html:

score: Positive values indicate the resource should run on the same node. Negative values indicate the resources should not run on the same node. A value of +INFINITY, the default value, indicates that the source_resource must run on the same node as the target_resource. A value of -INFINITY indicates that the source_resource must not run on the same node as the target_resource.

aym-frikha commented 7 years ago

Also, we can find information here: http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_deciding_which_nodes_a_resource_can_run_on.html