XenServer plug-ins for Pacemaker
We've written a pair of plug-ins for Pacemaker to help create an automatically redundant active/passive pair of XenServer VMs using a DRBD back-end.
WHAT THIS DOES:
REQUIREMENTS:
TODO:
We have written two new Pacemaker resource management agents:
These plug-ins were derived from the Xen resource agent that comes with Pacemaker.
The idea behind these plug-ins is to let you create a Primary/Secondary DRBD back-end shared between 2 servers that is automatically managed by Pacemaker in case of fail-over.
In short:
We provide some DRBD RPMs compiled for various versions of XenServer: http://download.locatrix.com/drbd
Note: The RPMs must exactly match your XenServer version. If they don't, you can build DRBD yourself with the Citrix XenServer DDK VM.
NOTE: uname -r on your XenServer must return exactly "2.6.32.43-0.4.1.xs1.6.10.734.170748xen" to use these DRBD RPMs. This is the kernel of the original XS 6.1 from the install CD. If you've installed hotfixes, they may have updated your kernel version and you'll need to compile DRBD yourself.
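Because the prebuilt RPMs only work on the exact kernel they were built for, a quick guard before installing can save a confusing failure later. This is a minimal sketch, assuming bash and that `uname -r` reports the kernel release string quoted in the note above; `kernel_matches` and `EXPECTED_KERNEL` are hypothetical names, not part of the RPMs or of XenServer.

```shell
#!/bin/bash
# Hypothetical guard: only proceed when the running kernel is the one the
# prebuilt DRBD RPMs were compiled for (XS 6.1 install-CD kernel shown here;
# substitute the XS 6.2 string if that is what you are installing).
EXPECTED_KERNEL="2.6.32.43-0.4.1.xs1.6.10.734.170748xen"

# kernel_matches EXPECTED [ACTUAL]
# Returns 0 when ACTUAL (defaults to `uname -r`) equals EXPECTED exactly,
# otherwise prints a hint to stderr and returns 1.
kernel_matches() {
    local expected="$1"
    local actual="${2:-$(uname -r)}"
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "Kernel mismatch: running '$actual', RPMs built for '$expected'." >&2
    echo "Build DRBD yourself with the XenServer DDK VM instead." >&2
    return 1
}
```

For example: `kernel_matches "$EXPECTED_KERNEL" && rpm -i drbd-heartbeat-8.4.3-2.i386.rpm`.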
wget http://download.locatrix.com/drbd/xenserver6.1.0/drbd-xen-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.1.0/drbd-pacemaker-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.1.0/drbd-heartbeat-8.4.3-2.i386.rpm
rpm -i drbd-heartbeat-8.4.3-2.i386.rpm
rpm -i drbd-xen-8.4.3-2.i386.rpm
rpm -i drbd-pacemaker-8.4.3-2.i386.rpm
This patch fixes a race condition I found in the xapi-domains shutdown, which calls /opt/xensource/libexec/shutdown. That script was shutting down VMs and then Heartbeat was restarting them, which caused some confusion during reboot. The patch adds support for the "auto_poweroff" VM flag so that the script ignores flagged VMs.
NOTE: This is NOT needed for XenServer 6.2.0 since the patch was added into the upstream.
wget -O /opt/xensource/libexec/shutdown.patch http://download.locatrix.com/pacemaker/shutdown.patch
cd /opt/xensource/libexec
cp shutdown shutdown.orig
patch -p1 < shutdown.patch
NOTE: uname -r on your XenServer must return exactly "2.6.32.43-0.4.1.xs1.8.0.835.170778xen" to use our DRBD RPMs. This is the kernel of the original XS 6.2 from the install CD. If you've installed hotfixes, they may have updated your kernel version and you'll need to compile DRBD yourself.
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-bash-completion-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-debuginfo-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-heartbeat-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-km-2.6.32.43_0.4.1.xs1.8.0.835.170778xen-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-km-debuginfo-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-pacemaker-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-udev-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-utils-8.4.3-2.i386.rpm
wget http://download.locatrix.com/drbd/xenserver6.2.0/8.4.3/drbd-xen-8.4.3-2.i386.rpm
rpm -i drbd-utils-8.4.3-2.i386.rpm
rpm -i drbd-bash-completion-8.4.3-2.i386.rpm
rpm -i drbd-udev-8.4.3-2.i386.rpm
rpm -i drbd-km-2.6.32.43_0.4.1.xs1.8.0.835.170778xen-8.4.3-2.i386.rpm
rpm -q -a | grep drbd
Download the XenServer DDK:
Now on that VM, here's how to compile DRBD:
Install the build tools
yum --enablerepo=base --disablerepo=citrix install gcc libxslt.i386 docbook-xsl
Download DRBD
mkdir /root/drbd/
cd /root/drbd/
wget http://oss.linbit.com/drbd/8.4/drbd-8.4.3.tar.gz
tar zxvf drbd-8.4.3.tar.gz
cd drbd-8.4.3
I found I had to fix the DocBook stylesheet path:
vi documentation/Makefile.in
STYLESHEET_PREFIX ?= /usr/share/sgml/docbook/xsl-stylesheets-1.69.1-5.1
Build DRBD
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --with-km
make km-rpm
make rpm
Create a "xenserver-kernel-ver.txt" to remember what kernel version this build works for
echo "uname -a on your XenServer needs to return exactly this kernel version number:" > xenserver-kernel-ver.txt
echo `uname -a` >> xenserver-kernel-ver.txt
ls /usr/src/redhat/RPMS/i386/drbd*
Now copy the RPMs wherever you want them (e.g. to the target XenServer hosts)
scp /usr/src/redhat/RPMS/i386/drbd* user@somewhere:.
scp ./xenserver-kernel-ver.txt user@somewhere:.
Install the RPMs on your XenServers
rpm -i drbd-utils-8.4.3-2.i386.rpm
rpm -i drbd-bash-completion-8.4.3-2.i386.rpm
rpm -i drbd-udev-8.4.3-2.i386.rpm
rpm -i drbd-km-XXXX.i386.rpm
rpm -q -a | grep drbd
I ran into DRBD bugs with Open vSwitch (OVS): `drbdadm primary all` kept hanging randomly for me. The commands below switch the network back-end from OVS to the Linux bridge. Hopefully we won't need this some day.
xe-switch-network-backend bridge
reboot
I recommend a direct Gigabit cross-over cable between the 2 servers on a spare NIC. It's not a requirement, of course; you just need a connection. With a switch in between, though, you could end up in a split-brain situation if the switch loses power.
Create a network interface for DRBD replication
Open up XenCenter
Click on the server
Select the Networking tab and click the Configure button in Management Interfaces
Click Add IP Address
Name: DRBD
IP: 10.0.0.3
netmask: 255.255.255.0
no gateway
OK
Add a hostname and ip address for the x-over network
vi /etc/hosts
10.0.0.3 node1drbd
10.0.0.4 node2drbd
Linbit guide section 3.2 explains how to do a setup when you're using a separate hard drive for DRBD. I wanted to use my existing drives only, so this is what I did below.
Also FYI lvm.conf already has a filter for VG_Xen: `filter = [ "r|/dev/xvd.|", "r|/dev/VG_Xen./*|"]`
So that's why I skipped that step from the Linbit guide.
Note the free PEs (physical extents) on both servers and pick a common value that fits on both
pvdisplay
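Picking the common value can be scripted. The sketch below assumes each server has a single physical volume in its XenStorage volume group and that both servers use the same PE size (check "PE Size" in the same `pvdisplay` output); `free_pe` and `common_pe` are hypothetical helper names. It reads `pvdisplay` output on stdin and prints the smaller of the two free counts.

```shell
#!/bin/bash
# free_pe: print the "Free PE" count from `pvdisplay` output read on stdin.
# Assumes one physical volume; only the first "Free PE" line is used.
free_pe() {
    awk '$1 == "Free" && $2 == "PE" { print $3; exit }'
}

# common_pe A B: print the smaller extent count, i.e. the largest LV size
# (in extents) that can be created identically on both servers.
common_pe() {
    if [ "$1" -lt "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}
```

For example, `N1=$(pvdisplay | free_pe)` on node 1 and, assuming root SSH access to the other host, `N2=$(ssh root@node2drbd pvdisplay | free_pe)`; then pass `$(common_pe "$N1" "$N2")` to `lvcreate -l`.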
Create an identical DRBD volume on both servers for storage. Here is an example of what I did:
lvcreate -l 53760 VG_XenStorage-de2c1846-4bf4-83a8-f74e-0bf1d2f10769 -n drbd
lvdisplay
vi /etc/lvm/lvm.conf
write_cache_state = 0
rm /etc/lvm/cache/.cache
cd /etc/init.d
wget http://download.locatrix.com/drbd/xenserver6.0.2/lvm
chmod 0755 /etc/init.d/lvm
chkconfig --add lvm
chkconfig lvm on
service lvm start
I recommend rebooting now to make sure LVM comes up
reboot
Ensure the LV Status says "available"
lvdisplay
Make sure the names after the 'on' keywords below match what "hostname" returns on each server
hostname
vi /etc/drbd.d/drbd-sr1.res
resource drbd-sr1 {
  protocol C;
  on node1.mydomain {
    device /dev/drbd1;
    disk /dev/VG_XenStorage-de2c1846-4bf4-83a8-f74e-0bf1d2f10769/drbd;
    address 10.0.0.3:7789;
    meta-disk internal;
  }
  on node2.mydomain {
    device /dev/drbd1;
    disk /dev/VG_XenStorage-30746c9f-2d1f-b6d5-3e3e-1eea2fab2fb1/drbd;
    address 10.0.0.4:7789;
    meta-disk internal;
  }
}
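Since DRBD picks the `on` section by host name, a typo there means `drbdadm` won't find the resource for the current host. A small check, sketched under the assumption that each `on` line looks like `on <hostname> {` as in the file above; `res_has_host` is a hypothetical helper.

```shell
#!/bin/bash
# res_has_host RESFILE [HOST]
# Succeeds if RESFILE contains an "on HOST {" section; HOST defaults to the
# output of `hostname`, which is what the 'on' names must match.
res_has_host() {
    local resfile="$1"
    local host="${2:-$(hostname)}"
    awk -v h="$host" '$1 == "on" && $2 == h { found = 1 } END { exit !found }' "$resfile"
}
```

Run `res_has_host /etc/drbd.d/drbd-sr1.res || echo "fix the 'on' names"` on each server.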
vi /etc/drbd.d/global_common.conf
global
{
  usage-count yes;
}
common
{
  protocol C;
  net
  {
    after-sb-0pri discard-zero-changes;
    after-sb-1pri consensus;
    after-sb-2pri disconnect;
  }
  disk
  {
  }
  handlers
  {
    split-brain "/usr/lib/drbd/notify-split-brain.sh my@email.com";
  }
}
On both XenServers execute the following
drbdadm create-md drbd-sr1
modprobe drbd
drbdadm up drbd-sr1
Now ONLY on the XenServer that will be your primary (this will overwrite your secondary server's DRBD data; be careful!)
drbdadm -- --overwrite-data-of-peer primary drbd-sr1
Temporarily raise the sync speed (resync rate) to 1G per second
drbdadm disk-options --resync-rate=1G drbd-sr1
Wait until the full sync finishes (/proc/drbd shows ds:UpToDate/UpToDate)
cat /proc/drbd
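Rather than re-running `cat /proc/drbd` by hand, you can poll it until every device reports `ds:UpToDate/UpToDate`. A minimal sketch, assuming the DRBD 8.4 /proc/drbd layout where each device line contains `cs:` and `ds:` fields; `drbd_synced` and `wait_for_sync` are hypothetical helpers, and the file argument exists only so the logic can be tried against a copy.

```shell
#!/bin/bash
# drbd_synced [FILE]: succeed when FILE (default /proc/drbd) lists at least one
# device and every device line reports ds:UpToDate/UpToDate.
drbd_synced() {
    local f="${1:-/proc/drbd}"
    awk '/ cs:/ { total++; if ($0 ~ /ds:UpToDate\/UpToDate/) ok++ }
         END { exit !(total > 0 && ok == total) }' "$f"
}

# wait_for_sync [FILE] [INTERVAL]: poll until drbd_synced succeeds, printing the
# progress lines from FILE every INTERVAL seconds (default 10).
wait_for_sync() {
    local f="${1:-/proc/drbd}"
    local interval="${2:-10}"
    until drbd_synced "$f"; do
        grep -E "sync'ed|finish" "$f" || true
        sleep "$interval"
    done
    echo "All DRBD devices are UpToDate/UpToDate."
}
```

Once it returns, `drbdadm adjust drbd-sr1` should put the resync rate back to the configured value, undoing the temporary `disk-options` change.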
After the full sync, on both servers:
chkconfig drbd on
Run on primary to create the SR
xe sr-create device-config:device="/dev/drbd1" name-label="DRBD-SR1" type=lvm
Take note of the SR UUID this outputs
Now move DRBD to the secondary so we can introduce the SR there as well
PBDUUID=`xe pbd-list device-config:device=/dev/drbd1 params=uuid | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
echo $PBDUUID
xe pbd-unplug uuid=$PBDUUID
drbdadm secondary drbd-sr1
cat /proc/drbd
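The `awk | grep | sed` pipeline used for PBDUUID above (and several times below) just strips the `key ( RO) : ` prefix from xe's output. If you run it often, a small function keeps it in one place. This sketch assumes xe prints one `key ( RO)    : value` line per object followed by blank lines, as in the commands above; `xe_value` is a hypothetical name. Unlike `awk -F:`, it keeps any colons inside the value. (xe list commands also accept `--minimal`, which prints bare comma-separated values.)

```shell
#!/bin/bash
# xe_value: read `xe <cmd>-list params=<one-param>` output on stdin and print
# just the values, one per line, skipping the blank separator lines.
xe_value() {
    # Strip everything up to the first colon plus following spaces; lines
    # without a colon (the blank separators) are dropped.
    awk 'index($0, ":") { sub(/^[^:]*: */, ""); print }'
}
```

For example: `PBDUUID=$(xe pbd-list device-config:device=/dev/drbd1 params=uuid | xe_value)`.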
On the secondary server
drbdadm primary drbd-sr1
cat /proc/drbd
UUID=<SR UUID FROM ABOVE sr-create command>
echo $UUID
vgscan
HOSTNAME=`hostname`
echo $HOSTNAME
HOSTID=`xe host-list params=uuid name-label=$HOSTNAME | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
echo $HOSTID
xe sr-introduce uuid=$UUID name-label=DRBD-SR1 type=lvm
xe pbd-create sr-uuid=$UUID host-uuid=$HOSTID device-config:device=/dev/drbd1
Now you can move it back to your primary (use the same commands below any time you want to switch roles):
Shut down any running VMs (there shouldn't be any right now, of course, but there may be in the future):
xe vm-list power-state=running
xe vm-shutdown vm=
Unplug the PBD
PBDUUID=`xe pbd-list device-config:device=/dev/drbd1 params=uuid | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
echo $PBDUUID
xe pbd-unplug uuid=$PBDUUID
Set DRBD to secondary
drbdadm secondary drbd-sr1
cat /proc/drbd
Run on the other server to make it the primary
drbdadm primary drbd-sr1
cat /proc/drbd
Plug in the PBD/SR
PBDUUID=`xe pbd-list device-config:device=/dev/drbd1 params=uuid | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
echo $PBDUUID
SRUUID=`xe pbd-list device-config:device=/dev/drbd1 params=sr-uuid | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
echo $SRUUID
xe pbd-plug uuid=$PBDUUID
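The switch-over always follows the same order: unplug the PBD and demote DRBD on the current primary, then promote DRBD and plug the PBD on the other server. Below is a sketch of that sequence with a dry-run mode, so you can print the commands before running them. It assumes the VMs on the SR are already shut down and that you pass each host's own PBD UUID (each host has its own PBD for the SR, created with pbd-create above); `run`, `release_sr`, `takeover_sr` and `DRY_RUN` are hypothetical names.

```shell
#!/bin/bash
# run: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# release_sr PBD_UUID [RESOURCE]
# On the current primary (VMs already shut down): unplug this host's PBD for
# the DRBD SR, then demote the DRBD resource to secondary.
release_sr() {
    local pbd="$1" res="${2:-drbd-sr1}"
    [ -n "$pbd" ] || { echo "usage: release_sr PBD_UUID [RESOURCE]" >&2; return 2; }
    run xe pbd-unplug uuid="$pbd" || return 1
    run drbdadm secondary "$res"
}

# takeover_sr PBD_UUID [RESOURCE]
# On the other server: promote the DRBD resource, then plug this host's PBD.
takeover_sr() {
    local pbd="$1" res="${2:-drbd-sr1}"
    [ -n "$pbd" ] || { echo "usage: takeover_sr PBD_UUID [RESOURCE]" >&2; return 2; }
    run drbdadm primary "$res" || return 1
    run xe pbd-plug uuid="$pbd"
}
```

`DRY_RUN=1 release_sr "$PBDUUID"` only prints the two commands; drop `DRY_RUN=1` to run them, then call `takeover_sr` on the other host with that host's PBD UUID.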
We use Nagios on both servers to monitor DRBD. This is how we set up the NRPE plug-in:
wget --no-check-certificate http://raw.github.com/anchor/nagios-plugin-drbd/master/check_drbd -O /usr/lib/nagios/plugins/check_drbd
chmod +x /usr/lib/nagios/plugins/check_drbd
vi /etc/nagios/nrpe.cfg
#
command[check_drbd]=/usr/lib/nagios/plugins/check_drbd
#
service nrpe restart
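If you script the NRPE change instead of editing nrpe.cfg in vi, appending the command line only when it is missing keeps the step safe to repeat. A sketch; `ensure_nrpe_command` is a hypothetical helper, and the config path defaults to /etc/nagios/nrpe.cfg as above.

```shell
#!/bin/bash
# ensure_nrpe_command [CFG]: append the check_drbd command definition to CFG
# (default /etc/nagios/nrpe.cfg) unless an identical line is already present.
ensure_nrpe_command() {
    local cfg="${1:-/etc/nagios/nrpe.cfg}"
    local line='command[check_drbd]=/usr/lib/nagios/plugins/check_drbd'
    grep -qxF "$line" "$cfg" 2>/dev/null || echo "$line" >> "$cfg"
}
```

For example: `ensure_nrpe_command && service nrpe restart`.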
NOTE: I wrote a wiki article a while ago explaining how to install NRPE on XCP or XenServer (same thing): http://wiki.xen.org/wiki/NagiosXCP
rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
wget -O /etc/yum.repos.d/pacemaker.repo http://clusterlabs.org/rpm/epel-5/clusterlabs.repo
sed -i -e "s/enabled=0/enabled=1/" /etc/yum.repos.d/CentOS-Base.repo
sed -i -e "s/enabled=1/enabled=0/" /etc/yum.repos.d/Citrix.repo
yum install -y pacemaker corosync heartbeat
chkconfig drbd off
wget -O /usr/bin/timeout http://www.bashcookbook.com/bashinfo/source/bash-4.0/examples/scripts/timeout3
chmod +x /usr/bin/timeout
mkdir /usr/lib/ocf/resource.d/locatrix
wget -O/usr/lib/ocf/resource.d/locatrix/XenServerPBD https://raw.github.com/locatrix/xs-pacemaker/master/XenServerPBD
chmod 755 /usr/lib/ocf/resource.d/locatrix/XenServerPBD
wget -O/usr/lib/ocf/resource.d/locatrix/XenServerVM https://raw.github.com/locatrix/xs-pacemaker/master/XenServerVM
chmod 755 /usr/lib/ocf/resource.d/locatrix/XenServerVM
Installs:
This must be set on every HA VM to ensure that shutdown is handled by Pacemaker instead of the system
xe vm-param-set other-config:auto_poweroff=false uuid=$UUID
Initial Corosync setup for both nodes
corosync-keygen
chown root:root /etc/corosync/authkey
chmod 400 /etc/corosync/authkey
scp /etc/corosync/authkey root@node2.mydomain:/etc/corosync/authkey
Create the Corosync config on both nodes
vi /etc/corosync/corosync.conf
totem {
  version: 2
  token: 5000
  token_retransmits_before_loss_const: 20
  join: 1000
  consensus: 7500
  vsftype: none
  max_messages: 20
  secauth: off
  threads: 0
  clear_node_high_bit: yes
  interface {
    ringnumber: 0
    # changethis!
    bindnetaddr: 10.0.0.3
    mcastaddr: 226.94.1.1
    mcastport: 5405
  }
}
logging {
  fileline: off
  to_syslog: yes
  to_stderr: no
  syslog_facility: daemon
  debug: on
  timestamp: on
}
amf {
  mode: disabled
}
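The `bindnetaddr` line is the one to adapt. As I understand the corosync.conf(5) man page, it describes bindnetaddr as a network address (its example uses 192.168.5.0 for an interface at 192.168.5.92 with netmask 255.255.255.0), which you can derive from the DRBD interface's IP and netmask. A bash sketch; `network_address` is a hypothetical helper.

```shell
#!/bin/bash
# network_address IP NETMASK: print IP AND NETMASK, octet by octet,
# e.g. the network address to use for corosync's bindnetaddr.
network_address() {
    local IFS=.
    local -a ip=($1) mask=($2)
    echo "$((ip[0] & mask[0])).$((ip[1] & mask[1])).$((ip[2] & mask[2])).$((ip[3] & mask[3]))"
}
```

For the DRBD network used here, `network_address 10.0.0.3 255.255.255.0` prints 10.0.0.0, the same value on both nodes.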
Set corosync to start at boot
chkconfig --level 35 corosync on
/etc/init.d/corosync start
Configure DRBD for Pacemaker
vi /etc/drbd.d/global_common.conf
#http://www.drbd.org/users-guide/s-pacemaker-fencing.html
resource <resource> {
  disk {
    fencing resource-only;
    ...
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    ...
  }
  ...
}
Copy the configuration to the other server
scp /etc/drbd.d/global_common.conf root@node2.mydomain:/etc/drbd.d/global_common.conf
Live-update DRBD
drbdadm adjust all
Note: The Linbit DRBD agent used below (ocf:linbit:drbd) is specifically designed for Master/Slave DRBD setups, which is why this works.
Set up the DRBD agent
crm configure property stonith-enabled="false"
crm configure property no-quorum-policy=ignore
crm configure rsc_defaults resource-stickiness=100
crm
configure
primitive drbd0 ocf:linbit:drbd \
params drbd_resource=drbd-sr1 \
op monitor interval="15" role="Master" \
op monitor interval="30" role="Slave" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s"
ms ms-drbd0 drbd0 \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"
commit
quit
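If you want to prefer node 1 as the DRBD Master when possible, add this rule (the quotes stop the shell from expanding $role and from treating #uname as a comment):
crm configure location ms-drbd0-prefer-node1 ms-drbd0 rule '$role=Master' 100: '#uname' eq node1.mydomain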
Set up the XenServerPBD agent to handle SR/PBD switch-over
crm configure
primitive xs_pbd ocf:locatrix:XenServerPBD params pbd_device="/dev/drbd1" debug="true" \
op start interval="0s" timeout="60s" \
op stop interval="0s" timeout="40s"
colocation fs_on_drbd inf: xs_pbd ms-drbd0:Master
order fs_after_drbd inf: ms-drbd0:promote xs_pbd:start
commit
bye
Set up the VM agent for a specific VM, named "dbm" in this example. Note: yes, you are expected to do this individually for each VM you want Pacemaker to control. The agent will automatically start and stop the VM. You have to shut down VMs via Pacemaker; otherwise it will just restart them (which will probably irritate you a lot until you remember this).
crm configure
primitive xs_vm_dbm ocf:locatrix:XenServerVM params vm_name="dbm" debug="true" \
op monitor interval="10s" timeout="30s" \
op start interval="0s" timeout="120s" \
op stop interval="0s" timeout="240s"
colocation xs_vm_dbm-with-xs_pbd inf: xs_vm_dbm xs_pbd
order xs_vm_dbm-after-xs_pbd inf: xs_pbd:start xs_vm_dbm:start
commit
bye
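Since each HA VM needs the same three definitions, you can generate them instead of retyping. A sketch assuming the VM name is also valid inside a Pacemaker resource ID (letters, digits, `_`, `-`), mirroring the "dbm" example above; `xs_vm_crm` is a hypothetical helper. It only prints crm configuration text.

```shell
#!/bin/bash
# xs_vm_crm VMNAME: print crm configure definitions for one HA VM, following
# the "dbm" example: the XenServerVM primitive plus colocation and ordering
# constraints that tie it to the xs_pbd resource.
xs_vm_crm() {
    local vm="$1"
    local rsc="xs_vm_$1"
    cat <<EOF
primitive $rsc ocf:locatrix:XenServerVM params vm_name="$vm" debug="true" \\
    op monitor interval="10s" timeout="30s" \\
    op start interval="0s" timeout="120s" \\
    op stop interval="0s" timeout="240s"
colocation $rsc-with-xs_pbd inf: $rsc xs_pbd
order $rsc-after-xs_pbd inf: xs_pbd:start $rsc:start
EOF
}
```

One way to apply it: `xs_vm_crm web > /root/xs_vm_web.crm` followed by `crm configure load update /root/xs_vm_web.crm` (check that your crm shell version supports `load update`). Remember to also set auto_poweroff=false on that VM.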
You can shut down a VM like this:
Get the name of the VM resource to shutdown
crm resource status
Now turn it off. This WILL shut down the VM. It helps to have the XenServer tools installed on the VM, since they make a graceful shutdown easier.
crm resource stop xs_vm_dbm
Start it up again
crm resource start xs_vm_dbm
As you're doing the above, you may notice that when you initially create a VM on your primary server, it doesn't magically appear in the VM list on your secondary server, even when the SR is plugged in. This is because the VM configuration meta-data (its name, disk names, CPU, memory, etc.) is stored locally on each server. Sorry, it doesn't get copied to the other server automatically, so we have to handle that ourselves.
The lowest risk way to handle this at the moment is to manually copy the meta-data whenever you first create the VM or make a change to the config (e.g. increase the memory size).
Back up the VM meta-data to a file, then copy the VM config to the secondary:
VMNAME=MYVM
SECONDARY=node2.mydomain
SEC_PWD=
VMUUID=`xe vm-list name-label=$VMNAME params=uuid | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
xe vm-export vm=$VMUUID filename=./$VMNAME-metadata metadata=true
ls -la ./$VMNAME-metadata
SEC_SRUUID=`xe -s $SECONDARY -u root -pw $SEC_PWD sr-list name-label=DRBD-SR1 params=uuid | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
echo $SEC_SRUUID
xe -s $SECONDARY -u root -pw $SEC_PWD vm-import filename=./$VMNAME-metadata sr-uuid=$SEC_SRUUID --metadata --preserve --force
OPTIONAL: Just in case of fail-over, I usually back up all of the meta-data to the SR using the included script:
SRUUID=`xe sr-list name-label=DRBD-SR1 params=uuid | awk -F: '{print $2}' | grep -v '^$' | sed 's/^[ ]//g'`
echo $SRUUID
xe-backup-metadata -c -u $SRUUID
There's an accompanying script for restore that will restore all the VMs from the SR backup
xe-restore-metadata
These scripts are mostly handy in case you want to copy all the VM config data from one server to another or in case of manually handling a restore.
The XenServerPBD agent actually includes an option to allow automatic use of xe-backup-metadata and xe-restore-metadata, however, I consider it to be potentially risky to automate those scripts since they copy ALL the VMs not just the single one you want. In the next version of the agent I plan to remove it and instead implement a process based on the above meta-data import/export.
Be aware - if you upgrade XenServer or install a hotfix that updates the kernel (and a number of them do) it will break DRBD and you'll need to re-install. In particular if you perform an upgrade it'll basically wipe out all of the customizations you've done here and you'll need to re-do it. This isn't the end of the world - it just means you need to do one node at a time and rebuild it, re-install DRBD, re-sync it, then do the same to the next node. It's just time consuming. In my experience XenServer upgrades and hotfixes will not actually affect the LVM data and your DRBD volume - but I suppose I can't guarantee that.
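Before rebooting into a kernel installed by a hotfix, it's worth checking whether a DRBD module exists for it. A minimal sketch, assuming the kernel-module RPM places a file named drbd.ko somewhere under /lib/modules/<kernel release>/ (the exact subdirectory may vary, hence the search); it only checks that the file exists, not that it loads. `drbd_module_for` is a hypothetical helper, and the optional ROOT argument is only there for testing.

```shell
#!/bin/bash
# drbd_module_for [KERNEL] [ROOT]: succeed if a drbd.ko* file exists under
# ROOT/lib/modules/KERNEL (KERNEL defaults to the running kernel, ROOT to /).
drbd_module_for() {
    local kver="${1:-$(uname -r)}"
    local dir="${2:-}/lib/modules/$kver"
    [ -d "$dir" ] || return 1
    [ -n "$(find "$dir" -name 'drbd.ko*' -print -quit)" ]
}
```

For example, `for d in /lib/modules/*; do k=${d##*/}; drbd_module_for "$k" && echo "$k: ok" || echo "$k: no DRBD module"; done` lists every installed kernel.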
crm resource migrate ms-drbd0 node2.mydomain
crm resource unmigrate ms-drbd0
Stop DRBD first
drbdadm disconnect all
drbdadm down all
/etc/init.d/drbd stop
Now delete the resources
crm resource stop ms-drbd0
crm resource cleanup ms-drbd0
crm configure delete ms-drbd0
crm resource stop xs_pbd
crm resource cleanup xs_pbd
crm configure delete xs_pbd
crm resource stop xs_vm_dbm
crm resource cleanup xs_vm_dbm
crm configure delete xs_vm_dbm
Copyright (c) 2013, Locatrix Communications
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of Locatrix Communications nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.