Status: closed (bootc closed this issue 2 months ago).
This is caused by IPaddr2 not taking the route metric into account in older versions. To avoid the issue, either remove the duplicate route for the subnet (the one with the other metric) on the nodes, or reboot them.
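A minimal sketch for spotting such duplicates, assuming the output of ip -o -6 route list is piped in. The function name find_duplicate_routes is hypothetical and not part of resource-agents, and the prefixes, device and metrics in the example input are illustrative. It prints every destination prefix that is listed more than once, for example the same subnet installed with two different metrics.

```shell
#!/bin/sh
# Hypothetical helper (not part of resource-agents): report destination
# prefixes that appear more than once in `ip -o route list` output,
# e.g. the same subnet installed twice with different metrics.
find_duplicate_routes() {
    awk '
        {
            # Lines such as "unreachable 2001:db8::/52 ..." start with a
            # route type keyword; the destination is then the second field.
            if ($1 ~ /^(unicast|local|broadcast|multicast|anycast|unreachable|blackhole|prohibit|throw|nat)$/)
                dst = $2
            else
                dst = $1
            seen[dst]++
        }
        # Print each duplicated destination once, on its second sighting.
        seen[dst] == 2 { print dst }
    '
}

# Example with canned (illustrative) input; on a node you would run:
#   ip -o -6 route list | find_duplicate_routes
printf '%s\n' \
    '2001:db8:1::/64 dev eth0 proto kernel metric 256 pref medium' \
    '2001:db8:1::/64 dev eth0 proto ra metric 1024 pref medium' \
    'default via fe80::1 dev eth0 proto ra metric 1024 pref medium' |
    find_duplicate_routes
```

Once a duplicate is identified, the unwanted entry can be removed by naming its metric explicitly, e.g. ip -6 route del 2001:db8:1::/64 dev eth0 metric 1024 (again with illustrative values).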
No, it's not that simple. I rebooted one of my nodes as you suggested, tried to bring up the resource with resource-agents 4.15.1, and got the same error:
2024-08-14T09:19:00.821227+01:00 hicks IPaddr2(p_vip_resolv2_ip6)[5799]: ERROR: More than 1 routes match 2001:db8::1/128. Unable to decide which route to use.
2024-08-14T09:19:00.824158+01:00 hicks IPaddr2(p_vip_resolv2_ip6)[5799]: WARNING: [findif] failed
Oh. This is on the loopback device. Maybe there's some special case there.
Can you post the output of pcs resource debug-start --full p_vip_hafw_ip6?
It seems the start operation runs fine; it's the monitor operation that fails.
Once the resource has started, pcs resource debug-monitor --full p_vip_hafw_ip6 outputs:
(unpack_config) warning: Blind faith: not fencing unseen nodes
Operation force-check for p_vip_hafw_ip6 (ocf:heartbeat:IPaddr2) returned 7 (not running: More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.)
+ echo
+ + printenv
sort
+ env=
GCC_COLORS=error=01;31:warning=01;35:note=01;36:caret=01;32:locus=01:quote=01
HA_debug=1
HA_logfacility=none
HOME=/root
LC_ALL=C
LOGNAME=root
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=00:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.avif=01;35:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:*~=00;90:*#=00;90:*.bak=00;90:*.crdownload=00;90:*.dpkg-dist=00;90:*.dpkg-new=00;90:*.dpkg-old=00;90:*.dpkg-tmp=00;90:*.old=00;90:*.orig=00;90:*.part=00;90:*.rej=00;90:*.rpmnew=00;90:*.rpmorig=00;90:*.rpmsave=00;90:*.swp=00;90:*.tmp=00;90:*.ucf-dist=00;90:*.ucf-new=00;90:*.ucf-old=00;90:
MAIL=/var/mail/root
OCF_EXIT_REASON_PREFIX=ocf-exit-reason:
OCF_OUTPUT_FORMAT=text
OCF_RA_VERSION_MAJOR=1
OCF_RA_VERSION_MINOR=1
OCF_RESKEY_CRM_meta_class=ocf
OCF_RESKEY_CRM_meta_id=p_vip_hafw_ip6
OCF_RESKEY_CRM_meta_provider=heartbeat
OCF_RESKEY_CRM_meta_resource_stickiness=100
OCF_RESKEY_CRM_meta_target_role=Started
OCF_RESKEY_CRM_meta_timeout=100000
OCF_RESKEY_CRM_meta_type=IPaddr2
OCF_RESKEY_cidr_netmask=128
OCF_RESKEY_crm_feature_set=3.19.5
OCF_RESKEY_ip=2001:db8::3c
OCF_RESKEY_nic=lo
OCF_RESOURCE_INSTANCE=p_vip_hafw_ip6
OCF_RESOURCE_PROVIDER=heartbeat
OCF_RESOURCE_TYPE=IPaddr2
OCF_ROOT=/usr/lib/ocf
OCF_TRACE_FILE=/dev/stderr
OCF_TRACE_RA=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/puppetlabs/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/ucb
PCMK_debug=1
PCMK_logfacility=none
PCMK_service=crm_resource
PWD=/home/bootc
SHELL=/bin/bash
SHLVL=1
SSH_AUTH_SOCK=/tmp/ssh-bootc/agent
SSH_CLIENT=2001:db8:60:a421:35f7:212f:641e 55057 22
SUDO_COMMAND=/bin/bash
SUDO_GID=1000
SUDO_UID=1000
SUDO_USER=bootc
TERM=screen-256color
USER=root
_=/usr/sbin/pcs
__OCF_TRC_DEST=/dev/stderr
__OCF_TRC_MANAGE=
+ ocf_is_true
+ false
+ . /usr/lib/ocf/lib/heartbeat/findif.sh
+ OCF_RESKEY_ip_default=
+ OCF_RESKEY_cidr_netmask_default=
+ OCF_RESKEY_broadcast_default=
+ OCF_RESKEY_iflabel_default=
+ OCF_RESKEY_cidr_netmask_default=
+ OCF_RESKEY_lvs_support_default=false
+ OCF_RESKEY_lvs_ipv6_addrlabel_default=true
+ OCF_RESKEY_lvs_ipv6_addrlabel_value_default=99
+ OCF_RESKEY_clusterip_hash_default=sourceip-sourceport
+ OCF_RESKEY_mac_default=
+ OCF_RESKEY_unique_clone_address_default=false
+ OCF_RESKEY_arp_interval_default=200
+ OCF_RESKEY_arp_count_default=5
+ OCF_RESKEY_arp_count_refresh_default=0
+ OCF_RESKEY_arp_bg_default=
+ OCF_RESKEY_arp_sender_default=
+ OCF_RESKEY_send_arp_opts_default=
+ OCF_RESKEY_flush_routes_default=false
+ OCF_RESKEY_run_arping_default=false
+ OCF_RESKEY_nodad_default=false
+ OCF_RESKEY_noprefixroute_default=false
+ OCF_RESKEY_preferred_lft_default=forever
+ OCF_RESKEY_network_namespace_default=
+ : 2001:db8::3c
+ : 128
+ :
+ :
+ : false
+ : true
+ : 99
+ : sourceip-sourceport
+ :
+ : false
+ : 200
+ : 5
+ : 0
+ :
+ :
+ :
+ : false
+ : false
+ : false
+ : false
+ : forever
+ :
+ SENDARP=/usr/libexec/heartbeat/send_arp
+ SENDUA=/usr/libexec/heartbeat/send_ua
+ FINDIF=findif
+ VLDIR=/run/resource-agents
+ SENDARPPIDDIR=/run/resource-agents
+ CIP_lockfile=/run/resource-agents/IPaddr2-CIP-2001:db8::3c
+ IPADDR2_CIP_IPTABLES=iptables
+ ocf_is_true false
+ false
+ ip_validate
+ check_binary ip
+ have_binary ip
+ [ = 1 ]
+ echo ip
+ sed -e s/ -.*//
+ local bin=ip
+ which ip
+ test -x /usr/bin/ip
+ IP_CIP=
+ [ -n ]
+ ip_init
+ local rc
+ uname -s
+ [ XLinux != XLinux ]
+ [ X2001:db8::3c = X ]
+ true
+ : YAY!
+ BASEIP=2001:db8::3c
+ BRDCAST=
+ NIC=lo
+ [ ! -z -a -z 128 ]
+ NETMASK=128
+ IFLABEL=
+ IF_MAC=
+ IP_INC_GLOBAL=1
+ expr 0 + 1
+ IP_INC_NO=1
+ ocf_is_true false
+ false
+ ocf_is_decimal 1
+ true
+ [ 1 -gt 0 ]
+ :
+ echo 2001:db8::3c
+ grep -qs :
+ [ 0 -ne 0 ]
+ FAMILY=inet6
+ ip route get 2001:db8::3c
+ awk $1~/:/ {print $1} $2~/:/ {print $2}
+ SANITIZED_IP=2001:db8::3c
+ [ -n 2001:db8::3c ]
+ OCF_RESKEY_ip=2001:db8::3c
+ ocf_is_true false
+ false
+ ocf_is_true true
+ true
+ ocf_is_decimal 99
+ true
+ [ 99 -ge 0 ]
+ :
+ [ -z ]
+ OCF_RESKEY_arp_bg=false
+ findif
+ local match=2001:db8::3c
+ local family
+ local proto
+ local scope
+ local nic=lo
+ local netmask=128
+ local brdcast=
+ local metric
+ local routematch
+ echo 2001:db8::3c
+ grep -qs :
+ [ 0 = 0 ]
+ family=inet6
+ findif_check_params inet6
+ local family=inet6
+ local match=2001:db8::3c
+ local nic=lo
+ netmask=128
+ local brdcast=
+ local errmsg
+ maybe_convert_dotted_quad_to_cidr
+ return
+ return 0
+ [ -n 128 ]
+ match=2001:db8::3c/128
+ [ -n lo ]
+ ip -o -f inet6 route list match 2001:db8::3c/128
+ grep dev lo
+ sed -e s,^\([0-9.]\+\) ,\1/32 ,;s,^\([0-9a-f:]\+\) ,\1/128 ,
+ sort -t/ -k2,2nr
+ routematch=2001:db8::3c/128 dev lo proto kernel metric 32 pref medium
unreachable 2001:db8::/52 dev lo proto bird metric 32 pref medium
+ [ inet6 = inet6 ]
+ echo 2001:db8::3c/128 dev lo proto kernel metric 32 pref medium
unreachable 2001:db8::/52 dev lo proto bird metric 32 pref medium
+ grep -v ^default
+ routematch=2001:db8::3c/128 dev lo proto kernel metric 32 pref medium
unreachable 2001:db8::/52 dev lo proto bird metric 32 pref medium
+ echo 2001:db8::3c/128 dev lo proto kernel metric 32 pref medium
unreachable 2001:db8::/52 dev lo proto bird metric 32 pref medium
+ wc -l
+ [ 2 -gt 1 ]
+ ocf_exit_reason More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.
+ local cookie=ocf-exit-reason:
+ local fmt
+ local msg
+ fmt=%s
+ [ -z ocf-exit-reason: ]
+ printf %s More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.
+ msg=More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.
+ printf %s%s\n ocf-exit-reason: More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.
ocf-exit-reason:More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.
+ __ha_log --ignore-stderr ERROR: More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.
+ local ignore_stderr=false
+ local loglevel
+ [ x--ignore-stderr = x--ignore-stderr ]
+ ignore_stderr=true
+ shift
+ [ none = ]
+ tty
+ set_logtag
+ [ -z ]
+ [ -n p_vip_hafw_ip6 ]
+ HA_LOGTAG=IPaddr2(p_vip_hafw_ip6)[24234]
+ [ x = xyes ]
+ [ -n ]
+ [ -n ]
+ [ -z -a -z ]
+ [ true = true ]
+ [ -n /dev/null ]
+ : appending to /dev/null
+ [ x != /dev/nullx ]
+ hadate
+ date +%b %d %T
+ echo IPaddr2(p_vip_hafw_ip6)[24234]: Aug 14 09:41:01 ERROR: More than 1 routes match 2001:db8::3c/128. Unable to decide which route to use.
+ return 1
+ NICINFO=
+ rc=1
+ [ 1 -eq 0 ]
+ ocf_is_probe
+ [ monitor = monitor -a 0 = 0 ]crm_resource: Error performing operation: Not running
+ ocf_log info [findif] failed
+ [ 2 -lt 2 ]
+ __OCF_PRIO=info
+ shift
+ __OCF_MSG=[findif] failed
+ __OCF_PRIO=INFO
+ [ INFO = DEBUG ]
+ ha_log INFO: [findif] failed
+ __ha_log INFO: [findif] failed
+ local ignore_stderr=false
+ local loglevel
+ [ xINFO: [findif] failed = x--ignore-stderr ]
+ [ none = ]
+ tty
+ set_logtag
+ [ -z ]
+ [ -n p_vip_hafw_ip6 ]
+ HA_LOGTAG=IPaddr2(p_vip_hafw_ip6)[24234]
+ [ x = xyes ]
+ [ -n ]
+ [ -n ]
+ [ -z -a -z ]
+ [ false = true ]
+ : appending to stderr
+ hadate
+ date +%b %d %T
+ echo Aug 14 09:41:01 INFO: [findif] failed
Aug 14 09:41:01 INFO: [findif] failed
+ [ -n /dev/null ]
+ : appending to /dev/null
+ [ x != /dev/nullx ]
+ hadate
+ date +%b %d %T
+ echo IPaddr2(p_vip_hafw_ip6)[24234]: Aug 14 09:41:01 INFO: [findif] failed
+ exit 7
NB: I have replaced my real prefix with 2001:db8::.
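The failing route-matching step can be approximated offline. The sketch below mirrors the pipeline visible in the trace (keep routes on dev lo, append /32 or /128 to bare host routes, sort by prefix length, drop the IPv6 default route) and then counts the remaining lines. count_findif_matches is a hypothetical name, and this is an approximation of findif.sh, not a copy of it. As the trace shows, the covering route is listed as unreachable 2001:db8::/52 dev lo (Linux reports IPv6 reject routes on the loopback device), so with nic=lo it survives the dev filter and the count becomes 2.

```shell
#!/bin/sh
# Approximation of the findif route-matching step shown in the trace
# (count_findif_matches is a hypothetical name, not a findif.sh function).
# stdin: output of `ip -o -f inet6 route list match <ip>/<prefix>`
# $1:    the nic parameter of the resource (lo here)
count_findif_matches() {
    local nic="$1" routematch
    # Keep routes on the NIC, give bare host routes an explicit /32 or /128,
    # sort by prefix length (longest first) and drop the IPv6 default route.
    routematch=$(grep "dev $nic" |
        sed -e 's,^\([0-9.]\+\) ,\1/32 ,;s,^\([0-9a-f:]\+\) ,\1/128 ,' |
        sort -t/ -k2,2nr |
        grep -v '^default')
    # findif reports "More than 1 routes match" when this count exceeds 1.
    echo "$routematch" | wc -l | tr -d ' '
}

# The two routes from the trace.
printf '%s\n' \
    '2001:db8::3c/128 dev lo proto kernel metric 32 pref medium' \
    'unreachable 2001:db8::/52 dev lo proto bird metric 32 pref medium' |
    count_findif_matches lo
```

Feeding it only the /128 line yields 1, which is the single-route case findif accepts.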
+ routematch=2001:db8::3c/128 dev lo proto kernel metric 32 pref medium
unreachable 2001:db8::/52 dev lo proto bird metric 32 pref medium
I tried creating a resource like yours, but perhaps you have an additional route being set up via ifup on Debian? In any case, you have an unreachable route, which seems to be what causes the issue here.
Right: I add an unreachable covering route for my prefix, since this machine is a router. You should be able to replicate this with something like:
ip route add unreachable 2001:db8::/52
Edit to add: the route is being added by BIRD using a configuration like:
protocol static {
  route 2001:db8::/52 unreachable;
}
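Building on the ip route add command above, here is a sketch of a reproduction in a throwaway network namespace, so the node's real routing table is left alone. The namespace name ipaddr2-repro is hypothetical. By default the script only prints the commands; running it as root with DRY_RUN=0 executes them, and the last query lets you check whether both the /128 route and the unreachable /52 are listed, as they were in the trace.

```shell
#!/bin/sh
# Reproduction sketch in a throwaway network namespace (name is hypothetical).
# Default is a dry run that only prints commands; DRY_RUN=0 (as root) runs them.
NS=ipaddr2-repro
VIP=2001:db8::3c

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

run ip netns add "$NS"
run ip -n "$NS" link set lo up
# Covering unreachable route, like the one BIRD installs on the router.
run ip -n "$NS" route add unreachable 2001:db8::/52
# Host address on lo, matching the resource's ip, cidr_netmask and nic.
run ip -n "$NS" addr add "$VIP/128" dev lo
# The same query findif runs; it counts the lines this prints.
run ip -n "$NS" -o -f inet6 route list match "$VIP/128"
run ip netns del "$NS"
```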
I've made a patch to ignore unreachable routes: https://github.com/ClusterLabs/resource-agents/pull/1965
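The idea of ignoring unreachable routes can be sketched as one extra filter applied before counting. This illustrates the approach only and is not the code in PR #1965; the function name count_usable_matches is hypothetical, and dropping blackhole and prohibit routes alongside unreachable is an assumption.

```shell
#!/bin/sh
# Sketch of the approach (not the actual patch): discard reject-type routes
# before counting, so a covering "unreachable" prefix no longer makes findif
# see more than one candidate. The route types listed are an assumption.
# stdin: output of `ip -o -f inet6 route list match <ip>/<prefix>`
count_usable_matches() {
    local nic="$1"
    grep "dev $nic" |
        grep -Ev '^(default|unreachable|blackhole|prohibit) ' |
        wc -l | tr -d ' '
}

# The two routes from the trace: only the /128 host route is left.
printf '%s\n' \
    '2001:db8::3c/128 dev lo proto kernel metric 32 pref medium' \
    'unreachable 2001:db8::/52 dev lo proto bird metric 32 pref medium' |
    count_usable_matches lo
```

Applied to the two routes from the trace this prints 1, so the single remaining /128 route would be used.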
In my setup, the following Pacemaker resource (using crmsh syntax) breaks:
I suspect what is triggering this breakage is a covering unreachable route for the prefix, e.g.:
My syslog contains messages such as:
Downgrading from 4.15.1 to 4.14.0 resolves the issue.