ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects
GNU General Public License v2.0

ZFS promotion not working #1852

Closed san4ez1008 closed 1 year ago

san4ez1008 commented 1 year ago

Hi! I'm building a cluster on Ubuntu 22.04 with Pacemaker and Corosync. The VIP and zfs-pool1 resources are created, and the zfs resource is declared promotable. After starting the resource, the logs show these errors:

мар 22 16:51:05 vnas-centos-2 pacemaker-schedulerd[1194]: warning: zfs-pool1-clone cannot run on vnas-centos-1 due to reaching migration threshold (clean up resource to allow again)

Until I run zpool import pool1 and pcs resource cleanup, the resource does not start, and in the output of pcs resource status it is shown as unpromotable.

[root@vnas-centos-1 labadmin]# pcs resource debug-promote zfs-pool1 --full
(unpack_rsc_op_failure) warning: Unexpected result (error) was recorded for start of zfs-pool1:0 on vnas-centos-1 at Mar 22 15:19:53 2023 | rc=1 id=zfs-pool1_last_failure_0
(unpack_rsc_op_failure) warning: Unexpected result (error) was recorded for start of zfs-pool1:0 on vnas-centos-2 at Mar 22 15:19:53 2023 | rc=1 id=zfs-pool1_last_failure_0
Operation force-promote for zfs-pool1 (ocf:heartbeat:ZFS) returned 3 (unimplemented feature)
+++ 15:20:21: ocf_start_trace:991: echo
+++ 15:20:21: ocf_start_trace:991: sort
+++ 15:20:21: ocf_start_trace:991: printenv
++ 15:20:21: ocf_start_trace:991: env='... [full printenv dump trimmed; the OCF-related variables were:]
OCF_EXIT_REASON_PREFIX=ocf-exit-reason: OCF_OUTPUT_FORMAT=text OCF_RA_VERSION_MAJOR=1 OCF_RA_VERSION_MINOR=1
OCF_RESKEY_CRM_meta_class=ocf OCF_RESKEY_CRM_meta_clone=0 OCF_RESKEY_CRM_meta_globally_unique=false OCF_RESKEY_CRM_meta_id=zfs-pool1
OCF_RESKEY_CRM_meta_migration_threshold=1 OCF_RESKEY_CRM_meta_notify=true OCF_RESKEY_CRM_meta_promotable=true
OCF_RESKEY_CRM_meta_promoted_max=1 OCF_RESKEY_CRM_meta_promoted_node_max=1 OCF_RESKEY_CRM_meta_provider=heartbeat
OCF_RESKEY_CRM_meta_resource_stickiness=100 OCF_RESKEY_CRM_meta_timeout=20000 OCF_RESKEY_CRM_meta_type=ZFS
OCF_RESKEY_crm_feature_set=3.16.2 OCF_RESKEY_importargs=-d /dev/mapper/ OCF_RESKEY_pool=pool1
OCF_RESOURCE_INSTANCE=zfs-pool1 OCF_RESOURCE_PROVIDER=heartbeat OCF_RESOURCE_TYPE=ZFS OCF_ROOT=/usr/lib/ocf
OCF_TRACE_FILE=/dev/stderr OCF_TRACE_RA=1 ...'
++ 15:20:21: 1045: ocf_is_true ''
++ 15:20:21: ocf_is_true:105: case "$1" in
++ 15:20:21: ocf_is_true:107: false
crm_resource: Error performing operation: Unimplemented

Resource config:

[root@vnas-centos-1 labadmin]# pcs resource config
Group: group-pool1
  Resource: ip-pool1 (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip-pool1-instance_attributes cidr_netmask=25 ip=10.0.3.90
    Operations:
      monitor: ip-pool1-monitor-interval-10s interval=10s timeout=20s
      start: ip-pool1-start-interval-0s interval=0s timeout=20s
      stop: ip-pool1-stop-interval-0s interval=0s timeout=20s
  Resource: scsi-pool1 (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: scsi-pool1-instance_attributes cidr_netmask=24 ip=192.168.90.50
    Operations:
      monitor: scsi-pool1-monitor-interval-10s interval=10s timeout=20s
      start: scsi-pool1-start-interval-0s interval=0s timeout=20s
      stop: scsi-pool1-stop-interval-0s interval=0s timeout=20s
  Resource: lun1-pool1 (class=ocf provider=heartbeat type=iSCSILogicalUnit)
    Attributes: lun1-pool1-instance_attributes implementation=lio-t lun=0 path=/dev/pool1/vol1 target_iqn=iqn.2004-10.com.ubuntu:01:84de25ddfc37
    Operations:
      monitor: lun1-pool1-monitor-interval-10s interval=10s timeout=10s
      start: lun1-pool1-start-interval-0s interval=0s timeout=10s
      stop: lun1-pool1-stop-interval-0s interval=0s timeout=10s
Clone: target-pool1-clone
  Meta Attributes: target-pool1-clone-meta_attributes clone-max=2 clone-node-max=1
  Resource: target-pool1 (class=ocf provider=heartbeat type=iSCSITarget)
    Attributes: target-pool1-instance_attributes allowed_initiators="iqn.1998-01.com.vmware:esxi2.lab.local:524684951:64 iqn.1998-01.com.vmware:esxi2.lab.local:1776083083:64" implementation=lio-t iqn=iqn.2004-10.com.ubuntu:01:84de25ddfc37 portals=192.168.90.50
    Operations:
      monitor: target-pool1-monitor-interval-10s interval=10s timeout=10s
      start: target-pool1-start-interval-0s interval=0s timeout=10s
      stop: target-pool1-stop-interval-0s interval=0s timeout=10s
Clone: zfs-pool1-clone
  Meta Attributes: zfs-pool1-clone-meta_attributes clone-max=2 clone-node-max=1 promotable=true
  Resource: zfs-pool1 (class=ocf provider=heartbeat type=ZFS)
    Attributes: zfs-pool1-instance_attributes importargs="-d /dev/mapper/" pool=pool1
    Operations:
      monitor: zfs-pool1-monitor-interval-30 interval=30 role=Promoted
      monitor: zfs-pool1-monitor-interval-10 interval=10 role=Unpromoted
      start: zfs-pool1-start-interval-0s interval=0s timeout=60s
      stop: zfs-pool1-stop-interval-0s interval=0s timeout=60s

oalbrigt commented 1 year ago

ZFS is not a promotable agent. You can run pcs resource describe <agent> and look for promote/demote actions to check whether an agent is promotable or not.
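For example (the agent name here is just an example; a promotable agent advertises promote and demote actions in its metadata):

pcs resource describe ocf:heartbeat:ZFS | grep -Ei 'promote|demote'
crm_resource --show-metadata ocf:heartbeat:ZFS | grep -E 'action name="(promote|demote)"'

If neither command prints an action line, the agent can't be used as a promotable clone.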

san4ez1008 commented 1 year ago

Oh, thank you. But I have a question..

[root@vnas-centos-1 labadmin]# pcs resource describe iSCSITarget | grep promote

Assumed agent name 'ocf:heartbeat:iSCSITarget' (deduced from 'iSCSITarget')

[root@vnas-centos-1 labadmin]# pcs resource describe iSCSITarget | grep clone

Assumed agent name 'ocf:heartbeat:iSCSITarget' (deduced from 'iSCSITarget')

[root@vnas-centos-1 labadmin]# pcs resource status

How the f**k?

oalbrigt commented 1 year ago

All agents can be cloned (unless they contain logic that makes the clone fail at create/start time, because cloning could cause data corruption in some special cases).
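For example, turning an existing resource into a clone is just this (the resource name is a placeholder; don't do it with anything that must only run on one node at a time, such as a pool-import resource):

pcs resource clone my-resource clone-max=2 clone-node-max=1

whereas pcs resource promotable <resource> only makes sense for agents that actually implement promote/demote.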

san4ez1008 commented 1 year ago

OK. Then one more question, though it's not quite on topic. The cluster is assembled; zfs, VIP, iSCSITarget and iSCSILogicalUnit resources are configured. The LUN is presented to VMware, but when the resources move, the connection to the LUN is lost. There are no such problems in a Windows cluster. Where could I be going wrong?

nrwahl2 commented 1 year ago

@san4ez1008 I'd suggest joining the users@clusterlabs.org mailing list and emailing the list for that question. We'd need more details to have any idea, and it may not be a resource agent issue.

For one thing, we'd want to see your cluster configuration. The grouping or the ordering/colocation constraints matter. If you have a resource group like

zfs
VIP
iSCSITarget
iSCSILogicalUnit

then when the resources move from node 1 to node 2, you have the following sequence of events:

stop iSCSILogicalUnit on node 1
stop iSCSITarget on node 1
stop VIP on node 1
stop zfs on node 1
start zfs on node 2
start VIP on node 2
start iSCSITarget on node 2
start iSCSILogicalUnit on node 2

Note that there is a period during which the LUN is down, as well as a period during which the VIP is down. I'm not familiar with Windows clusters, so I don't know how they manage this situation. I haven't configured iSCSI in a while, but I think there's a way to set up automatic retries on the client.
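For reference, a group with that behavior would be created roughly like this (resource names are placeholders); members of a group start in the listed order and stop in the reverse order:

pcs resource group add my-group zfs-res vip-res target-res lun-res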

san4ez1008 commented 1 year ago

@nrwahl2 The cluster was assembled according to the following manual: https://netbergtw.com/top-support/articles/zfs-cib/ Yes, I understand that there is a certain period during which the VIP and LUNs will be unavailable, but this is critical for VMware: even 2-4 seconds of downtime will cause virtual machines to fail. There shouldn't be any downtime.

nrwahl2 commented 1 year ago

@san4ez1008 I don't know how you would avoid downtime while the VIP and LUNs are unavailable. They have to be stopped before they can safely be started on another node. If the duration of the downtime is the issue, that's going to depend on how long the commands underlying the resource agents take (e.g., zpool export/import, the ietadm/tgtadm/targetcli/whatever commands, etc.). I'd suggest checking pacemaker.log during a failover to see how long each resource operation is taking.
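Something like this is usually enough to spot the slow operations (log path and exact message wording vary between Pacemaker versions; compare the timestamps of consecutive messages):

grep -E 'Result of (start|stop) operation for' /var/log/pacemaker/pacemaker.log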

If the resource operations and the whole failover are completing very quickly, then it could be that the network has cached the MAC address or something for the virtual IP address and is temporarily routing traffic to the wrong place, until it refreshes. That might be tunable on the network side.
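If stale ARP turns out to be the problem, the IPaddr2 agent can send more gratuitous ARPs when the VIP starts, e.g. (parameter names from the IPaddr2 metadata; the values here are just a guess):

pcs resource update ip-pool1 arp_count=10 arp_count_refresh=10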

nrwahl2 commented 1 year ago

Again, I suggest emailing the mailing list.

san4ez1008 commented 1 year ago

@nrwahl2 I have a log from the time the resources were switching, but from journalctl rather than pacemaker.log: https://pastebin.com/eLj8DdtY At the moment I'm testing a drbd resource, but the problem remains the same. What if I use cloned resources for the VIP, Target and LUN? Is that possible?

nrwahl2 commented 1 year ago

Cloned IPaddr2 resources may not work properly on modern systems: https://bugs.clusterlabs.org/show_bug.cgi?id=5513#c1

Not sure whether it's safe to clone iSCSITarget and iSCSILogicalUnit resources. I'm almost certain it's never been tested.

I could envision corruption happening if clients write to the LUN via both server nodes. It might work if you can guarantee that when one node is serving the LUN, nothing writes to the LUN or the underlying storage on the other node (either locally on that node, or from a client). Then it would just be a matter of the IPaddr2 resource failover.