ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects
GNU General Public License v2.0

ZFS can't migrate to other node (cannot open pool: no such pool) #1856

Open san4ez1008 opened 1 year ago

san4ez1008 commented 1 year ago

The cluster runs pacemaker+corosync on CentOS Stream 9 VMs, with quorum.

[root@vnas-centos-1 labadmin]# pcs resource config
Group: group-pool1
  Resource: zfs-pool1 (class=ocf provider=heartbeat type=ZFS)
    Attributes: zfs-pool1-instance_attributes
      importargs="-d /dev/disk/by-id/"
      pool=pool0
    Operations:
      monitor: zfs-pool1-monitor-interval-5s interval=5s timeout=30s
      start: zfs-pool1-start-interval-0s interval=0s timeout=15s
      stop: zfs-pool1-stop-interval-0s interval=0s timeout=15s
  Resource: vip-scsi (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: vip-scsi-instance_attributes
      cidr_netmask=24
      ip=192.168.90.50
    Operations:
      monitor: vip-scsi-monitor-interval-10s interval=10s timeout=20s
      start: vip-scsi-start-interval-0s interval=0s timeout=20s
      stop: vip-scsi-stop-interval-0s interval=0s timeout=20s
  Resource: target-pool1 (class=ocf provider=heartbeat type=iSCSITarget)
    Attributes: target-pool1-instance_attributes
      additional_parameters="DefaultTime2Retain=0 DefaultTime2Wait=0"
      implementation=lio-t
      iqn=iqn.2004-10.com.ubuntu:01:84de25ddfc37
      portals=192.168.90.50
    Operations:
      monitor: target-pool1-monitor-interval-5s interval=5s timeout=1s on-fail=restart
      start: target-pool1-start-interval-0s interval=0s timeout=10s
      stop: target-pool1-stop-interval-0s interval=0s timeout=10s
  Resource: lun1-pool1 (class=ocf provider=heartbeat type=iSCSILogicalUnit)
    Attributes: lun1-pool1-instance_attributes
      implementation=lio-t
      lun=0
      path=/dev/pool0/vol1
      target_iqn=iqn.2004-10.com.ubuntu:01:84de25ddfc37
    Operations:
      monitor: lun1-pool1-monitor-interval-10s interval=10s timeout=10s
      start: lun1-pool1-start-interval-0s interval=0s timeout=10s
      stop: lun1-pool1-stop-interval-0s interval=0s timeout=10s

Moving the resource with pcs resource move resource_id nodename works fine, but when I hard-stop a node (reset the VM in VMware), the resource fails to start on the other node and the log shows this:

Mar 30 15:23:17.738 vnas-centos-1 pacemaker-execd [1242] (log_op_output) info: zfs-pool1_start_0[2266] error output [ cannot open 'pool0': no such pool ]
Mar 30 15:23:17.739 vnas-centos-1 pacemaker-execd [1242] (log_op_output) info: zfs-pool1_start_0[2266] error output [ /usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected ]
Mar 30 15:23:17.739 vnas-centos-1 pacemaker-execd [1242] (log_op_output) info: zfs-pool1_start_0[2266] error output [ cannot import 'pool0': one or more devices is currently unavailable ]
Mar 30 15:23:17.739 vnas-centos-1 pacemaker-execd [1242] (log_op_output) info: zfs-pool1_start_0[2266] error output [ /usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected ]
Mar 30 15:23:17.739 vnas-centos-1 pacemaker-execd [1242] (log_finished) info: zfs-pool1 start (call 26, PID 2266) exited with status 1 (execution time 237ms)
Mar 30 15:23:17.739 vnas-centos-1 pacemaker-controld [1245] (log_executor_event) notice: Result of start operation for zfs-pool1 on vnas-centos-1: error | CIB update 23, graph action confirmed; call=26 key=zfs-pool1_start_0 rc=1
Mar 30 15:23:17.739 vnas-centos-1 pacemaker-controld [1245] (log_executor_event) notice: zfs-pool1_start_0@vnas-centos-1 output [ cannot open 'pool0': no such pool\n/usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected\ncannot import 'pool0': one or more devices is currently unavailable\n/usr/lib/ocf/resource.d/heartbeat/ZFS: line 35: [: : integer expression expected\n ]
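A note on the "integer expression expected" lines in the log: that is the message bash prints when a numeric test such as -eq receives an empty or non-numeric operand. The sketch below only reproduces that class of error and shows one possible guard. The function names and the empty value are hypothetical, and this is not the code at line 35 of the ZFS agent, which I have not quoted here.

```shell
#!/bin/sh
# Hypothetical illustration only; NOT the actual ZFS resource agent code.

# Unguarded numeric test: with an empty operand, `[` prints an error
# (in bash: "[: : integer expression expected") and returns non-zero.
unguarded_is_zero() {
    [ "$1" -eq 0 ]
}

# Guarded variant: reject empty or non-digit input before the numeric
# test, so no error message is emitted.
guarded_is_zero() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -eq 0 ]
}

value=""   # assumption: stands in for a command that printed nothing

# The error message is discarded here; drop 2>/dev/null to see it.
unguarded_is_zero "$value" 2>/dev/null || echo "unguarded: numeric test failed"
guarded_is_zero "$value" || echo "guarded: empty value treated as not zero"
```

If the agent hits this because a command produced no output (for example because the pool is not imported), the shell error is probably a side effect rather than the root cause; the "one or more devices is currently unavailable" line looks like the real failure.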

san4ez1008 commented 1 year ago

Each VM connects to the shared DAS via an LSI HBA. Manual: https://netbergtw.com/top-support/articles/zfs-cib/

san4ez1008 commented 1 year ago

Can anybody help?