Closed smohanan20 closed 9 months ago
Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/fence-agents/job/fence-agents-pipeline/job/PR-558/1/input
Do you have a reproducer? We're having issues reproducing this in our env.
Steps I used to reproduce the ISID issue:
What do you use to create a 2nd session? We're unable to do it with targetcli.
The initial repro was with a storage vendor that exposes the target/LUN, but I can also reproduce it with targetcli across reboots:
create iqn.2003-01.org.linux-iscsi.localhost.x8664:disk<0-1,5-6>
iscsiadm -m discovery -t st -p <ip>
iscsiadm -m node -l
iscsiadm -m session -P 3 | grep 'SID\|Target:'
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk0 (non-flash) SID: 27
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk1 (non-flash) SID: 28
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk5 (non-flash) SID: 29
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk6 (non-flash) SID: 30
3. Create another new iSCSI target, disk2, and log in to it the same way. disk2 gets SID 31.
iscsiadm -m discovery -t st -p <ip>
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk0 (non-flash) SID: 27
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk1 (non-flash) SID: 28
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk5 (non-flash) SID: 29
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk6 (non-flash) SID: 30
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk2 (non-flash) SID: 31
5. I rebooted the client and listed the sessions:
iscsiadm -m session -P 3 | grep 'SID\|Target:'
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk6 (non-flash) SID: 10
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk2 (non-flash) SID: 11
...
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:sn.f703c1d91bd7 (non-flash) SID: 6
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk0 (non-flash) SID: 7
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk1 (non-flash) SID: 8
Target: iqn.2003-01.org.linux-iscsi.localhost.x8664:disk5 (non-flash) SID: 9
The SIDs were not consistent with the pre-reboot order. A registration made under one SID before the reboot may end up under a new SID after the reboot (or after any disconnect, for that matter), i.e. a new ISID, which makes the earlier registration obsolete.
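To make the comparison concrete, here is a minimal Python sketch that parses the two listings above and reports which targets changed SID across the reboot. The helper names (`parse_sids`, `changed_sids`) are hypothetical and not part of fence-agents or open-iscsi; the sample text is copied from the output posted in this thread.

```python
import re

PREFIX = "iqn.2003-01.org.linux-iscsi.localhost.x8664:"

# Listings copied from the thread (filtered `iscsiadm -m session -P 3` output).
BEFORE = "\n".join(
    f"Target: {PREFIX}{name} (non-flash) SID: {sid}"
    for name, sid in [("disk0", 27), ("disk1", 28), ("disk5", 29),
                      ("disk6", 30), ("disk2", 31)]
)
AFTER = "\n".join(
    f"Target: {PREFIX}{name} (non-flash) SID: {sid}"
    for name, sid in [("disk6", 10), ("disk2", 11), ("disk0", 7),
                      ("disk1", 8), ("disk5", 9)]
)


def parse_sids(listing):
    """Map each target IQN to the SID that follows it in the listing."""
    pairs = re.findall(r"Target:\s+(\S+).*?SID:\s+(\d+)", listing, flags=re.S)
    return {iqn: int(sid) for iqn, sid in pairs}


def changed_sids(before, after):
    """Return {iqn: (old_sid, new_sid)} for targets whose SID differs."""
    return {
        iqn: (before[iqn], after[iqn])
        for iqn in before
        if iqn in after and before[iqn] != after[iqn]
    }


if __name__ == "__main__":
    changes = changed_sids(parse_sids(BEFORE), parse_sids(AFTER))
    for iqn, (old, new) in sorted(changes.items()):
        print(f"{iqn}: SID {old} -> {new}")
```

With the data from this thread, every target that exists both before and after the reboot ends up with a different SID.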
retest this please
@oalbrigt Did you have anything specific in mind for me to test?
No. That was for our CI to run its tests.
Thanks.
Problem:
When a device is powered off (preempted), the fence_scsi agent assumes that the client already has a registration on the device and sends a preempt-and-abort request against the key held by the other device. This fails with a reservation conflict if the host's registration has a conflicting ISID. (Another manifestation of the problem in https://github.com/ClusterLabs/fence-agents/pull/529.)
Impact:
If the local host cannot preempt other hosts because no matching registration for the local host is found, it won't be able to start the resources.
Proposed Fix:
To fix this, the agent needs to register with the host key before it tries a preempt request.
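As a rough illustration of the ordering this fix describes, the sketch below registers the host key first and only then issues the preempt-and-abort. It is not the patch in this PR. The function names and the injectable `run` parameter are hypothetical, and the reservation type (5, write exclusive, registrants only) is an assumption. The `sg_persist` long options used are standard sg3_utils options.

```python
import subprocess


def register_cmd(dev, host_key):
    # Register (or re-register) the local host key, ignoring any existing key.
    return ["sg_persist", "--no-inquiry", "--out", "--register-ignore",
            f"--param-sark={host_key}", f"--device={dev}"]


def preempt_abort_cmd(dev, host_key, victim_key):
    # Preempt-and-abort the victim's key using the local host key.
    # Reservation type 5 is an assumption made for this sketch.
    return ["sg_persist", "--no-inquiry", "--out", "--preempt-abort",
            "--prout-type=5", f"--param-rk={host_key}",
            f"--param-sark={victim_key}", f"--device={dev}"]


def fence_device(dev, host_key, victim_key, run=subprocess.run):
    """Register first so the preempt is sent from a valid registration."""
    run(register_cmd(dev, host_key), check=True)
    run(preempt_abort_cmd(dev, host_key, victim_key), check=True)
```

Passing a fake `run` callable makes it possible to check the command order without touching a real device.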