Closed Leo-Yan closed 4 years ago
According to the IDM State-Cmd matrix, this is the use case S13 / S15 which will return Command Terminated.
Hi @alphonsus-kwok,
Understood. Sorry I didn't carefully review the defined status mapping for this case.
From my understanding, this issue refers to use case S15: the owner has locked the mutex, but the other hosts have timed out. In this case, the owner should be able to promote the locking mode from shareable to exclusive.
To fix this case, the lock manager can use an ugly workaround: first release the mutex, then acquire it again in exclusive mode. But this might introduce an atomicity issue. The best approach is to fix it in the drive firmware, so that the owner keeps its ownership and is promoted from shareable mode to exclusive mode.
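The atomicity concern with the release-then-acquire workaround can be illustrated with a toy in-memory model. This is only a sketch: `ToyMutex`, `promote_by_release_and_reacquire` and the `intruder` hook are hypothetical names for illustration and are not part of the lock manager or the drive firmware.

```python
class ToyMutex:
    """Hypothetical in-memory stand-in for a drive mutex (illustration only)."""

    def __init__(self):
        self.holders = {}  # host name -> "shareable" or "exclusive"

    def acquire(self, host, mode):
        others = {h: m for h, m in self.holders.items() if h != host}
        if mode == "exclusive" and others:
            return False  # someone else holds the mutex
        if mode == "shareable" and "exclusive" in others.values():
            return False  # an exclusive holder blocks shared access
        self.holders[host] = mode
        return True

    def release(self, host):
        self.holders.pop(host, None)


def promote_by_release_and_reacquire(mutex, host, intruder=None):
    """Workaround: release, then re-acquire in exclusive mode.

    `intruder` is a hypothetical callback that runs in the gap between
    the two steps, standing in for another host racing for the mutex.
    """
    mutex.release(host)
    if intruder is not None:
        intruder()  # the window in which ownership is not held
    return mutex.acquire(host, "exclusive")
```

If another host acquires the mutex inside that window, the original owner ends up holding nothing at all, which is why an in-place promotion in firmware is preferable.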
Perhaps the BreakLock command would be better. For the use case (HostA: shared, timed out; HostB: shared, locked; HostC: shared, timed out), a BreakLock command sent by HostB (exclusive) will promote the lock to the Exclusive class.
For the use case (HostA: shared, timed out; HostB: shared, locked; HostC: shared, locked), a BreakLock command sent by HostB (exclusive) will return Command Terminated.
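The two use cases above can be written down as a small decision function. This is a minimal sketch of the rule as described in this comment, not the firmware logic or the full IDM State-Cmd matrix; the function name, the state strings and the return values are assumptions made for illustration.

```python
def breaklock_exclusive(requester, host_states):
    """Model the outcome of a BreakLock (exclusive) sent by `requester`.

    host_states maps each host holding the mutex in shared mode to
    "locked" (still renewing) or "timeout" (lease expired).
    The return strings are illustrative labels, not real status codes.
    """
    still_active = [
        host
        for host, state in host_states.items()
        if host != requester and state == "locked"
    ]
    if still_active:
        # Another host still holds a live shared lock: refuse.
        return "COMMAND_TERMINATED"
    # Every other shared holder has timed out: promote the requester.
    return "PROMOTED_TO_EXCLUSIVE"
```

For example, `breaklock_exclusive("HostB", {"HostA": "timeout", "HostB": "locked", "HostC": "timeout"})` models the first use case, and changing HostC to `"locked"` models the second.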
Let's wait for Tom's input also.
Hi @alphonsus-kwok,
I have tried to use the BreakLock command to work around this issue, but based on the test results, the BreakLock command also returns a "Command Terminated" error.
Sending a BreakLock Shared followed by a Refresh Exclusive should promote the Rid to exclusive if all other shared locks have timed out. A BreakLock Exclusive should also do this in one command. See version 1.4b of the command response spreadsheet for confirmation.
I did a quick test of this flow:
With BreakLock Shared, the SCSI command still fails with status "COMMAND_TERMINATED".
As I reported in my previous reply, a BreakLock Exclusive also fails with status "COMMAND_TERMINATED".
Breaklock will return "COMMAND_TERMINATED" as there is still an active locked mutex as per the IDM State-Cmd matrix cell W15.
But I did see a bug: after a Refresh, the mutex is lost. I will resolve it and give you an update.
Tested with two cases:
One is the IDM lock manager test case: https://github.com/Seagate/propeller/blob/master/test/ilm_test.py#L736 and the other is the LVM multi-host test: https://github.com/Seagate/lvm2-stx-private/blob/centos7_lvm2/test/shell/idm_multi_hosts_lv_sh_timeout_hostb.sh; both pass.
So I can confirm this issue has been fixed by the latest firmware 1759.
Firmware version: signed_CFW_MFW_r2229.lod
Host A: Acquire a mutex in shareable mode, then stop renewing it.
Host B: Acquire the same mutex in shareable mode; wait 60 seconds for host A to time out.
Mutex state for Host A:
Mutex state for Host B:
If host B uses the refresh command to convert from shareable mode to exclusive mode, the drive firmware reports the following error:
2020-07-03 03:54:18 25150 [27788]: _scsi_read: status 0x22
2020-07-03 03:54:18 25150 [27788]: _scsi_read: masked status 0x11
2020-07-03 03:54:18 25150 [27788]: _scsi_read: host status 0x0
2020-07-03 03:54:18 25150 [27788]: _scsi_read: driver status 0x0
In this case, host B should be able to convert the lock mode from shareable to exclusive successfully.
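For reference, the two status values in the log above are the same result in two encodings. A minimal sketch of the relationship, assuming the log prints the raw SCSI status byte and the Linux SG_IO `masked_status` field: the masked value is the status byte shifted right by one bit and masked with 0x1f. The constant names below mirror the Linux SCSI headers, but the helper function is hypothetical.

```python
# Raw SAM status byte for COMMAND TERMINATED (obsolete in newer SAM revisions).
SAM_STAT_COMMAND_TERMINATED = 0x22
# Legacy Linux "masked" status code for the same condition.
COMMAND_TERMINATED = 0x11


def masked_status(status):
    """Convert a raw SCSI status byte to the legacy masked form."""
    return (status >> 1) & 0x1F
```

Applied to the log, `masked_status(0x22)` gives `0x11`, so both lines report Command Terminated, while the host and driver status of 0x0 indicate that the transport itself did not report an error.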