[Question] About changing fence_scsi.

HideoYamauchi commented 7 months ago

Hi All,

With the following changes, preempt() is now executed in register_dev(). Looking at the comments, it says that the registration is cleared by preempt(), so it is registered again.

(snip)
def register_dev(options, dev, key):
(snip)
    if registration_key_exists:
        # If key matches, make sure it matches with the connection that
        # exists right now. To do this, we can issue a preempt with same key
        # which should replace the old invalid entries from the target.
        if not preempt(options, options["--key"], dev):
            return False

        # If there was no reservation, we need to issue another registration
        # since the previous preempt would clear registration made above.
        if get_reservation_key(options, dev, False) != options["--key"]:
            return register_helper(options, options["--key"], dev)
(snip)
def scsi_check(hardreboot=False):
    if len(sys.argv) >= 3 and sys.argv[1] == "repair":
        return int(sys.argv[2])
(snip)
    for dev in devs:
        for n in range(int(options["retry"]) + 1):
            if n > 0:
                logging.debug("retry: " + str(n) + " of " + options["retry"])
            if key in get_registration_keys(options, dev, fail=False):
                logging.debug("key " + key + " registered with device " + dev)
                return 0
            else:
                logging.debug("key " + key + " not registered with device " + dev)

            if n < int(options["retry"]):
                time.sleep(float(options["retry-sleep"]))

    logging.debug("key " + key + " registered with any devices")

    if hardreboot == True:
        libc = ctypes.cdll['libc.so.6']
        libc.reboot(0x1234567)
    return 2
(snip)

However, when used in conjunction with the watchdog service, if the key is cleared at the time of preempt(), won't there be a time when the key cannot be recognized depending on when the watchdog service's scsi_check() is executed?

In this case, I think it is necessary to set the retry parameter and retry-sleep parameter in /etc/sysconfig/stonith, what do you think?

Best Regards, Hideo Yamauchi.

oalbrigt commented 7 months ago

You're right. It might be a good idea to set retry to 2 or 3 in this case. retry-sleep defaults to 1s, so that only needs to be updated if you want it to sleep longer between tries.

HideoYamauchi commented 7 months ago

Hi Oyvind,

Thanks for your comment.

You're right. It might be a good idea to set retry to 2 or 3 in this case. retry-sleep defaults to 1s, so that only needs to be updated if you want it to sleep longer between tries.

Okay!

Looking at the PR comments, I think this fix and the next fix are related to when using an iSCSI initiator connection, but is my understanding correct?

https://github.com/ClusterLabs/fence-agents/commit/34baef58db442148b8e067509d2cdd37b7a91ef4

For example, is it correct to think that it does not affect the use of fence_scsi using RDM disks on vSphere?

Best Regards, Hideo Yamauchi.

oalbrigt commented 6 months ago

It should only trigger when this happens, which should be with iSCSI initiator connection only, but I havent tested it with RDM disks on vSphere.

HideoYamauchi commented 6 months ago

Hi Oyvind,

Thanks for your comment.

It should only trigger when this happens, which should be with iSCSI initiator connection only, but I havent tested it with RDM disks on vSphere.

I looked at RHEL's Bugzilla, etc., but I couldn't figure out how to test the original problem. If there is a way to test the original problem, I would like to check it with an RDM disk on vSphere.

Could you tell me how to test?

Best Regards, Hideo Yamauchi.

oalbrigt commented 6 months ago

@smohanan20 Can you give an example for how to test this?

smohanan20 commented 6 months ago

Establish iSCSI sessions with a storage device that supports multiple LUNs per iSCSI target. We would also need multiple iSCSI target connections. Each session would have its own ISID.
Use sg_persist to make registrations/reservations.
Add a new LUN to the first iSCSI target and establish a session with client. The iSCSI initiator picks a new session id to use for ISID.
Use sg_persist to register the new device
Reboot the initiator. This should cause the connections to re-establish and there is a good chance that the ISID changes (compared to the state pre-reboot) for iSCSI targets (due to initiator's order of iSCSI login requests). This makes the old reservation/registration invalid.

The fix helps in registering the device by preempting the existing ISID from the device.

HideoYamauchi commented 6 months ago

@smohanan20

thank you for your answer.

Looking at the steps in question, I think it only occurs in iSCSI configurations. I think this will probably be the same for the next fix.

https://github.com/ClusterLabs/fence-agents/commit/34baef58db442148b8e067509d2cdd37b7a91ef4

I think that RDM connections on vSphere that do not go through iSCSI initiator connections are probably not related to this problem.

I'll check it out a little more.

Perhaps it is correct to skip this modification process in case of RDM disks.

Many thanks, Hideo Yamauchi.

HideoYamauchi commented 6 months ago

Hi @smohanan20, @oalbrigt

When using an RDM disk in a vSphere environment, this problem does not seem to occur because the RDM disk is directly recognized by the OS and there is no iSCSI connection (ISID).

Even with an RDM disk in a vSphere environment, if only the key is registered in the device and is not reserved, and the preempt() process is executed, the registered key will be cleared. However, in the process flow of fence_scsi, I think that the case where only the key is registered on the device and no reservation is registered at all is when no reservation is registered after unfencing/fencing. In this case, fence_scsi will execute sys.exit(1), so unfencing/fencing will be re-executed by pacemaker. Therefore, I think that the process of re-registering the key cleared by preempt() below becomes effective when unfencing/fencing fails, is my understanding correct? Or is there another case where a reservation is not registered at all?

(snip)
def register_dev(options, dev, key):
    dev = os.path.realpath(dev)
(snip)
    # Check if any registration exists for the key already. We track this in
    # order to decide whether the existing registration needs to be cleared.
    # This is needed since the previous registration could be for a
    # different I_T nexus (different ISID).
    registration_key_exists = False
    if key in get_registration_keys(options, dev):
        logging.debug("Registration key exists for device " + dev)
        registration_key_exists = True
(snip)
    if registration_key_exists:
        # If key matches, make sure it matches with the connection that
        # exists right now. To do this, we can issue a preempt with same key
        # which should replace the old invalid entries from the target.
        if not preempt(options, key, dev, key):
            return False

        # If there was no reservation, we need to issue another registration
        # since the previous preempt would clear registration made above.
        if get_reservation_key(options, dev, False) != key:
            return register_helper(options, dev, key)
(snip)

When there is no reservation in the device, the key registration is removed by preempt(). However, by checking the reservation first and registering the reservation before preempt() when there is no reservation, there is no need to re-register the key, and I believe that the registered key will not be lost. Will there be a problem if fence_scsi register a reservation before preempt() when there is no reservation?

(snip)
    if registration_key_exists:

        # Add reserve_dev()
        if get_reservation_key(options, dev) is None \
            and not reserve_dev(options, dev) \
            and get_reservation_key(options, dev) is None:
               return False

        # If key matches, make sure it matches with the connection that
        # exists right now. To do this, we can issue a preempt with same key
        # which should replace the old invalid entries from the target.
        if not preempt(options, key, dev, key):
            return False
(snip)

Best Regards, Hideo Yamauchi.

smohanan20 commented 6 months ago

Hi @HideoYamauchi ,

I think that the case where only the key is registered on the device and no reservation is registered at all is when no reservation is registered after unfencing/fencing

Is this true for all cases? On startup, when devices are just registered to the device, there won't be any reservation tied to the key. When reservation happens, the reservation would be per key (client) right? For this case, there has been no fence/unfence operation that happened yet.

This fix solves a problem that I think is unique for iSCSI cases only. The general idea of the fix was that fencing agent shouldn't consider the presence of a key with a device as an indication that the client holds the key. For cases like iSCSI where the key is tied to I_T Nexus, a different I_T nexus can be formed but the fencing agent would assume that the key belongs to the client.

Regards

HideoYamauchi commented 6 months ago

Hi @smohanan20, @oalbrigt

Thanks for your comment.

I think that the case where only the key is registered on the device and no reservation is registered at all is when no reservation is registered after unfencing/fencing

Is this true for all cases? On startup, when devices are just registered to the device, there won't be any reservation tied to the key. When reservation happens, the reservation would be per key (client) right? For this case, there has been no fence/unfence operation that happened yet.

I don't understand this condition. Normally, when configuring fence_scsi with pacemaker, it would be as follows, so I think there is no case where only the reservation does not exist.

Step1) The OS already recognizes the iSCSI disk Step2) pacemaker executes unfencing(on). Step3) Unfencing registers the key and reservation on the iSCSI disk. If the reservation fails at this point, the key will just be registered.

Are there other cases?

This fix solves a problem that I think is unique for iSCSI cases only.

Okay. I think so too.

The general idea of the fix was that fencing agent shouldn't consider the presence of a key with a device as an indication that the client holds the key. For cases like iSCSI where the key is tied to I_T Nexus, a different I_T nexus can be formed but the fencing agent would assume that the key belongs to the client.

I understand that this means always re-registering the keys when unfencing/fencing. I have no objections.

What I'm concerned about is that the key registration may drop out momentarily. As a result, when combined with the watchdog service, a node that momentarily loses a non-defective key will restart, so you will need to set the retry parameter in /etc/sysconfig/stonith as answered by Oyvind.

To avoid this situation, I proposed the following changes first. With this change, the key has already been re-registered, so if there is no reservation, the reservation will be registered and the key will not be removed by preempt().

(snip)
def register_dev(options, dev, key):
    dev = os.path.realpath(dev)
(sinp)
    if registration_key_exists:

        # -----------------Add Sart reserve_dev()------------------------------
        if get_reservation_key(options, dev) is None \
            and not reserve_dev(options, dev) \
            and get_reservation_key(options, dev) is None:
               return False
        # -----------------Add End reserve_dev()------------------------------

        # If key matches, make sure it matches with the connection that
        # exists right now. To do this, we can issue a preempt with same key
        # which should replace the old invalid entries from the target.
        if not preempt(options, key, dev, key):
            return False
(snip)

What do you think about this change? Since the key registration will not be removed, there should be no need to re-register the key.

Best Regards, Hideo Yamauchi.

smohanan20 commented 6 months ago

@HideoYamauchi Thanks for the proposal.

What do you think about this change?

My assumption about a device having just registration with the device and no reservation is based on the SPC spec that allows this. If pacemaker guarantees to have both registration and reservation, then your fix to reserve looks good.

HideoYamauchi commented 6 months ago

Hi @smohanan20, @oalbrigt

Thanks for your comment.

What do you think about this change?

My assumption about a device having just registration with the device and no reservation is based on the SPC spec that allows this. If pacemaker guarantees to have both registration and reservation, then your fix to reserve looks good.

Okay!

I will review the details a little more and submit a revised PR.

From what I've confirmed, if you follow the same idea, it seems that similar considerations are required when using an iSCSI storage multipath configuration with fence_mpath. I will also consider modifying fence_mpath.

Many thanks, Hideo Yamauchi.

HideoYamauchi commented 5 months ago

Hi @smohanan20, @oalbrigt

After further investigation, I found that executing preempt() in the following cases will remove the key of the node that executed it.

Case 1: When there is no reservation Case 2: When it is reserved by another node

If the currently reserved key is for another node before executing preempt(), there is no need to replace the reservation, so if preempt() is not executed, wouldn't the state of case 2 be avoided?

def register_dev(options, dev, key):
        dev = os.path.realpath(dev)
(snip)
        if get_reservation_key(options, dev, False) == key:
                if not preempt(options, key, dev, key):
                    return False
(snip)

If this is effective, in case 2, there should be no need to re-register the key after preempt().

Or is it necessary to execute preempt() even if the reservation key is different?

Best Regards, Hideo Yamauchi.

smohanan20 commented 5 months ago

Hi @HideoYamauchi ,

Or is it necessary to execute preempt() even if the reservation key is different?

Yes the purpose of the preempt is to clear any existing key that belongs to a different ISID than the one the client currently holds. Otherwise the cluster services won't resume. So in summary,

There can be two possible scenarios for the client

1) When the client (with same or different ISID) holds a reservation and registration on the device

2) When the client (with same or different ISID) only holds a registration on the device with no reservation

If the client holds a reservation, just issuing a PREEMPT request with the same key is supposed to replace the existing entry (both registration and reservation). This takes care of the case where the reservation/registration was held with a different ISID.

If the client does not hold the reservation, issuing a PREEMPT request with the same key removes the registration of the client. So in addition to sending a PREEMPT request, the agent has to send another REGISTER command to register with the new ISID.

Hope this helps.

HideoYamauchi commented 5 months ago

Hi @smohanan20, @oalbrigt

Thank you for your comment.

Regarding the registration key, I understand that it can always be rewritten by register_helper() in register_dev(), even if the ISID has changed.

For this reason, I think it is sufficient to execute preempt() when rewriting the reservation key.

Also, if there is no reservation, I think it is sufficient to just register the reservation key. (Because the registration key is rewritten by register_helper, which is executed first.)

It is necessary to add a key parameter to reserve_dev(), but I think that the following change can avoid the behavior of the registration key being removed by preempt(). Is there any problem?

def register_dev(options, dev, key):
(snip)
    # Check if any registration exists for the key already. We track this in
    # order to decide whether the existing registration needs to be cleared.
    # This is needed since the previous registration could be for a
    # different I_T nexus (different ISID).
    registration_key_exists = False
    if key in get_registration_keys(options, dev):
        logging.debug("Registration key exists for device " + dev)
        registration_key_exists = True
    if not register_helper(options, dev, key):
        return False

    if registration_key_exists:
            # If there is no reservation, it will be registered.
            # This state does not exist in the operation of fence_scsci, 
            # but it does exist in the SPC specifications, so a reservation will be registered to avoid preempt(). 
            if get_reservation_key(options, dev) is None
                        logging.debug("Re-register the device reservation.(key=" + key + ", device=" + dev + ")\n")
                        if not reserve_dev(options, dev, key) and get_reservation_key(options, dev) is None:
                        logging.debug("Failed to create reservation (key=" + key + ", device=" + dev + ")\n")
                        return False
            else 
                        # If the reservation keys are the same, run preempt().
                        # If the reservation keys are different, preempt() will not be executed to prevent the registration key from being lost.
                        if get_reservation_key(options, dev, False) == key:
                            # If key matches, make sure it matches with the connection that
                            # exists right now. To do this, we can issue a preempt with same key
                            # which should replace the old invalid entries from the target.
                            if not preempt(options, key, dev, key):
                                return False
    return True

Best Regards, Hideo Yamauchi.

smohanan20 commented 5 months ago

Hi @HideoYamauchi

Thanks for the snippet.

        # If there is no reservation, it will be registered.
This state does not exist in the operation of fence_scsci,

but it does exist in the SPC specifications, so a reservation will be registered to avoid preempt().

if get_reservation_key(options, dev) is None logging.debug("Re-register the device reservation.(key=" + key + ", device=" + dev + ")\n") if not reserve_dev(options, dev, key) and get_reservation_key(options, dev) is None: logging.debug("Failed to create reservation (key=" + key + ", device=" + dev + ")\n") return False

This will work but there is one minor case that gets left out because of the usage of "reserve" vs "preempt". Let us assume that there is a connection with a different ISID (registration only - no reservation). The snippet at the beginning will register the proper ISID with the device. When the code executes this snippet, it finds that there is no reservation and issues a reserve() which will make a reservation with the proper ISID. However the registration with old ISID stays in the device even though that is not active (stale value). Preempt ensures that the old value (old ISID) is evicted away compared to a "reserve" which keeps the old value around. Subsequently if this problem happens again, we may have more registrations with stale ISIDs on the device.

I believe this won't break the existing workflows but may leave possibilities of entries around (which allows those old registered clients to issue IO to the device in theory)

Thanks

HideoYamauchi commented 4 months ago

Hi @smohanan20, @oalbrigt,

Thank you for your valuable comment.

I have also seen keys being registered multiple times before. At the time, I didn't understand why multiple keys were being registered.

I will look into it a bit more, but my idea doesn't seem correct considering that multiple keys can be registered.

I'll look into it again and comment again.

Many thanks, Hideo Yamauchi.

HideoYamauchi commented 4 months ago

Hi @smohanan20, @oalbrigt,

After much consideration, we have come to the conclusion that it is difficult to improve the current processing.

We have concluded that it is necessary to configure /etc/sysconfig/stonith as a way to deal with cases when key registration is lost.

We will also close this issue. Thank you very much for your cooperation.

Best Regards, Hideo Yamauchi.

ClusterLabs / fence-agents

[Question] About changing fence_scsi. #576

This state does not exist in the operation of fence_scsci,

but it does exist in the SPC specifications, so a reservation will be registered to avoid preempt().