ClusterLabs / fence-agents

Fence agents
104 stars 160 forks source link

[Question] About modifying add disable-timeout parameter. #358

Closed HideoYamauchi closed 4 years ago

HideoYamauchi commented 4 years ago

Hi All,

The details are unknown, but the following PR has been included in master.

This fix currently fails with the reboot action.

[root@rh80-02 fence-agents]# fence_vmware_rest -o reboot  -a "192.168.28.25" -l "Administrator" -p "password" --ssl-insecure -n rh80-01 --disable-timeout=true
2020-10-15 10:25:50,855 ERROR: Failed: Timed out waiting to power OFF

Maybe other fence_agents will also fail.

This is because the async_set_multi_power_fn() process returns False with an off operation.

def async_set_multi_power_fn(connection, options, set_power_fn, get_power_fn, retry_attempts):
    plugs = options["--plugs"] if "--plugs" in options else [""]

    for _ in range(retry_attempts):
        for plug in plugs:
            try:
                options["--uuid"] = str(uuid.UUID(plug))
            except ValueError:
                pass
            except KeyError:
                pass

            options["--plug"] = plug
            set_power_fn(connection, options)
            time.sleep(int(options["--power-wait"]))

        for _ in range(int(options["--power-timeout"])):
            if get_multi_power_fn(connection, options, get_power_fn) != options["--action"]:
                time.sleep(1)
            else:
                return True
    return False

Therefore, the on operation at the time of reboot operation is not executed. As a result, the reboot operation also fails.

I think it is very dangerous not to check the off status during the off operation, including the reboot operation. Shouldn't the on state (power-timeout) be skipped only when the on operation is performed?

My point may be wrong, as future changes to the Pacemaker spec may be relevant.

Was this change due to consideration of future specifications when combined with Pacemaker?

Best Regards, Hideo Yamauchi.

oalbrigt commented 4 years ago

Oh. That's not good.

I'll make a patch to fix it.

The patch is intended to make it not timeout when run from Pacemaker, so Pacemakers timeout for the fence device will be used instead by default.

oalbrigt commented 4 years ago

I've made a patch to solve this issue. See PR linked above.

HideoYamauchi commented 4 years ago

Hi oalbrigt,

Thanks for your comment.

Okay! I will check your patch tomorrow.

Best Regards, Hideo Yamauchi.

HideoYamauchi commented 4 years ago

Hi oalbrigt,

I have confirmed that this fix(PR#359) works correctly. I also understand that this fix is not to skip the verification process, but to run it indefinitely and leave the timeout to Pacemaker.

Close this issue.

Many thanks, Hideo Yamauchi.