ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects
GNU General Public License v2.0

Mid: storage-mon: daemonize storage_mon to deal with I/O hangs.(Add daemon mode) #1836

Closed HideoYamauchi closed 1 year ago

HideoYamauchi commented 1 year ago

Hi All,

This PR is a modified version of the following PR.

The processing of storage-mon is now split into two modes: the original mode and a daemon mode.

Best Regards, Hideo Yamauchi.

wenningerk commented 1 year ago

I like that the storage_mon binary can still be used as before with the new code. I was wondering whether it would make sense to keep storage_mon more independent from pacemaker, so that it neither has to know about attributes or attrd_updater nor has to call it. The idea was to limit the daemon to repeated polling for disk status plus some kind of IPC to a resource agent. The resource agent would take care of the attributes and would also implement a good liveness check for the daemon, triggered by pacemaker.
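To make the split described above more concrete, here is a minimal sketch of the daemon side only: a loop that polls disk status repeatedly and records when the last poll finished, so that a separate liveness check can notice a hung poll. It is written in Python purely for illustration and is not the code from this PR; the class name `PollingMonitor`, the `check` callback, and the `max_age` threshold are all hypothetical.

```python
import threading
import time


class PollingMonitor:
    """Hypothetical sketch: poll a check function and remember the result."""

    def __init__(self, check, interval):
        self._check = check          # callable returning True when storage is healthy
        self._interval = interval    # seconds to wait between polls
        self._lock = threading.Lock()
        self._healthy = None         # result of the most recent completed poll
        self._last_poll = None       # monotonic timestamp of that poll
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            # If this call blocks on hung I/O, _last_poll stops advancing.
            healthy = self._check()
            with self._lock:
                self._healthy = healthy
                self._last_poll = time.monotonic()
            self._stop.wait(self._interval)

    def snapshot(self):
        """Return (healthy, last_poll) from the most recent completed poll."""
        with self._lock:
            return self._healthy, self._last_poll

    def is_alive(self, max_age):
        """Liveness check: has a poll completed within the last max_age seconds?"""
        _, last_poll = self.snapshot()
        return last_poll is not None and time.monotonic() - last_poll <= max_age
```

In this sketch the poller knows nothing about cluster attributes. A caller such as a resource agent would read `snapshot()` and `is_alive()` and decide what to report, which is one possible reading of the separation suggested in the comment.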

HideoYamauchi commented 1 year ago

Hi wenningerk,

Thanks for your comment.

> I like that the storage_mon binary can still be used as before with the new code. I was wondering whether it would make sense to keep storage_mon more independent from pacemaker, so that it neither has to know about attributes or attrd_updater nor has to call it. The idea was to limit the daemon to repeated polling for disk status plus some kind of IPC to a resource agent. The resource agent would take care of the attributes and would also implement a good liveness check for the daemon, triggered by pacemaker.

For example, do you mean that the monitor action of the storage-mon RA would call the storage_mon binary, judge red/green from its response code, and update the attribute on the RA side?
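The flow asked about here can be sketched roughly as follows, in Python for illustration only. The assumptions are: an exit code of 0 means healthy and anything else means unhealthy, the attribute name `#health-storage_sketch` is made up, and the `attrd_updater` command is only built as an argument list (using its `--name` and `--update` options) and handed to a caller-supplied function rather than executed.

```python
# Hypothetical attribute name; a real agent would choose its own.
ATTR_NAME = "#health-storage_sketch"


def health_from_exit_code(rc):
    """Map a checker exit code to a node-health value (assumes 0 == healthy)."""
    return "green" if rc == 0 else "red"


def attrd_command(name, value):
    """Build the attrd_updater argument list that sets one node attribute."""
    return ["attrd_updater", "--name", name, "--update", value]


def monitor(run_check, run_update, attr_name=ATTR_NAME):
    """One monitor pass: run the check, derive red/green, request the update.

    run_check  -- callable returning the checker's exit code
    run_update -- callable that receives the command list (for example,
                  something that passes it to subprocess.run)
    """
    value = health_from_exit_code(run_check())
    run_update(attrd_command(attr_name, value))
    return value
```

With this shape the checker only reports an exit code, and everything that touches attributes lives on the agent side.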

Best Regards, Hideo Yamauchi.

oalbrigt commented 1 year ago

Green/red refers to whether it's considered healthy (green) or not (red).

For more info see: https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/#tracking-node-health

wenningerk commented 1 year ago

@oalbrigt guess that wasn't the point - but I'm not sure if I'm getting Hideo right

@HideoYamauchi If I'm getting that right your suggestion is to implement a potential ipc client as part of the storage_mon binary. The binary would then be able to run in kind of 3 modes:

That would be more or less what I was thinking of. I like the approach to implement server & client in the same binary - have done so in the past with positive experience.
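As a rough illustration of putting server and client into one program, the sketch below dispatches on a mode argument and lets the two roles talk over a Unix domain socket. It is written in Python for brevity and is not the storage_mon implementation; the mode names `daemon` and `client`, the default socket path, and the one-word reply are hypothetical, and the daemon here always answers `green` as a placeholder for a real disk check.

```python
import argparse
import os
import socket


def listen(path):
    """Create a listening Unix socket at path, replacing a stale one."""
    if os.path.exists(path):
        os.unlink(path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    return srv


def handle_one(srv, get_status):
    """Accept one client and send it the current status string."""
    conn, _ = srv.accept()
    with conn:
        conn.sendall(get_status().encode())


def query(path, timeout=2.0):
    """Client side: connect, read the short status reply, and return it."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.settimeout(timeout)
        cli.connect(path)
        return cli.recv(64).decode()


def main(argv):
    parser = argparse.ArgumentParser(description="server and client in one program")
    parser.add_argument("mode", choices=["daemon", "client"])
    parser.add_argument("--socket", default="/tmp/storage_mon_sketch.sock")
    args = parser.parse_args(argv)

    if args.mode == "daemon":
        srv = listen(args.socket)
        try:
            while True:
                # Placeholder status; a real daemon would report polled results.
                handle_one(srv, lambda: "green")
        finally:
            srv.close()

    reply = query(args.socket)
    print(reply)
    return 0 if reply == "green" else 1
```

A caller would run `main(["daemon"])` in one process and `main(["client"])` in another. The client's timeout doubles as a simple liveness probe, since a daemon that no longer answers makes the query fail instead of blocking forever.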

HideoYamauchi commented 1 year ago

Hi wenningerk,

What I had in mind is the same implementation. Let me think about this implementation a little more.

@oalbrigt I didn't describe the operation in detail, but I already understand the red/green spec. Why is the default CI check failing?

Many thanks, Hideo Yamauchi.

wenningerk commented 1 year ago

CI thing looks like a more basic issue with the debian builder. Cool that we're obviously more or less on the same page.

oalbrigt commented 1 year ago

You can solve the CI issue by rebasing, as the WAS6 trap syntax issue was solved recently.

HideoYamauchi commented 1 year ago

Hi @oalbrigt, Hi wenningerk,

I understand that it is a CI issue. I think the next fix will take some time.

Many thanks, Hideo Yamauchi.

wenningerk commented 1 year ago

@HideoYamauchi np. Of course code is the best base for a discussion but if you want to discuss any details beforehand we can do that here as well.

Regards, Klaus

HideoYamauchi commented 1 year ago

Hi Klaus,

> @HideoYamauchi np. Of course code is the best base for a discussion but if you want to discuss any details beforehand we can do that here as well.

All right!

Best Regards, Hideo Yamauchi.