ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects
GNU General Public License v2.0

storage_mon: Fix bug in handling of child process exit. #1793

Closed MasaoFujii closed 2 years ago

MasaoFujii commented 2 years ago

When storage_mon detects that a child process has exited with status zero, it resets that child's test_forks[] entry to 0 so that the loop does not call waitpid() on the same process again. Previously, however, storage_mon did not reset the entry when a child process exited with a non-zero status. As a result, waitpid() was called again for a process that was already gone, and storage_mon unexpectedly reported an error such as "waitpid on XXX failed: No child processes". In this case storage_mon should instead wait until all the child processes have exited and then return the final score.

This patch fixes the issue by making storage_mon reset the test_forks[] entry even when a child process exits with a non-zero status.
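
For readers who want to see the idea in isolation, here is a minimal sketch of a polling waitpid() loop that clears the per-child entry for both zero and non-zero exits. It is not the actual storage_mon code: collect_final_score(), run_demo() and MAX_CHILDREN are hypothetical names made up for this illustration, and the children simply call _exit() instead of testing a device. Only test_forks[], the per-device scores and the waitpid() error path are taken from the description above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CHILDREN 16  /* hypothetical limit, for this sketch only */

/*
 * Poll the children listed in test_forks[] until all of them are gone.
 * Returns the sum of scores[i] for every child that did not exit with 0,
 * or -1 if waitpid() itself fails.
 */
static int collect_final_score(pid_t *test_forks, const int *scores, size_t n)
{
    int final_score = 0;
    size_t finished = 0;

    while (finished < n) {
        for (size_t i = 0; i < n; i++) {
            int wstatus;
            pid_t w;

            if (test_forks[i] <= 0)
                continue;               /* already reaped, never wait again */

            w = waitpid(test_forks[i], &wstatus, WNOHANG);
            if (w < 0) {
                fprintf(stderr, "waitpid on child %zu failed: %s\n",
                        i, strerror(errno));
                return -1;
            }
            if (w == 0)
                continue;               /* child is still running */

            /* Count the score only when the child did not exit cleanly. */
            if (!(WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0))
                final_score += scores[i];

            /* The point of the fix: clear the entry in BOTH cases. */
            test_forks[i] = 0;
            finished++;
        }

        /* Brief pause between polling rounds to avoid a busy loop. */
        struct timespec pause = { 0, 1000000 };  /* 1 ms */
        nanosleep(&pause, NULL);
    }
    return final_score;
}

/* Hypothetical driver: fork n children that exit with exit_codes[i]. */
static int run_demo(const int *exit_codes, const int *scores, size_t n)
{
    pid_t test_forks[MAX_CHILDREN];

    assert(n <= MAX_CHILDREN);
    for (size_t i = 0; i < n; i++) {
        pid_t pid = fork();
        assert(pid >= 0);
        if (pid == 0)
            _exit(exit_codes[i]);       /* child: stand-in for a device test */
        test_forks[i] = pid;
    }
    return collect_final_score(test_forks, scores, n);
}
```

With two children that both exit with 1 and scores of 1 and 2, run_demo() would return 3. If the entry were cleared only on a zero exit, the next polling round would call waitpid() on an already reaped child and hit the "No child processes" path instead.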

knet-ci-bot commented 2 years ago

Can one of the admins verify this patch?

MasaoFujii commented 2 years ago

The following output shows how the patch changes the behavior of storage_mon when a child process exits with a non-zero status.

Without the patch:

$ storage_mon -d /dev/sda1 -s 1 -d /dev/sda2 -s 2 -v
Testing device /dev/sda2
Failed to open /dev/sda2: Permission denied
Testing device /dev/sda1
Failed to open /dev/sda1: Permission denied
waitpid on /dev/sda1 failed: No child processes

$ echo $?
255

With the patch:

$ storage_mon -d /dev/sda1 -s 1 -d /dev/sda2 -s 2 -v
Testing device /dev/sda2
Failed to open /dev/sda2: Permission denied
Testing device /dev/sda1
Failed to open /dev/sda1: Permission denied
Final score is 3

$ echo $?
3

oalbrigt commented 2 years ago

ok to test