ClusterLabs / resource-agents

Combined repository of OCF agents from the RHCS and Linux-HA projects
GNU General Public License v2.0

podman: add info log when returning OCF_NOT_RUNNING #1829

Open freedge opened 1 year ago

freedge commented 1 year ago

There are multiple code paths where OCF_NOT_RUNNING is returned and, unfortunately, in rare cases the container is in fact still running.

We add an info log so we have something to look at in case the heartbeat fails for seemingly no reason.
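A minimal sketch of the idea, not the actual patch: every path that reports "not running" goes through one helper that logs the reason first. The helper name `not_running` is hypothetical. `ocf_log` and `OCF_NOT_RUNNING` are normally provided by `ocf-shellfuncs` and are stubbed here only so the snippet runs outside a cluster node.

```shell
#!/bin/sh
# Stand-in for the OCF return code; on a real node this comes from ocf-shellfuncs.
OCF_NOT_RUNNING=7

# Stand-in for ocf_log so the sketch is self-contained (assumption, not the real logger).
ocf_log() {
    level=$1
    shift
    echo "$level: $*" >&2
}

# Hypothetical helper: record why the agent is about to report "not running".
not_running() {
    ocf_log info "returning OCF_NOT_RUNNING: $*"
    return "$OCF_NOT_RUNNING"
}
```

A monitor path could then end with something like `not_running "monitor cmd failed (rc=$rc)"`, so the reason lands in the logs next to the failure.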

knet-ci-bot commented 1 year ago

Can one of the admins verify this patch?

freedge commented 1 year ago

FYI, this is the error I was looking for:

Dec 20 08:53:56  podman(rabbitmq-bundle-podman-1)[435653]:    INFO: monitor cmd failed (rc=255), output: Error: an exec session with ID 207b204101475831fa87c972f13dfff8872e4f33005444fb8a572f6ffc49077f already exists: exec session already exists

david-hill commented 1 year ago

Could we add a retry here instead? If this is a race condition that we only hit occasionally, a second attempt should succeed and avoid a costly fencing event.

knet-jenkins[bot] commented 1 year ago

Can one of the admins check and authorise this run please: https://ci.kronosnet.org/job/resource-agents-pipeline/job/PR-1829/1/input