ClusterLabs / pcs

Pacemaker command line interface and GUI
GNU General Public License v2.0

pcs resource start <resource> and nothing happens #18

Closed chjohnst closed 11 years ago

chjohnst commented 11 years ago

Most of the startup within the cluster is an async operation, but I have seen many times where I attempt to start a pcs resource and get zero information in the logs. If I run pcs resource, it still says stopped; I make multiple start attempts and nothing happens.

I have attempted a few things with very little change. What can I do in a situation like this? Restarting the entire cluster has been my solution so far, which is not ideal. What troubleshooting can I do here? (I'm still learning the ropes with the new command line suite, which so far is super easy in comparison to crm!)

pcs resource failcount reset
pcs resource cleanup
pcs resource update

feist commented 11 years ago

What system are you on and what version of pacemaker, corosync & pcs are you using?

You can use the 'pcs cluster report' command which will create a tarball of all the information on your system that can be used to debug your problem.

There should be some information in the logs from pacemaker as to why the cluster isn't starting. Do you have stonith/fence devices configured?

chjohnst commented 11 years ago

After some debugging I think I found the issue. The systemd resource does not detect a failure unless you explicitly specify "op monitor interval=30s" for the resource, so the resource was always marked as stopped. Once I added it, the cluster detected that the resource was not in a good state and recovered it. Is there not a default for this resource? This sounds like a bug, in that it should use a default value.
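For anyone hitting the same thing, here is a minimal sketch of creating a systemd resource with an explicit monitor operation. The resource id "my_service" and the unit name "httpd" are hypothetical placeholders, and the `run` and `create_with_monitor` helpers are only there for illustration. The `op monitor interval=...` form is assumed from current pcs syntax and may differ on older releases. By default the script only prints the command; set RUN=1 to actually execute it on a cluster node.

```shell
#!/bin/sh
# Sketch only: "my_service" and "httpd" are hypothetical placeholders.
# Commands are printed by default; set RUN=1 to execute them for real.

run() {
    # Execute the command when RUN=1, otherwise just print it (dry run).
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

create_with_monitor() {
    # $1 = resource id, $2 = systemd unit name, $3 = monitor interval
    run pcs resource create "$1" "systemd:$2" op monitor interval="$3"
}

create_with_monitor my_service httpd 30s
```

With the monitor operation in place, the cluster polls the unit at the given interval, so a failed service should be noticed and recovered rather than sitting in a stale state.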

feist commented 11 years ago

You're correct. I've added a default interval of 60s, and eventually I'm hoping to ask each resource agent what its default should be.

Thanks for the feedback!