Open jayan800 opened 1 year ago
Try running pcs resource debug-start --full <resource>. That should show you all the commands it runs, and hopefully give some pointers to what's wrong.
Thank you.
The debug command completed without any errors.
is there anything else to check?
You can run pcs resource update <resource> trace_ra=1 and then disable/enable or restart the resource. The trace files will be available in /var/lib/heartbeat/trace_ra/
Thank you. I will enable the trace. fingers crossed
Hi All,
We are running a two-node Pacemaker cluster in AWS and use the "awsvip" resource type to configure the VIP. Below is the configuration:
pcs resource show privip_node1
Resource: privip_node1 (class=ocf provider=heartbeat type=awsvip)
 Attributes: secondary_private_ip=10.x.x.x
 Operations: migrate_from interval=0s timeout=30s (privip_node1-migrate_from-interval-0s)
             migrate_to interval=0s timeout=30s (privip_node1-migrate_to-interval-0s)
             monitor interval=20s timeout=30s (privip_node1-monitor-interval-20s)
             start interval=0s timeout=30s (privip_node1-start-interval-0s)
             stop interval=0s timeout=30s (privip_node1-stop-interval-0s)
             validate interval=0s timeout=10s (privip_node1-validate-interval-0s)
pcs resource show node1_vip
Resource: node1_vip (class=ocf provider=heartbeat type=IPaddr2)
 Attributes: ip=10.x.x.x
 Operations: monitor interval=10s timeout=20s (node1_vip-monitor-interval-10s)
             start interval=0s timeout=20s (node1_vip-start-interval-0s)
             stop interval=0s timeout=20s (node1_vip-stop-interval-0s)
The EC2 instances are configured to use IMDSv2. The fence_aws agent and resource-agents package have also been upgraded to the most recent versions, which support IMDSv2. Additionally, the resource is set up to use IAM instance profile credentials.
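With IMDSv2, every metadata lookup the agents depend on (e.g. fetching the instance-id) becomes a two-step flow: first obtain a session token with a PUT, then pass that token on the actual GET. A minimal sketch of those two requests, assuming nothing about the agents' internals (the helper names below are illustrative, not from awsvip or fence_aws):

```python
# Sketch of the IMDSv2 two-step request flow; helper names are ours,
# not from the actual resource agents.
IMDS_BASE = "http://169.254.169.254"

def token_request(ttl_seconds=21600):
    """PUT request that obtains an IMDSv2 session token."""
    return {
        "method": "PUT",
        "url": f"{IMDS_BASE}/latest/api/token",
        "headers": {"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    }

def metadata_request(token, path="instance-id"):
    """GET request for a metadata item, authenticated with the token."""
    return {
        "method": "GET",
        "url": f"{IMDS_BASE}/latest/meta-data/{path}",
        "headers": {"X-aws-ec2-metadata-token": token},
    }
```

If IMDSv1 is disabled, a plain request to 169.254.169.254 without the token header is rejected, and a metadata-option hop limit of 1 can also make the endpoint unreachable from some contexts, which is worth checking alongside the agent versions.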
fence-agents-aws-4.2.1-41.el7_9.3.x86_64
python-s3transfer-0.1.13-1.0.1.el7.noarch
resource-agents-4.1.1-61.el7_9.15.x86_64
pip list | grep -i boto
boto3 (1.10.0)
botocore (1.13.50)
aws --version
aws-cli/2.9.4 Python/3.9.11 Linux/3.10.0-1160.80.1.0.1.el7.x86_64 exe/x86_64.oracle.7 prompt/off
pip3 list | grep -i boto
boto3 1.23.10
botocore 1.26.10
The privip resource consistently fails, with several different errors:
pengine: warning: unpack_rsc_op_failure: Processing failed monitor of privip_node2 on node2: unknown error | rc=1
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000 process (PID 109357) timed out
Apr 13 11:09:54 node2 lrmd[3773]: warning: privip_node2_monitor_20000:109357 - timed out after 30000ms
Jun 16 10:01:43 node2 lrmd[36967]: notice: privip_node2_monitor_20000:13042:stderr [ Unable to locate credentials. You can configure credentials by running "aws configure". ]
Jun 16 10:01:43 node2 crmd[36970]: notice: privip_node2_monitor_20000:91 [ % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 359 100 359 0 0 37513 0 --:--:-- --:--:-- --:--:-- 39888\n\nUnable to locate credentials. You can configure credentials by running "aws configure".\n ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #015 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ #015 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (7) Failed connect to 169.254.169.254:80; Connection refused ]
Jun 22 10:10:10 node1 lrmd[12465]: notice: privip_node1_monitor_20000:105561:stderr [ An error occurred (MissingParameter) when calling the DescribeInstances operation: The request must contain the parameter InstanceId ]
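The errors above chain together: when the curl to 169.254.169.254 is refused, the instance-id lookup comes back empty, and the subsequent DescribeInstances call is made with no InstanceId, producing the MissingParameter error. A hedged reconstruction of that failure mode (this is illustrative logic, not the actual awsvip agent code):

```python
# Illustrative sketch: if the IMDS instance-id lookup failed, the agent
# effectively interpolates an empty string into the AWS CLI call, and
# AWS rejects it with MissingParameter. Guarding early makes the real
# cause (unreachable/unauthenticated IMDS) visible in the logs instead.
def build_describe_instances_cmd(instance_id):
    """Return the argv a monitor would run, or raise if the
    instance-id lookup came back empty (e.g. curl to IMDS failed)."""
    if not instance_id:
        raise ValueError("empty instance-id: IMDS lookup failed, "
                         "aborting before AWS rejects the request")
    return ["aws", "ec2", "describe-instances",
            "--instance-ids", instance_id]
```

So the MissingParameter message is a symptom; the underlying problem to chase is why the metadata endpoint intermittently refuses connections (IMDSv2 token handling, hop limit, or local firewalling), which also explains the monitor timeouts.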
Failed Resource Actions:
Any advice would be great.