F5Networks / f5-aws-cloudformation

CloudFormation Templates for quickly deploying BIG-IP services in Amazon Web Services EC2

secondary private IP does not failover on same-net HA deployment #76

Closed: JeffGiroux closed this issue 4 years ago

JeffGiroux commented 5 years ago

Do you already have an issue opened with F5 support?

No

Description

I am testing different HA templates. I deployed the same-net template and chose to use a public IP. The same-net deployment does NOT run the HA iApp, so how does an IP or secondary IP move when I trigger a failover? Currently, when I fail over, the public EIP doesn't move.

Are there missing steps in the CFT template docs? Are we supposed to run the HA iApp?

If I choose no public IP, then I have no public IPs. OK, so my VIP will be private only and will live on a secondary IP associated with that ENI. The situation is the same: if I fail over after the same-net template has deployed successfully, what exactly is "moving" to the other instance? I don't see anything that triggers a move of the secondary IPs when I fail over.

Template

https://github.com/F5Networks/f5-aws-cloudformation/tree/master/supported/failover/same-net/via-api/3nic/existing-stack/payg

Severity Level

3

shyawnkarim commented 5 years ago

Hi Jeff,

When deploying HA into the "Same availability zone", the HA Across AZ iApp is not required for failover. The BIG-IPs use the built-in failover script (/usr/libexec/aws/aws-failover-tgactive.sh). This script remaps the AWS secondary IPs that are associated with floating IPs (VIPs) in traffic groups. For example:

AWS EIP: 1.1.1.1 -> 10.1.1.100 (AWS Secondary IP / BIG-IP VIP) [BIG-IP 1 NIC]
to:
AWS EIP: 1.1.1.1 -> 10.1.1.100 (AWS Secondary IP / BIG-IP VIP) [BIG-IP 2 NIC]

1.1.1.1 = the Bigip1VipEipAddress resource in the template
10.1.1.100 = an AWS secondary IP that maps to the floating BIG-IP VIP
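The remapping above comes down to moving a secondary private IP from one ENI to another. The following is a minimal Python sketch of that step, not the contents of aws-failover-tgactive.sh. It assumes an EC2 client object shaped like boto3's, whose `assign_private_ip_addresses` call accepts `AllowReassignment=True` so EC2 can take an address that is still assigned to the peer's ENI. The helper name `move_secondary_ips` is hypothetical.

```python
from typing import Iterable, List


def move_secondary_ips(ec2, target_eni_id: str, private_ips: Iterable[str]) -> List[str]:
    """Reassign secondary private IPs to the ENI of the newly active BIG-IP.

    Hypothetical helper: `ec2` is assumed to be a boto3 EC2 client (or any
    object exposing the same assign_private_ip_addresses method).
    """
    ips = sorted(set(private_ips))
    if not ips:
        # Nothing to move; skip the API call entirely.
        return []
    # AllowReassignment=True lets EC2 move addresses that are currently
    # assigned to another ENI (here, the peer BIG-IP's interface).
    ec2.assign_private_ip_addresses(
        NetworkInterfaceId=target_eni_id,
        PrivateIpAddresses=ips,
        AllowReassignment=True,
    )
    return ips
```

With boto3 installed, a call might look like `move_secondary_ips(boto3.client("ec2", region_name="us-east-1"), "eni-0123456789abcdef0", ["10.1.1.100"])`, where the ENI ID is a placeholder. In the example above, EIP 1.1.1.1 stays mapped to 10.1.1.100; only the NIC holding that secondary IP changes.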

For instance, you will see this in /var/log/ltm:

Aug 28 21:10:28 ip-10-0-2-228 notice tmm1[16125]: 01340011:5: HA unit 1 state change: from 0 to 1.
Aug 28 21:10:29 ip-10-0-2-228.ec2.internal notice logger[28846]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1): Started.
Aug 28 21:10:32 ip-10-0-2-228.ec2.internal info logger[28899]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1): endpoint URL: https://ec2.us-east-1.amazonaws.com.
Aug 28 21:10:37 ip-10-0-2-228.ec2.internal notice logger[28998]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1): Completed
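A successful run logs "Started" and later "Completed" for the traffic group. A small Python sketch can summarize the latest run from /var/log/ltm lines. The matching rules here are an assumption based only on the sample log lines in this thread, not on a documented log format, and `failover_status` is a hypothetical name.

```python
SCRIPT = "/usr/libexec/aws/aws-failover-tgactive.sh"


def failover_status(log_lines, traffic_group="traffic-group-1"):
    """Summarize the most recent failover-script run found in /var/log/ltm lines.

    Returns "completed", "error", "started", or "not-run". Matching is based
    only on the message text seen in sample logs (an assumption).
    """
    tag = f"{SCRIPT} ({traffic_group}):"
    status = "not-run"
    for line in log_lines:
        if tag not in line:
            continue
        prefix, message = line.split(tag, 1)
        message = message.strip()
        if message.startswith("Started"):
            status = "started"  # a new run begins; forget earlier results
        elif message.startswith("Completed"):
            status = "completed"
        elif " err logger[" in prefix:
            status = "error"  # the script logged at syslog severity "err"
    return status
```

On a BIG-IP, `with open("/var/log/ltm") as f: print(failover_status(f))` would report whether the last run for traffic-group-1 reached "Completed".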

See https://clouddocs.f5.com/cloud/public/v1/aws/AWS_ha.html for more details.

Thank you for filing. We've created Jira Item (#1599) to add some clarification to the documentation.

JeffGiroux commented 5 years ago

I had already read that link and tried to follow it myself, and also to help a customer in a PoC. Neither of us saw the secondary IP float over during a failover. I'll review the clouddocs link again, spin up another CFT deployment, and check the logs.

Thx, Jeff

JeffGiroux commented 5 years ago

I was able to test this and it worked. I had to make sure a VIP (listener) was created with the same IP as the secondary IP. Without that association in the config, the failover does not move AWS IPs to the other ENI. Once I created a VIP and had the IP in the traffic group, the failover script ran and properly moved the IPs to the other F5 instance's ENI.
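The rule observed here (only secondary IPs that match a VIP address in the traffic group are moved) can be modeled as a simple set intersection. This Python sketch models the observed behavior only; it is not the failover script's actual selection logic. It assumes plain IP addresses without a BIG-IP route-domain suffix such as `%1`, and `ips_to_move` is a hypothetical name.

```python
import ipaddress


def ips_to_move(eni_secondary_ips, traffic_group_vips):
    """Return the secondary IPs that match a virtual address in the traffic group.

    Models the behavior observed in this thread: a secondary IP with no
    matching VIP in the traffic group is left where it is.
    """
    # Normalize through ipaddress so equivalent spellings compare equal.
    vips = {ipaddress.ip_address(v) for v in traffic_group_vips}
    return sorted(
        ip for ip in eni_secondary_ips if ipaddress.ip_address(ip) in vips
    )
```

With no VIP configured, the result is empty, which matches the earlier symptom of nothing moving on failover.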

JeffGiroux commented 5 years ago

I tested further. This does NOT work when you deploy F5 instances manually through the Marketplace. The secondary private IPs move only on instances that were deployed via the CFT. The clouddocs page should be updated to note this. It's possible that the Marketplace images do not contain the necessary cloud scripts that are found in the CFT images. As a result, the failover script on F5s created manually from the Marketplace does not move the secondary private IP.

The script stalls after the log line showing the endpoint URL and never logs a "Completed" line. It then logs an instance sanity check error. When I look in the AWS console, the secondary private IP has not moved, even though the F5 instances do change active/standby state. By contrast, if I deploy the F5 instances using the CFT (samenet-az failover), the secondary private IPs move fine.

Sep 5 21:30:53 ip-10-0-0-140.us-west-2.compute.internal notice sod[4625]: 010c006d:5: Leaving Standby for Active: Next Active, peers agree on config.
Sep 5 21:30:53 ip-10-0-0-140.us-west-2.compute.internal notice sod[4625]: 010c0053:5: Active for traffic group traffic-group-1.
Sep 5 21:30:53 ip-10-0-0-140.us-west-2.compute.internal notice sod[4625]: 010c0019:5: Active
Sep 5 21:30:53 ip-10-0-0-140 notice tmm1[15856]: 01340011:5: HA unit 1 state change: from 0 to 1.
Sep 5 21:30:53 ip-10-0-0-140 notice -c [15856]: 01340011:5: HA unit 1 state change: from 0 to 1.
Sep 5 21:30:54 ip-10-0-0-140.us-west-2.compute.internal notice logger[20848]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1): Started.
Sep 5 21:31:00 ip-10-0-0-140.us-west-2.compute.internal info logger[20909]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1): endpoint URL: https://ec2.us-west-2.amazonaws.com.
Sep 5 21:34:56 ip-10-0-0-140.us-west-2.compute.internal err logger[21302]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1): /shared/vadc/aws/iid-document is invalid. Re-download it from http://169.254.169.254/latest/dynamic/instance-identity/document
Sep 5 21:34:56 ip-10-0-0-140.us-west-2.compute.internal err logger[21303]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1): Instance sanity check failed with error:
Sep 5 21:34:56 ip-10-0-0-140.us-west-2.compute.internal err logger[21304]: /usr/libexec/aws/aws-failover-tgactive.sh (traffic-group-1):
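The failing run points at /shared/vadc/aws/iid-document and the instance identity document URL. A minimal Python sketch for fetching that document and doing a basic sanity check might look like the following. It uses the IMDSv2 token flow (the same endpoint also accepts plain GETs when IMDSv1 is enabled). Which fields the F5 script actually validates is unknown, so `REQUIRED_KEYS` is an assumption, and both function names are hypothetical.

```python
import json
import urllib.request

IMDS = "http://169.254.169.254/latest"
# Assumption: the keys the F5 script really checks are not known from this thread.
REQUIRED_KEYS = ("instanceId", "region", "accountId")


def fetch_identity_document(timeout: float = 2.0) -> str:
    """Download the instance identity document using an IMDSv2 session token."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    doc_req = urllib.request.Request(
        f"{IMDS}/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(doc_req, timeout=timeout) as resp:
        return resp.read().decode()


def is_valid_identity_document(text: str) -> bool:
    """Return True if text is a JSON object whose required keys are non-empty strings."""
    try:
        doc = json.loads(text)
    except ValueError:
        return False
    if not isinstance(doc, dict):
        return False
    return all(isinstance(doc.get(k), str) and doc[k] for k in REQUIRED_KEYS)
```

Run on the instance, comparing `is_valid_identity_document(fetch_identity_document())` with the result for the cached file's contents could help show whether the cached copy is the problem.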

shyawnkarim commented 4 years ago

Closing since HA templates now utilize the Cloud Failover Extension.