F5Networks / f5-cloud-failover-extension

F5 Cloud Failover Extension
Apache License 2.0

CFE failover incorrectly counts IPv6 addresses per interface in AWS ipv6 failover #79

Closed mikeoleary closed 2 years ago

mikeoleary commented 3 years ago

Do you already have an issue opened with F5 support?

No.

Description

CFE failover in AWS with IPv6 addresses. CFE appears to "count" each IPv6 address multiple times, thereby incorrectly reporting:

Function error, retrying: Address count 17 per interface exceeds the limit for m5.xlarge. Retries left: 0

In this case I have only one IPv4 address and four IPv6 addresses, but CFE appears to count this as 17 addresses to move. This can be seen all through the logs, and from what I can tell it is not dependent on the number of VSs a virtual address has. I.e., if I have 2 VIPs with the same IP and different ports, the address still looks like it's counted 4 or 5 times. If I have 1 VIP, it's still counted 4 or 5 times.
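The symptom suggests the address list may be aggregated without deduplication before the per-interface limit check. A minimal sketch of the expected counting behavior (a hypothetical helper for illustration only, not CFE's actual code; the addresses are RFC 5737/3849 style placeholders):

```python
def addresses_to_move(ipv4_addrs, ipv6_addrs):
    # Deduplicate with a set so each address is counted exactly once,
    # regardless of how many VIPs reference it.
    return sorted(set(ipv4_addrs)) + sorted(set(ipv6_addrs))

# Placeholder addresses: 1 IPv4 plus 4 IPv6 addresses, each IPv6
# address repeated 4 times, mirroring the duplication seen in the logs.
ipv6 = ["2001:db8::1", "2001:db8::2", "2001:db8::3", "2001:db8::4"] * 4
moved = addresses_to_move(["10.0.1.10"], ipv6)
print(len(moved))  # 5 addresses to move, not 17
```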

Very simple replication of issue

  1. Deploy our supported 2-NIC CFT, set up for IPv6 failover by editing IAM role and Security Group accordingly
  2. See CFT issue 153 to make sure you can fail over IPv6 addresses
  3. Successfully fail over a VS with an IPv6 address. You will also need a VS with IPv4, and you can fail over two VSs with IPv6.
  4. Watch CFE logs and notice that for each IPv6 address, it is counted multiple times. Example:
    Wed, 30 Jun 2021 13:53:29 GMT - fine: [f5-cloud-failover] associating ipv6 addresses: {"NetworkInterfaceId":"eni-033bf289856441e7c","Ipv6Addresses":["2600:1f18:43ea:1101:cb86:6603:447f:f069","2600:1f18:43ea:1101:c89:614f:cd7:65c6","2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0","2600:1f18:43ea:1101:cb86:6603:447f:f069","2600:1f18:43ea:1101:c89:614f:cd7:65c6","2600:1f18:43ea:1101:548b:857:8926:30eb","2600:1f18:43ea:1101:c89:614f:cd7:65c6","2600:1f18:43ea:1101:548b:857:8926:30eb","2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0","2600:1f18:43ea:1101:cb86:6603:447f:f069","2600:1f18:43ea:1101:548b:857:8926:30eb","2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0","2600:1f18:43ea:1101:cb86:6603:447f:f069","2600:1f18:43ea:1101:c89:614f:cd7:65c6","2600:1f18:43ea:1101:548b:857:8926:30eb","2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0"]}
    Wed, 30 Jun 2021 13:53:29 GMT - finest: [f5-cloud-failover] Function error, retrying: Address count 17 per interface exceeds the limit for m5.xlarge Retries left: 0
  5. Now, create 4 total IPv6 addresses and 1 IPv4 address, and make associated VIPs. Attempt failover again, and you should see that CFE incorrectly believes the number of addresses to move exceeds AWS's limits.
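As a sanity check on the log excerpt in step 4: the Ipv6Addresses array contains 16 entries but only 4 unique addresses (each repeated 4 times), which together with the 1 IPv4 address yields the reported count of 17:

```python
# Ipv6Addresses array copied verbatim from the CFE log line above.
logged = [
    "2600:1f18:43ea:1101:cb86:6603:447f:f069",
    "2600:1f18:43ea:1101:c89:614f:cd7:65c6",
    "2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0",
    "2600:1f18:43ea:1101:cb86:6603:447f:f069",
    "2600:1f18:43ea:1101:c89:614f:cd7:65c6",
    "2600:1f18:43ea:1101:548b:857:8926:30eb",
    "2600:1f18:43ea:1101:c89:614f:cd7:65c6",
    "2600:1f18:43ea:1101:548b:857:8926:30eb",
    "2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0",
    "2600:1f18:43ea:1101:cb86:6603:447f:f069",
    "2600:1f18:43ea:1101:548b:857:8926:30eb",
    "2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0",
    "2600:1f18:43ea:1101:cb86:6603:447f:f069",
    "2600:1f18:43ea:1101:c89:614f:cd7:65c6",
    "2600:1f18:43ea:1101:548b:857:8926:30eb",
    "2600:1f18:43ea:1101:245f:ff1e:e7dc:20d0",
]
print(len(logged), len(set(logged)))  # 16 entries, 4 unique addresses
```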

Environment information

For bugs, enter the following information:

Severity Level

For bugs, enter the bug severity level. Do not set any labels.

Severity: 2

I have created this as Sev 2 because it is blocking a production migration for this customer, who has already reported multiple issues with IPv6 and CFE in AWS. This one, however, looks like a show stopper for their migration.

mikeoleary commented 3 years ago

It appears the incorrect address count may be the square of the true IPv6 address count, plus the IPv4 address count.

During testing with an AWS m5.xlarge, the logs show that CFE is counting each IPv6 address:

  - 5 times when 5 Virtual Addresses are configured
  - 4 times when 4 Virtual Addresses are configured
  - 3 times when 3 are configured, etc.

So in my test, the CFE function error/failure occurs after 4x IPv6 VIPs are configured (+1 IPv4 = 17 addresses, as shown in the logs above). However, in the customer's test they could deploy an extremely large VM to get a few more VIPs (not an acceptable workaround, and they still need more VIPs).
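The hypothesis above (each IPv6 address tallied once per configured IPv6 Virtual Address) reduces to the square of the IPv6 address count plus the IPv4 count. A quick arithmetic check under that assumption, which reproduces the count from the logs:

```python
def hypothesized_count(num_ipv6, num_ipv4):
    # Hypothesized buggy behavior: each IPv6 address is counted once per
    # IPv6 Virtual Address (i.e., num_ipv6 times), giving a square term,
    # plus the IPv4 addresses counted normally.
    return num_ipv6 * num_ipv6 + num_ipv4

print(hypothesized_count(4, 1))  # 17, matching "Address count 17" in the logs
```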

shyawnkarim commented 3 years ago

This issue is being tracked internally with ID AUTOSDK-555.

mikeoleary commented 3 years ago

@shyawnkarim - unfortunately we need this issue re-opened and addressed. This was supposed to be fixed in CFE v1.9.0, and I even successfully tested a pre-release that was provided to me by a developer. However, I did not download and test the official release once 1.9 was released, and now the customer is hitting this same issue with 1.9.0.

Please prioritize if possible as this customer has been waiting over 6 months for this bug fix that is delaying production implementation.

mikeoleary commented 3 years ago

Thanks, Shyawn, for updating me. For anyone following the issue: this bug was fixed shortly after CFE v1.9.0 was released, so the fix is planned for the next CFE release. We don't have an ETA for that release at this moment.

shyawnkarim commented 2 years ago

Closing.

This issue was addressed with Release 1.10.