F5Networks / f5-cloud-failover-extension

F5 Cloud Failover Extension
Apache License 2.0

CFE unable to reach back-end API endpoint in Azure #63

Closed csiggydev closed 2 years ago

csiggydev commented 3 years ago

Do you already have an issue opened with F5 support?

Not yet, but will do so.

Description

We are running a standard Active/Standby pair in Azure. When failover from the primary (F5-1) to the secondary (F5-2) occurs, F5-2 immediately initiates an HTTP POST /declare to the Azure back-end API, as shown in restnoded.log. However, after the JSON template is read back correctly in the log, the following error is logged:

'Function error, retrying: getaddrinfo ENOTFOUND management.azure.com management.azure.com:443 Retries left: (x)'

Essentially, when the F5s fail over, CFE should signal the Azure back end to change the next hop in a particular Azure route table to the appropriate self-IP from those specified in the Azure tag and referenced in the CFE JSON template.
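
The expected behavior described above can be sketched roughly as follows. This is only an illustration, not CFE's actual implementation: the helper name `pick_next_hop` and the data shapes are hypothetical, and it assumes the route table carries an `f5_self_ips` tag holding comma-separated addresses.

```python
def pick_next_hop(tags, local_self_ips):
    """Return the first address in the f5_self_ips tag owned by this device.

    tags: dict of tag name -> value read from the route table (assumed shape).
    local_self_ips: set of self-IP strings configured on the active device.
    Returns None when the tag is missing or no address matches.
    """
    raw = tags.get("f5_self_ips", "")
    candidates = [ip.strip() for ip in raw.split(",") if ip.strip()]
    for ip in candidates:
        if ip in local_self_ips:
            return ip
    return None


# Example: the newly active device owns 10.225.15.251.
route_table_tags = {
    "f5_cloud_failover_label": "centralus-f5-api-failover",
    "f5_self_ips": "10.225.15.250,10.225.15.251",
}
next_hop = pick_next_hop(route_table_tags, {"10.225.15.251"})
```

The point of the sketch is that the next hop is derived from the tag and the active device's own addresses, so a failed lookup of the Azure API hostname stops the update before this step matters.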

Severity Level

Severity: 3

Thanks.

shyawnkarim commented 3 years ago

@csiggydev are you still facing this issue?

csiggydev commented 3 years ago

@shyawnkarim - yes, yes I am unfortunately. Followed the documentation to the letter and F5 Support doesn't have the knowledge to troubleshoot this issue. Failover appears to work intermittently at best, but usually not at all.

KrithikaChidambaram commented 3 years ago

@csiggydev Can you provide some additional information? 1) QKViews 2) Repro steps, including the CFE declaration

csiggydev commented 3 years ago

@f5-chidambaram - I'm a bit hesitant to post the QKView here due to security concerns. Let me know if there's a secure place to submit it. As for the CFE declaration, the following template was posted to each F5 using Postman:

{
    "class": "Cloud_Failover",
    "environment": "azure",
    "externalStorage": {
        "scopingTags": {
            "f5_cloud_failover_label": "centralus-f5-api-failover"
        }
    },
    "failoverAddresses": {
        "enabled": false,
        "scopingTags": {
            "f5_cloud_failover_label": "centralus-f5-api-failover"
        }
    },
    "failoverRoutes": {
        "enabled": true,
        "scopingTags": {
            "f5_cloud_failover_label": "centralus-f5-api-failover"
        },
        "scopingAddressRanges": [
            {
                "range": "0.0.0.0/0"
            }
        ],
        "defaultNextHopAddresses": {
            "discoveryType": "static",
            "items": [
                "10.225.15.250",
                "10.225.15.251"
            ]
        }
    },
    "controls": {
        "class": "Controls",
        "logLevel": "silly"
    }
}
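
A declaration like the one above can be given a quick local sanity check before it is posted. The sketch below is not CFE's schema validation; `check_declaration` and `example_declaration` are hypothetical helpers that only verify that the scoping labels agree across sections and that the ranges and next-hop addresses parse as IP values.

```python
import ipaddress


def example_declaration():
    """Return a trimmed copy of the declaration shown in this thread."""
    label = {"f5_cloud_failover_label": "centralus-f5-api-failover"}
    return {
        "class": "Cloud_Failover",
        "environment": "azure",
        "externalStorage": {"scopingTags": dict(label)},
        "failoverAddresses": {"enabled": False, "scopingTags": dict(label)},
        "failoverRoutes": {
            "enabled": True,
            "scopingTags": dict(label),
            "scopingAddressRanges": [{"range": "0.0.0.0/0"}],
            "defaultNextHopAddresses": {
                "discoveryType": "static",
                "items": ["10.225.15.250", "10.225.15.251"],
            },
        },
    }


def check_declaration(decl):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    labels = set()
    for section in ("externalStorage", "failoverAddresses", "failoverRoutes"):
        tags = decl.get(section, {}).get("scopingTags", {})
        if "f5_cloud_failover_label" in tags:
            labels.add(tags["f5_cloud_failover_label"])
    if len(labels) > 1:
        problems.append(f"scoping labels differ across sections: {sorted(labels)}")

    routes = decl.get("failoverRoutes", {})
    for entry in routes.get("scopingAddressRanges", []):
        try:
            ipaddress.ip_network(entry.get("range", ""), strict=False)
        except ValueError:
            problems.append(f"invalid address range: {entry.get('range')!r}")
    for item in routes.get("defaultNextHopAddresses", {}).get("items", []):
        try:
            ipaddress.ip_address(item)
        except ValueError:
            problems.append(f"invalid next-hop address: {item!r}")
    return problems


problems = check_declaration(example_declaration())
```

A check like this only catches typos in the declaration itself; it says nothing about whether the device can reach the Azure API.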

KrithikaChidambaram commented 3 years ago

@csiggydev Can you please confirm the following:

1) In your declaration, defaultNextHopAddresses should list self-IPs (i.e. 10.225.15.250 and 10.225.15.251 should be the internal self-IPs).
2) In Azure, the route table should be tagged correctly and should include the internal subnet.

Looking at the error, it seems that there is an issue with reaching the Azure management endpoint: 'Function error, retrying: getaddrinfo ENOTFOUND management.azure.com management.azure.com:443 Retries left: (x)'

It can be either a DNS issue or a permission issue.
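
One rough way to tell the two cases apart: `getaddrinfo ENOTFOUND` is the Node.js error for a failed hostname lookup, so it is raised before any request reaches Azure, whereas a permission problem would normally surface as an HTTP 401 or 403 response. The sketch below classifies a log message on that basis. `classify_failure` is a hypothetical helper, and matching on a bare 401/403 is an assumption about how such a response would appear in the log.

```python
import re


def classify_failure(message):
    """Return 'dns', 'permission', or 'unknown' for a logged error message."""
    # ENOTFOUND and EAI_AGAIN are Node.js error codes for failed DNS lookups.
    if "ENOTFOUND" in message or "EAI_AGAIN" in message:
        return "dns"
    # Assumption: an authorization failure is logged with its HTTP status code.
    if re.search(r"\b(401|403)\b", message):
        return "permission"
    return "unknown"


logged = (
    "Function error, retrying: getaddrinfo ENOTFOUND "
    "management.azure.com management.azure.com:443 Retries left: (x)"
)
kind = classify_failure(logged)
```

By this reading, the message quoted in this issue points at name resolution rather than at role assignments.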

Also, you can send the qkview to my email: p.chidambaram@f5.com

csiggydev commented 3 years ago

Hi @f5-chidambaram,

  1. Yes, those next-hop addresses (10.225.15.250 and 10.225.15.251) are the self-IP addresses of the F5s themselves, specifically on the 'transit' VLAN.
  2. The route-table 'ECPR-Transit-rt' has two relevant Tags for CFE as follows:
    • 'f5_cloud_failover_label: centralus-f5-api-failover' and 'f5_self_ips: 10.225.15.250,10.225.15.251'

I've tried variations of hard-coding the self-IPs in the declaration file, and also leveraging 'routeTag', and as stated previously, it works intermittently at best. Both F5s can resolve the FQDN management.azure.com, and each F5 has the full list of Microsoft/Azure prefixes in its management route table. Furthermore, each F5 virtual machine has the appropriate permissions (Contributor access) to a shared storage account and the route table.
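
Because the failure is intermittent, a single successful lookup does not rule out DNS. A minimal sketch for sampling resolution several times is shown below. `sample_resolution` is a hypothetical helper, and the `resolver` and `sleep` parameters exist only so the function can be exercised without network access. Note that a lookup from an interactive shell may not take the same resolver path as the process that logs the error.

```python
import socket
import time


def sample_resolution(host, attempts=5, delay=1.0,
                      resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve host repeatedly and return a list of (ok, detail) tuples.

    detail is a sorted list of addresses on success, or the error text on failure.
    """
    results = []
    for i in range(attempts):
        try:
            infos = resolver(host, 443)
            # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
            addresses = sorted({info[4][0] for info in infos})
            results.append((True, addresses))
        except socket.gaierror as exc:
            results.append((False, str(exc)))
        if i < attempts - 1:
            sleep(delay)
    return results


def failure_rate(results):
    """Fraction of attempts that failed to resolve."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if not ok) / len(results)
```

Calling `sample_resolution("management.azure.com", attempts=20)` around the time of a failover and checking `failure_rate` on the result would show whether lookups fail only some of the time.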

The QKViews will be sent to you soon. Thank you.

KrithikaChidambaram commented 3 years ago

Hi @csiggydev: I'm afraid I did not receive the QKViews. Do you still face the issue? Will you be able to send the QKViews to p.chidambaram@f5.com / s.karim@f5.com if the issue persists?

shyawnkarim commented 2 years ago

Closing. Please reopen if this is still an issue for you.