Closed csiggydev closed 2 years ago
@csiggydev are you still facing this issue?
@shyawnkarim - yes, yes I am unfortunately. Followed the documentation to the letter and F5 Support doesn't have the knowledge to troubleshoot this issue. Failover appears to work intermittently at best, but usually not at all.
@csiggydev Can you provide additional information: 1) QKviews 2) Repro steps with CFE declaration
@f5-chidambaram - I'm a bit hesitant to post the qkview here due to security concerns. Let me know if there's a secure place to submit it. As for the CFE declaration, the following template was posted to each F5 using Postman:
{
    "class": "Cloud_Failover",
    "environment": "azure",
    "externalStorage": {
        "scopingTags": {
            "f5_cloud_failover_label": "centralus-f5-api-failover"
        }
    },
    "failoverAddresses": {
        "enabled": false,
        "scopingTags": {
            "f5_cloud_failover_label": "centralus-f5-api-failover"
        }
    },
    "failoverRoutes": {
        "enabled": true,
        "scopingTags": {
            "f5_cloud_failover_label": "centralus-f5-api-failover"
        },
        "scopingAddressRanges": [
            {
                "range": "0.0.0.0/0"
            }
        ],
        "defaultNextHopAddresses": {
            "discoveryType": "static",
            "items": [
                "10.225.15.250",
                "10.225.15.251"
            ]
        }
    },
    "controls": {
        "class": "Controls",
        "logLevel": "silly"
    }
}
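Before posting a declaration like the one above, a quick local sanity check can rule out simple mistakes. The following is a minimal sketch, not part of CFE: the function name `check_declaration` is hypothetical, and the rule that all `scopingTags` should match is an assumption made for this setup (a mismatch is not necessarily an error, it is only flagged for review).

```python
import ipaddress
import json


def check_declaration(decl):
    """Return a list of human-readable findings for a CFE declaration dict."""
    findings = []

    # Collect the scoping tags from every section that carries them.
    sections = ("externalStorage", "failoverAddresses", "failoverRoutes")
    tags = {name: decl.get(name, {}).get("scopingTags") for name in sections}
    distinct = {json.dumps(t, sort_keys=True) for t in tags.values() if t is not None}
    if len(distinct) > 1:
        # Assumption: in this deployment every section uses the same tag.
        findings.append(f"scopingTags differ between sections: {tags}")

    routes = decl.get("failoverRoutes", {})

    # Every static next hop should at least parse as an IP address.
    for item in routes.get("defaultNextHopAddresses", {}).get("items", []):
        try:
            ipaddress.ip_address(item)
        except ValueError:
            findings.append(f"next hop is not a valid IP address: {item!r}")

    # Every scoping range should parse as a CIDR prefix.
    for entry in routes.get("scopingAddressRanges", []):
        try:
            ipaddress.ip_network(entry.get("range", ""))
        except ValueError:
            findings.append(f"range is not a valid CIDR prefix: {entry!r}")

    return findings
```

An empty list means none of these checks fired. It does not confirm that the addresses are actually self IPs on the BIG-IPs or that the Azure resources carry the tag.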
@csiggydev Can you please confirm the following:
1) In your declaration, defaultNextHopAddresses should list self IPs (i.e. 10.225.15.250 and 10.225.15.251 should be the internal self IPs).
2) In Azure, the route table should be correctly tagged and should include the internal subnet.
Looking at the error, it seems that there is an issue with accessing the Azure metadata service: 'Function error, retrying: getaddrinfo ENOTFOUND management.azure.com management.azure.com:443 Retries left: (x)'
It could be either a DNS issue or a permission issue.
Also, you can send the qkview to my email: p.chidambaram@f5.com
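Since `getaddrinfo ENOTFOUND` is a name-resolution failure and the problem is reported as intermittent, one way to narrow it down is to resolve the host repeatedly and count the failures. This is a rough sketch using only the Python standard library. The function name `count_resolution_failures` and its defaults are hypothetical, and it only reflects the resolver configuration of the machine and process it runs in, which may differ from what restnoded sees on the BIG-IP.

```python
import socket
import time


def count_resolution_failures(host, attempts=10, delay=1.0,
                              resolver=socket.getaddrinfo, sleep=time.sleep):
    """Resolve `host` `attempts` times and return how many lookups failed."""
    failures = 0
    for i in range(attempts):
        try:
            resolver(host, 443)
        except socket.gaierror:
            # Python raises gaierror when getaddrinfo cannot resolve the name.
            failures += 1
        if i < attempts - 1:
            sleep(delay)  # space the lookups out to catch intermittent failures
    return failures


if __name__ == "__main__":
    failed = count_resolution_failures("management.azure.com")
    print(f"{failed} of 10 lookups failed")
```

A non-zero count would point toward DNS rather than permissions. A zero count does not rule out a DNS problem that only appears during a failover event.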
Hi @f5-chidambaram,
I've tried variations of hard-coding the self IPs in the declaration file, and also leveraging 'routeTag', and as stated previously, it works intermittently at best. Both F5s can resolve the FQDN management.azure.com, and each F5 has the full list of Microsoft/Azure prefixes in its management route table. Furthermore, each F5 virtual machine has the appropriate permissions (Contributor access) to a shared storage account and route table.
The QKViews will be sent to you soon. Thank you.
Hi @csiggydev: I'm afraid I did not receive the QKViews. Do you still face the issue? Will you be able to send the QKViews to p.chidambaram@f5.com / s.karim@f5.com if the issue persists?
Closing. Please reopen if this is still an issue for you.
Do you already have an issue opened with F5 support?
Not yet, but will do so.
Description
Running a standard Active/Standby pair in Azure. When failover from the primary (F5-1) to the secondary (F5-2) occurs, F5-2 immediately initiates an HTTP POST /declare to the Azure metadata service back end, as indicated in restnoded.log. However, after the JSON template is read back correctly in the log, the following error appears:
'Function error, retrying: getaddrinfo ENOTFOUND management.azure.com management.azure.com:443 Retries left: (x)'
Essentially, when the F5s fail over, CFE should signal the Azure back end to change the next hop in a particular Azure route table to the self IP specified in the Azure tag and referenced in the CFE JSON template.
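The expected behavior described above can be illustrated with a small sketch. This is not CFE's implementation: the function `plan_route_updates` and the route dictionary shape (`name`, `prefix`, `next_hop`) are hypothetical, chosen only to show the intended selection logic for the unit that has just become active.

```python
import ipaddress


def plan_route_updates(routes, scoping_ranges, next_hop_candidates, local_self_ips):
    """Return {route_name: new_next_hop} for routes that should move to this unit.

    routes:              list of dicts with "name", "prefix", "next_hop" (hypothetical shape)
    scoping_ranges:      prefixes from scopingAddressRanges, e.g. ["0.0.0.0/0"]
    next_hop_candidates: items from defaultNextHopAddresses
    local_self_ips:      self IPs configured on the newly active unit
    """
    # The new next hop is the candidate address that this unit owns.
    owned = [ip for ip in next_hop_candidates if ip in local_self_ips]
    if not owned:
        return {}  # none of the candidates is a self IP here, nothing to do
    target = owned[0]

    wanted = {ipaddress.ip_network(r) for r in scoping_ranges}
    updates = {}
    for route in routes:
        in_scope = ipaddress.ip_network(route["prefix"]) in wanted
        if in_scope and route["next_hop"] != target:
            updates[route["name"]] = target
    return updates
```

With the declaration from this thread, a default route pointing at 10.225.15.250 would be planned to move to 10.225.15.251 when the unit owning 10.225.15.251 becomes active. Routes outside the scoping ranges are left alone.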
Environment information
Severity Level
Severity: 3
Thanks.