F5Networks / f5-cloud-failover-extension

F5 Cloud Failover Extension
Apache License 2.0

Azure VE failed to failover after upgrade to 1.2.1 #146

Closed washing20202020 closed 1 month ago

washing20202020 commented 2 months ago

Do you already have an issue opened with F5 support?

Yes

Description

After upgrading from V15 to V17, the customer also upgraded the CFE RPM to 1.2.1 and then found that failover failed. It gets stuck with the error below; note that the FQDN has "subscriptions" appended directly to it:

[f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 60

We have verified that DNS can resolve management.chinacloudapi.cn, but it cannot resolve management.chinacloudapi.cnsubscriptions.

The customer tested in a lab: versions up to 1.15.0 work, but no 2.* version works.

A similar issue was reported at https://github.com/Azure/azure-cli/issues/26866

So is this issue related to the new RPM?

Fri, 30 Aug 2024 17:00:59 GMT - finest: socket 484 opened
Fri, 30 Aug 2024 17:00:59 GMT - fine: [f5-cloud-failover] HTTP Request - POST /trigger
Fri, 30 Aug 2024 17:00:59 GMT - fine: [f5-cloud-failover] Performing failover - initialization
Fri, 30 Aug 2024 17:00:59 GMT - finest: [f5-cloud-failover] Device initialization complete
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] Fetched proxy settings:
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] {"protocol":"http","host":"","port":"8080","username":"","password":"****"}
Fri, 30 Aug 2024 17:01:00 GMT - fine: [f5-cloud-failover] config:
Fri, 30 Aug 2024 17:01:00 GMT - fine: [f5-cloud-failover] {"class":"Cloud_Failover","schemaVersion":"2.1.2","environment":"azure","externalStorage":{"scopingTags":{"f5_cloud_failover_label":"failover"}},"failoverAddresses":{"enabled":true,"scopingTags":{"f5_cloud_failover_nic_map":"external"},"requireScopingTags":false},"controls":{"class":"Controls","logLevel":"silly"}}
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] proxySettings:
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] {"protocol":"http","host":"","port":"8080","username":"","password":"****"}
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] Device initialization complete
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] Fetched proxy settings:
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] {"protocol":"http","host":"","port":"8080","username":"","password":"****"}
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] Subscriptions: {"0":"93f5fd8e-993b-4cc0-9b68-43434897987f"}
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] Listing Storage Accounts
Fri, 30 Aug 2024 17:01:00 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 60
Fri, 30 Aug 2024 17:01:01 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 59
Fri, 30 Aug 2024 17:01:02 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 58
Fri, 30 Aug 2024 17:01:03 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 57
Fri, 30 Aug 2024 17:01:04 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 56
Fri, 30 Aug 2024 17:01:05 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 55
Fri, 30 Aug 2024 17:01:06 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 54
Fri, 30 Aug 2024 17:01:07 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 53
Fri, 30 Aug 2024 17:01:08 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 52
......................
Fri, 30 Aug 2024 17:01:53 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 8
Fri, 30 Aug 2024 17:01:54 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 7
Fri, 30 Aug 2024 17:01:55 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 6
Fri, 30 Aug 2024 17:01:56 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 5
Fri, 30 Aug 2024 17:01:57 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 4
Fri, 30 Aug 2024 17:01:58 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 3
Fri, 30 Aug 2024 17:01:59 GMT - finest: socket 484 closed
Fri, 30 Aug 2024 17:01:59 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 2
Fri, 30 Aug 2024 17:02:00 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 1
Fri, 30 Aug 2024 17:02:01 GMT - finest: [f5-cloud-failover] Status: "getaddrinfo ENOTFOUND management.chinacloudapi.cnsubscriptions management.chinacloudapi.cnsubscriptions:443" Retries left: 0

Environment information

For bugs, enter the following information:

Severity Level

Severity: 2

Severity level definitions:

  1. Severity 1 (Critical) : Defect is causing systems to be offline and/or nonfunctional. Immediate attention is required.
  2. Severity 2 (High) : Defect is causing major obstruction of system operations.
  3. Severity 3 (Medium) : Defect is causing intermittent errors in system operations.
  4. Severity 4 (Low) : Defect is causing infrequent interruptions in system operations.
  5. Severity 5 (Trivial) : Defect is not causing any interruptions to system operations, but is nonetheless a bug.

mikeshimkus commented 2 months ago

Since you have a support case, I will track that. However, it does appear that Azure is returning an incorrect hostname when CFE queries it for endpoints; the Azure CLI seems to have had the same issue: https://github.com/Azure/azure-cli/issues/26866

Note the correct endpoints listed here: https://learn.microsoft.com/en-us/azure/china/resources-developer-guide#check-endpoints-in-azure

A possible workaround to bypass the incorrect endpoint lookup is to provide a customEnvironment in your CFE config using the endpoints for China cloud above. The default for CFE is Azure commercial and looks like this:

"customEnvironment": {
      "name": "AzureCustomEnviroment",
      "portalUrl": "https://portal.azure.com",
      "publishingProfileUrl": "http://go.microsoft.com/fwlink/?LinkId=254432",
      "managementEndpointUrl": "https://management.core.windows.net",
      "resourceManagerEndpointUrl": "https://management.azure.com/",
      "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
      "sqlServerHostnameSuffix": ".database.windows.net",
      "galleryEndpointUrl": "https://gallery.azure.com/",
      "activeDirectoryEndpointUrl": "https://login.microsoftonline.com/",
      "activeDirectoryResourceId": "https://management.core.windows.net/",
      "activeDirectoryGraphResourceId": "https://graph.windows.net/",
      "batchResourceId": "https://batch.core.windows.net/",
      "activeDirectoryGraphApiVersion": "2013-04-05",
      "storageEndpointSuffix": ".core.windows.net",
      "keyVaultDnsSuffix": ".vault.azure.net",
      "azureDataLakeStoreFileSystemEndpointSuffix": "azuredatalakestore.net",
      "azureDataLakeAnalyticsCatalogAndJobEndpointSuffix": "azuredatalakeanalytics.net"
}
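
As a rough illustration of that workaround, here is a sketch that builds a customEnvironment for Azure China from the AzureChina endpoint values quoted further down in this thread, with a trailing '/' on resourceManagerEndpointUrl. It is untested against CFE; the variable name is hypothetical, the Data Lake suffix fields are omitted because the thread gives no China values for them, and the endpoints should be checked against the Microsoft page linked above before use.

```javascript
// Sketch only: customEnvironment for Azure China, assembled from the
// AzureChina values quoted later in this thread (constants.js).
// Assumption: the object is pasted into the CFE declaration as "customEnvironment".
const customEnvironment = {
    name: 'AzureCustomEnviroment', // spelling copied from the sample above
    portalUrl: 'https://portal.azure.cn',
    publishingProfileUrl: 'http://go.microsoft.com/fwlink/?LinkID=301774',
    managementEndpointUrl: 'https://management.core.chinacloudapi.cn',
    // Trailing '/' so that appending "subscriptions/" yields a valid URL.
    resourceManagerEndpointUrl: 'https://management.chinacloudapi.cn/',
    sqlManagementEndpointUrl: 'https://management.core.chinacloudapi.cn:8443/',
    sqlServerHostnameSuffix: '.database.chinacloudapi.cn',
    galleryEndpointUrl: 'https://gallery.chinacloudapi.cn/',
    activeDirectoryEndpointUrl: 'https://login.chinacloudapi.cn/',
    activeDirectoryResourceId: 'https://management.core.chinacloudapi.cn/',
    activeDirectoryGraphResourceId: 'https://graph.chinacloudapi.cn/',
    batchResourceId: 'https://batch.chinacloudapi.cn/',
    activeDirectoryGraphApiVersion: '2013-04-05',
    storageEndpointSuffix: '.core.chinacloudapi.cn',
    keyVaultDnsSuffix: '.vault.azure.cn'
};

// Print as JSON so it can be copied into a declaration.
console.log(JSON.stringify({ customEnvironment }, null, 4));
```
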
washing20202020 commented 2 months ago

Thanks @mikeshimkus, I have escalated this with an ENE case. The customer hopes this can be fixed so that they can upgrade to the latest release.

wdlid commented 2 months ago

Hi @mikeshimkus

After checking the code, the issue is that the constant definition for AzureChina in constants.js is missing a trailing '/':

   AZURE_ENVIRONMENTS: {
        Azure: {
            name: 'Azure',
            portalUrl: 'https://portal.azure.com',
            publishingProfileUrl: 'http://go.microsoft.com/fwlink/?LinkId=254432',
            managementEndpointUrl: 'https://management.core.windows.net',
            resourceManagerEndpointUrl: 'https://management.azure.com/',  // trailing '/' present
            sqlManagementEndpointUrl: 'https://management.core.windows.net:8443/',
            sqlServerHostnameSuffix: '.database.windows.net',
            galleryEndpointUrl: 'https://gallery.azure.com/',
            activeDirectoryEndpointUrl: 'https://login.microsoftonline.com/',
            activeDirectoryResourceId: 'https://management.core.windows.net/',
            activeDirectoryGraphResourceId: 'https://graph.windows.net/',
            batchResourceId: 'https://batch.core.windows.net/',
            activeDirectoryGraphApiVersion: '2013-04-05',
            storageEndpointSuffix: '.core.windows.net',
            keyVaultDnsSuffix: '.vault.azure.net'
        },
        AzureChina: {
            name: 'AzureChina',
            portalUrl: 'https://portal.azure.cn',
            publishingProfileUrl: 'http://go.microsoft.com/fwlink/?LinkID=301774',
            managementEndpointUrl: 'https://management.core.chinacloudapi.cn',
            resourceManagerEndpointUrl: 'https://management.chinacloudapi.cn',  // missing the trailing '/' here
            sqlManagementEndpointUrl: 'https://management.core.chinacloudapi.cn:8443/',
            sqlServerHostnameSuffix: '.database.chinacloudapi.cn',
            galleryEndpointUrl: 'https://gallery.chinacloudapi.cn/',
            activeDirectoryEndpointUrl: 'https://login.chinacloudapi.cn/',
            activeDirectoryResourceId: 'https://management.core.chinacloudapi.cn/',
            activeDirectoryGraphResourceId: 'https://graph.chinacloudapi.cn/',
            batchResourceId: 'https://batch.chinacloudapi.cn/',
            activeDirectoryGraphApiVersion: '2013-04-05',
            storageEndpointSuffix: '.core.chinacloudapi.cn',
            keyVaultDnsSuffix: '.vault.azure.cn'
        },
}

This causes the expression

${this.environment.resourceManagerEndpointUrl}subscriptions/

to evaluate to

https://management.chinacloudapi.cnsubscriptions
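
The concatenation can be reproduced outside CFE with a few lines of JavaScript. This is only a sketch: `joinUrl` is a hypothetical helper, not something that exists in CFE, and it shows one way to make the join independent of whether the constant ends in '/'.

```javascript
// Hypothetical helper (not part of CFE): join a base URL and a path
// with exactly one '/' between them, whatever the inputs look like.
function joinUrl(base, path) {
    return `${base.replace(/\/+$/, '')}/${path.replace(/^\/+/, '')}`;
}

const withoutSlash = 'https://management.chinacloudapi.cn';  // AzureChina value
const withSlash = 'https://management.chinacloudapi.cn/';    // same value with '/'

// Plain interpolation, as in the expression above, fuses host and path.
const broken = `${withoutSlash}subscriptions/`;
console.log(new URL(broken).hostname);

// The helper gives the same result for both forms of the constant.
console.log(joinUrl(withoutSlash, 'subscriptions/'));
console.log(joinUrl(withSlash, 'subscriptions/'));
```

The first line printed is the hostname from the ENOTFOUND error in the log, which is why DNS resolution fails before any request reaches Azure.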

Could you please take a look?

pgouband commented 2 months ago

Hi,

Thanks for reporting. This has been added to the backlog; the internal tracking ID for this request is EC-549.

mikeshimkus commented 2 months ago

@washing20202020 @wdlid @pgouband See https://github.com/F5Networks/f5-cloud-failover-extension/releases/tag/v2.1.3