F5Networks / f5-azure-arm-templates-v2

Azure Resource Manager Templates for quickly deploying BIG-IP services in Azure

Failed to deploy full stack failover example template with BYOL #40

Open tashian opened 4 months ago

tashian commented 4 months ago

Describe the bug

I've been trying to deploy a BYOL image with the failover template as an HA pair, and I'm getting an error during onboarding: Error licensing: tryUntil: max tries reached: Unknown exception during ping ://:8080

Expected behavior

I expected the deployment to complete successfully.

Current behavior

The deployment fails with the following error:

VM has reported a failure when processing extension 'onboarder' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: 'Enable failed: failed to execute command: command terminated with exit status=1
[stdout]
T19:05:11.411Z [21044]: error: Task with error: {"code":500,"body":{"id":"6a8cda77-f47f-4eca-b1da-b239bb167626","selfLink":"https://localhost/mgmt/shared/declarative-onboarding/task/6a8cda77-f47f-4eca-b1da-b239bb167626","code":500,"status":"ERROR","message":"invalid config - rolled back","errors":["Error licensing: tryUntil: max tries reached: Unknown exception during ping ://:8080","Error licensing: tryUntil: max tries reached: Unknown exception during ping ://:8080"],"result":{"class":"Result","code":500,"status":"ERROR","message":"invalid config - rolled back","errors":["Error licensing: tryUntil: max tries reached: Unknown exception during ping ://:8080","Error licensing: tryUntil: max tries reached: Unknown exception during ping ://:8080"]},"declaration":{"schemaVersion":"1.0.0","class":"Device","async":true,"label":"Failover 2NIC BIG-IP declaration for Declarative Onboarding with BYOL license","Common":{"class":"Tenant","My_DbVariables":{"class":"DbVariables","dhclient.mgmt":"disable","config.allow.rfc3927":"enable","tm.tcpudptxchecksum":"Software-only"},"My_Provisioning":{"class":"Provision","ltm":"nominal"},"My_Ntp":{"class":"NTP","servers":["0.pool.ntp.org","1.pool.ntp.org"],"timezone":"UTC"},"My_Dns":{"class":"DNS","nameServers":["168.63.129.16"]},"My_System":{"autoPhonehome":true,"class":"System","hostname":"failover01.local","consoleInactivityTimeout":0,"cliInactivityTimeout":0,"autoCheck":true,"tmshAuditLog":true,"guiAuditLog":false,"mcpAuditLog":"enable","preserveOrigDhcpRoutes":false,"guiSecurityBanner":true,"guiSecurityBannerText":"Welcome to the BIG-IP Configuration Utility.\n\nLog in with your username and password using the fields on the 
left.","usernamePrompt":"Username","passwordPrompt":"Password"},"My_License":{"class":"License","licenseType":"regKey","regKey":"XXXXXX-XXXXXX-XXXXXX-XXXXX","overwrite":false},"admin":{"class":"User","userType":"regular","shell":"bash","forceInitialPasswordChange":true},"default":{"class":"ManagementRoute","gw":"10.0.0.1","network":"default","mtu":0},"dhclient_route1":{"class":"ManagementRoute","gw":"10.0.0.1","network":"168.63.129.16/32","mtu":0},"azureMetadata":{"class":"ManagementRoute","gw":"10.0.0.1","network":"169.254.169.254/32","mtu":0},"defaultRoute":{"class":"Route","gw":"10.0.1.1","network":"default","mtu":1500,"localOnly":false},"external":{"class":"VLAN","tag":4094,"mtu":1500,"interfaces":[{"name":"1.1","tagged":false}],"autoLastHop":"default","cmpHash":"default","failsafeEnabled":false,"failsafeAction":"failover-restart-tm","failsafeTimeout":90},"external-self":{"class":"SelfIp","address":"10.0.1.11/24","vlan":"external","allowService":["tcp:443","udp:1026","tcp:4353","tcp:6123","tcp:6124","tcp:6125","tcp:6126","tcp:6127","tcp:6128"],"trafficGroup":"traffic-group-local-only"}}}},"headers":{"connection":"close","date":"Thu, 25 Apr 2024 19:05:11 GMT","content-type":"application/json;charset=utf-8","pragma":"no-cache","cache-control":"no-store, no-cache, must-revalidate","expires":"-1","content-length":"2732","server":"Jetty(9.4.49.v20220914)"}}
2024-04-25T19:05:11.414Z [21044]: info: Sending F5 Teem report for failure case.
2024-04-25T19:05:13.739Z [21044]: warn: Problem with getting data from /mgmt/tm/sys/license endpoint. Leaving regKey with default value
2024-04-25T19:05:13.741Z [21044]: info: {"id":"d52402db-7aa2-834a-22bc723d2a81","product":"BIG-IP","cpuCount":8,"diskSize":86016,"memoryInMb":32176,"version":"17.1.1.1","nicCount":2,"platformId":"Z100","hostname":"bigip1","management":"10.0.0.11/24","provisionedModules":{"ltm":"nominal"},"installedPackages":{"f5-service-discovery-1.19.0-1.noarch":"1.19.0","f5-declarative-onboarding-1.43.0-5.noarch":"1.43.0","f5-cloud-failover-2.1.0-0.noarch":"2.1.0","f5-appsvcs-3.50.2-3.noarch":"3.50.2"},"environment":{"pythonVersion":"Python 2.7.5","pythonVersionDetailed":"2.7.5 (default, Nov 28 2023, 22:15:20) \n[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]","nodeVersion":"v6.9.1","libraries":{"ssh":"OpenSSH_7.4p1, OpenSSL 1.0.2za-fips  24 Aug 2021"}}}
2024-04-25T19:05:14.007Z [21044]: error: Device is not licensed yet

[stderr]
'. More information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot.  (Code: VMExtensionProvisioningError)
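
The licensing failure is buried inside a very long "Task with error:" log line. As a minimal sketch (not part of the templates or of Declarative Onboarding), the Python snippet below pulls the `errors` list out of such a line. The function name `task_errors` and the shortened `sample` line are illustrative assumptions; the sample keeps only the fields from the log above that the function reads.

```python
import json

# Text that precedes the JSON payload in the onboarding log line above.
MARKER = "Task with error: "


def task_errors(log_line):
    """Return the de-duplicated 'errors' list from a 'Task with error:' log line."""
    start = log_line.find(MARKER)
    if start == -1:
        return []
    # raw_decode parses one JSON value and ignores any text that follows it.
    payload, _ = json.JSONDecoder().raw_decode(log_line[start + len(MARKER):])
    errors = payload.get("body", {}).get("errors", [])
    # dict.fromkeys keeps the first occurrence of each message, in order.
    return list(dict.fromkeys(errors))


# Hypothetical, shortened version of the log line shown above.
sample = (
    'T19:05:11.411Z [21044]: error: Task with error: '
    '{"code":500,"body":{"status":"ERROR",'
    '"message":"invalid config - rolled back",'
    '"errors":["Error licensing: tryUntil: max tries reached: '
    'Unknown exception during ping ://:8080",'
    '"Error licensing: tryUntil: max tries reached: '
    'Unknown exception during ping ://:8080"]}}'
)

for message in task_errors(sample):
    print(message)
```

Run against the full log line, this should surface the single licensing error without the surrounding declaration and headers.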

Steps to reproduce

  1. Go to the failover solution and use the Deploy to Azure button.
  2. Use the inputs provided below (some values are redacted).

Important notes:

  1. Once the deployment fails, the VMs are running and the admin console is available for each via the public IP. I was able to license one of the BIG IPs manually, but then I discovered that none of the HA/failover configuration was applied.

I tried to revoke the license before deleting the resource group, but I ran into this error:

Screenshot 2024-04-25 at 11 11 25 AM

Here are the deployment template values:

{
  "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/Microsoft.Template-20240425115853",
  "name": "Microsoft.Template-20240425115853",
  "type": "Microsoft.Resources/deployments",
  "properties": {
    "templateHash": "64714859484389663",
    "parameters": {
      "templateBaseUrl": {
        "type": "String",
        "value": "https://cdn.f5.com/product/cloudsolutions/"
      },
      "allowUsageAnalytics": {
        "type": "Bool",
        "value": true
      },
      "artifactLocation": {
        "type": "String",
        "value": "f5-azure-arm-templates-v2/v3.1.0.0/examples/"
      },
      "uniqueString": {
        "type": "String",
        "value": "ss2024042503"
      },
      "bigIpHostname01": {
        "type": "String",
        "value": "failover01.local"
      },
      "bigIpHostname02": {
        "type": "String",
        "value": "failover02.local"
      },
      "bigIpImage": {
        "type": "String",
        "value": "f5-networks:f5-big-ip-byol:f5-big-all-2slot-byol:17.1.101000"
      },
      "bigIpInstanceType": {
        "type": "String",
        "value": "Standard_D8s_v4"
      },
      "bigIpLicenseKey01": {
        "type": "String",
        "value": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXXXX"
      },
      "bigIpLicenseKey02": {
        "type": "String",
        "value": "XXXXX-XXXXX-XXXXX-XXXXX-XXXXXXX"
      },
      "bigIpPasswordSecretId": {
        "type": "String",
        "value": ""
      },
      "bigIpPasswordSecretValue": {
        "type": "SecureString"
      },
      "bigIpPeerAddr": {
        "type": "String",
        "value": "10.0.1.11"
      },
      "sshKey": {
        "type": "String",
        "value": "ssh-rsa AAAAB3xxxxxxxxxxx"
      },
      "appContainerName": {
        "type": "String",
        "value": ""
      },
      "provisionExampleApp": {
        "type": "Bool",
        "value": false
      },
      "provisionPublicIpMgmt": {
        "type": "Bool",
        "value": true
      },
      "restrictedSrcAddressMgmt": {
        "type": "String",
        "value": "*"
      },
      "restrictedSrcAddressApp": {
        "type": "String",
        "value": "*"
      },
      "restrictedSrcAddressVip": {
        "type": "String",
        "value": "*"
      },
      "numNics": {
        "type": "Int",
        "value": 2
      },
      "bigIpExternalSelfIp01": {
        "type": "String",
        "value": "10.0.1.11"
      },
      "bigIpExternalSelfIp02": {
        "type": "String",
        "value": "10.0.1.12"
      },
      "bigIpExternalVip01": {
        "type": "String",
        "value": "10.0.1.101"
      },
      "bigIpInternalSelfIp01": {
        "type": "String",
        "value": "10.0.2.11"
      },
      "bigIpInternalSelfIp02": {
        "type": "String",
        "value": "10.0.2.12"
      },
      "bigIpMgmtAddress01": {
        "type": "String",
        "value": "10.0.0.11"
      },
      "bigIpMgmtAddress02": {
        "type": "String",
        "value": "10.0.0.12"
      },
      "bigIpRuntimeInitConfig01": {
        "type": "String",
        "value": "https://raw.githubusercontent.com/F5Networks/f5-azure-arm-templates-v2/v3.1.0.0/examples/failover/bigip-configurations/runtime-init-conf-2nic-byol-instance01.yaml"
      },
      "bigIpRuntimeInitConfig02": {
        "type": "String",
        "value": "https://raw.githubusercontent.com/F5Networks/f5-azure-arm-templates-v2/v3.1.0.0/examples/failover/bigip-configurations/runtime-init-conf-2nic-byol-instance02.yaml"
      },
      "bigIpRuntimeInitPackageUrl": {
        "type": "String",
        "value": "https://cdn.f5.com/product/cloudsolutions/f5-bigip-runtime-init/v2.0.1/dist/f5-bigip-runtime-init-2.0.1-1.gz.run"
      },
      "cfeStorageAccountName": {
        "type": "String",
        "value": ""
      },
      "cfeTag": {
        "type": "String",
        "value": "bigip_high_availability_solution"
      },
      "useAvailabilityZones": {
        "type": "Bool",
        "value": false
      },
      "bigIpUserAssignManagedIdentity": {
        "type": "String",
        "value": ""
      },
      "tagValues": {
        "type": "Object",
        "value": {
          "application": "f5demoapp",
          "cost": "f5cost",
          "environment": "f5env",
          "group": "f5group",
          "owner": "f5owner"
        }
      }
    },
    "mode": "Incremental",
    "debugSetting": {
      "detailLevel": "None"
    },
    "provisioningState": "Failed",
    "timestamp": "2024-04-25T19:06:06.3848189Z",
    "duration": "PT7M9.2923101S",
    "correlationId": "c5cb1a12-99ad-455f-82b2-e8ae9f34dd1a",
    "providers": [
      {
        "namespace": "Microsoft.Resources",
        "resourceTypes": [
          {
            "resourceType": "deployments",
            "locations": [
              null
            ]
          }
        ]
      }
    ],
    "dependencies": [
      {
        "dependsOn": [
          {
            "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/networkTemplate",
            "resourceType": "Microsoft.Resources/deployments",
            "resourceName": "networkTemplate"
          }
        ],
        "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/dagTemplate",
        "resourceType": "Microsoft.Resources/deployments",
        "resourceName": "dagTemplate"
      },
      {
        "dependsOn": [
          {
            "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/networkTemplate",
            "resourceType": "Microsoft.Resources/deployments",
            "resourceName": "networkTemplate"
          },
          {
            "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/accessTemplate",
            "resourceType": "Microsoft.Resources/deployments",
            "resourceName": "accessTemplate"
          },
          {
            "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/dagTemplate",
            "resourceType": "Microsoft.Resources/deployments",
            "resourceName": "dagTemplate"
          }
        ],
        "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/bigIpTemplate01",
        "resourceType": "Microsoft.Resources/deployments",
        "resourceName": "bigIpTemplate01"
      },
      {
        "dependsOn": [
          {
            "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/networkTemplate",
            "resourceType": "Microsoft.Resources/deployments",
            "resourceName": "networkTemplate"
          },
          {
            "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/accessTemplate",
            "resourceType": "Microsoft.Resources/deployments",
            "resourceName": "accessTemplate"
          },
          {
            "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/dagTemplate",
            "resourceType": "Microsoft.Resources/deployments",
            "resourceName": "dagTemplate"
          }
        ],
        "id": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/bigIpTemplate02",
        "resourceType": "Microsoft.Resources/deployments",
        "resourceName": "bigIpTemplate02"
      }
    ],
    "error": {
      "code": "DeploymentFailed",
      "target": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/Microsoft.Template-20240425115853",
      "message": "At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.",
      "details": [
        {
          "code": "ResourceDeploymentFailure",
          "target": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/bigIpTemplate01",
          "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
        },
        {
          "code": "ResourceDeploymentFailure",
          "target": "/subscriptions/xxxxx-xxxxx-xxxx-xxxxxx-xxxxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/bigIpTemplate02",
          "message": "The resource write operation failed to complete successfully, because it reached terminal provisioning state 'Failed'."
        }
      ]
    },
    "validationLevel": "Template"
  },
  "tags": {
    "marketplaceItemId": "Microsoft.Template"
  }
}
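
To see at a glance which nested deployments failed, the `error.details` array of the output above can be summarized. This is a rough sketch only: `failed_deployments` is a hypothetical helper, and `sample` is a trimmed copy of the JSON above containing just the fields the helper reads.

```python
import json


def failed_deployments(deployment):
    """Return (nested deployment name, error code) pairs from a deployment result."""
    error = deployment.get("properties", {}).get("error", {})
    return [
        # The nested deployment name is the last segment of the target resource ID.
        (detail["target"].rsplit("/", 1)[-1], detail["code"])
        for detail in error.get("details", [])
    ]


# Trimmed from the deployment output above; IDs are redacted as in the original.
sample = json.loads("""
{
  "properties": {
    "provisioningState": "Failed",
    "error": {
      "code": "DeploymentFailed",
      "details": [
        {
          "code": "ResourceDeploymentFailure",
          "target": "/subscriptions/xxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/bigIpTemplate01"
        },
        {
          "code": "ResourceDeploymentFailure",
          "target": "/subscriptions/xxxxx/resourceGroups/f5-big-ip/providers/Microsoft.Resources/deployments/bigIpTemplate02"
        }
      ]
    }
  }
}
""")

for name, code in failed_deployments(sample):
    print(f"{name}: {code}")
```

In this case both BIG-IP nested deployments (bigIpTemplate01 and bigIpTemplate02) are reported as failed, which matches the onboarding error occurring on each instance.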

Screenshots

Screenshot 2024-04-25 at 11 49 12 AM

mikeshimkus commented 4 months ago

Hi @tashian, can your BIG-IPs reach the public F5 license server? Are you using a proxy? See the KB article "Error licensing: tryUntil: max tries reached: Unknown exception during ping" for more info on that error.
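
One rough way to check reachability is a plain TCP connection test from a host in the same subnet. The sketch below is only an illustration: `can_connect` is a hypothetical helper, and the license server host and port in the usage comment (`activate.f5.com`, 443) are assumptions that are not stated anywhere in this thread.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example usage (host and port are assumptions, adjust for your environment):
#   print(can_connect("activate.f5.com", 443))
```

A successful TCP connection only shows that the network path is open; it does not prove that the licensing request itself will succeed.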

tashian commented 4 months ago

Thanks @mikeshimkus for the suggestions. I'm not using a proxy, so I don't think the KB article is relevant. I've just created a new resource group from scratch and deployed into that. Once the deployment fails, I'm able to sign in from the public internet and license the instance successfully.

I don't know exactly what is happening under the hood with Azure during the deployment, but to me this feels like a race condition. Just a hunch. 🤷🏻

mikeshimkus commented 4 months ago

This still seems to be a connectivity issue to the license server, which is odd, since you were able to download the AT packages and config files, and later you could reach the license server. I created task EC-510 and will see if I can repro, but it may end up being a Declarative Onboarding issue, since it seems to happen in the middle of DO configuration.

mikeshimkus commented 4 months ago

@tashian I was able to eliminate the templates as a cause by deploying the same BYOL image directly from the marketplace and attempting to license via tmsh:

azureuser@(localhost)(cfg-sync Standalone)(NO LICENSE)(/Common)(tmos)# install sys license registration-key XXXXX-XXXXX-XXXXX-XXXXX-XXXXXXX
Warning: the current license is not valid
Unknown exception during ping ://:8080

Please open a ticket with F5 support. I was not able to find an existing bug for this issue. Thanks

tashian commented 4 months ago

Hi Mike, thanks for the repro. We (Smallstep) are F5 TAP program partners, and I'm unable to create a support ticket with my F5 login:

Screenshot 2024-05-01 at 4 08 57 PM

I've sent an email to our TAP contact at F5 to let them know about the issue.

Thanks