Azure / AzLoadBalancerMigration

This repo contains a PowerShell module to support Azure Load Balancer migration from Basic to Standard SKU
MIT License

🪲 Bug Report - Upgrade Fails for SF VMSS with KeyVault Extension #65

Open mbrat2005 opened 1 year ago

mbrat2005 commented 1 year ago

Describe the bug

Upgrade fails due to KV VM extension timeout:

2023-06-14T21:07:57+00 [Information]:############################## Initializing Start-AzBasicLoadBalancerUpgrade ##############################
2023-06-14T21:07:57+00 [Information]:[Start-AzBasicLoadBalancerUpgrade] PowerShell Version: **7.3.4**
2023-06-14T21:07:57+00 [Information]:[Start-AzBasicLoadBalancerUpgrade] AzureBasicLoadBalancerUpgrade **Version: 2.0.19**
...
2023-06-14T21:08:00+00 [Information]:[Test-SupportedMigrationScenario] Checking whether VMSS scale set 'quotavmssdevbn' is a Service Fabric cluster...
WARNING: 2023-06-14T21:08:00+00 [Warning]:[Test-SupportedMigrationScenario] **VMSS appears to be a Service Fabric** cluster based on extension profile. SF Clusters experienced potentically significant downtime during migration using this PowerShell module. In testing, a 5-node Bronze cluster was unavailable for about 30 minutes and a 5-node Silver cluster was unavailabile for about 45 minutes. Shutting down the cluster VMSS prior to initiating migration will result in a more consistent experience of about 5 minutes to complete the LB migration. For Service Fabric clusters that require minimal / no connectivity downtime, adding a new nodetype with standard load balancer and IP resources is a better solution.
Do you want to proceed with the migration of your Service Fabric Cluster's Load Balancer?
...
2023-06-14T21:08:00+00 [Information]:[PublicLBMigration] **Public Load Balancer Detected**. Initiating Public Load Balancer Migration
...
2023-06-14T21:24:34+00 [Information]:[NatRulesMigration] Waiting for saving standard load balancer LB-quota-cluster-dev-bn job to complete...
2023-06-14T21:24:34+00 [Information]:[NatRulesMigration] Nat Rules Migration Completed
2023-06-14T21:24:34+00 [Information]:[InboundNatPoolsMigration] Initiating Inbound NAT Pools Migration
2023-06-14T21:24:34+00 [Information]:[InboundNatPoolsMigration] Adding Inbound NAT Pool LoadBalancerBEAddressNatPool to Standard Load Balancer
2023-06-14T21:24:34+00 [Information]:[InboundNatPoolsMigration] Saving Standard Load Balancer LB-quota-cluster-dev-bn
2023-06-14T21:24:49+00 [Information]:[InboundNatPoolsMigration] Waiting for saving standard load balancer LB-quota-cluster-dev-bn job to complete...
2023-06-14T21:24:49+00 [Information]:[GetVmssFromBasicLoadBalancer] Initiating GetVmssFromBasicLoadBalancer
2023-06-14T21:24:49+00 [Information]:[GetVmssFromBasicLoadBalancer] Getting VMSS object '/subscriptions/.../resourcegroups/azure-quota-dev-eastus2/providers/microsoft.compute/virtualmachinescalesets/quotavmssdevbn' from Azure
2023-06-14T21:24:49+00 [Information]:[GetVmssFromBasicLoadBalancer] VMSS loaded Name quotavmssdevbn from RG azure-quota-dev-eastus2
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Adding InboundNATPool to VMSS quotavmssdevbn
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Checking if VMSS 'quotavmssdevbn' NIC 'NIC-azure-quota-dev-eastus2' IPConfig 'NIC-azure-quota-dev-eastus2' should be associated with NAT Pool 'LoadBalancerBEAddressNatPool'
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Adding NAT Pool 'LoadBalancerBEAddressNatPool' to IPConfig 'NIC-azure-quota-dev-eastus2'
2023-06-14T21:24:49+00 [Information]:[_MigrateNetworkInterfaceConfigurations] Migrate NetworkInterface Configurations completed
2023-06-14T21:24:49+00 [Information]:[InboundNatPoolsMigration] Saving VMSS quotavmssdevbn
2023-06-14T21:24:49+00 [Information]:[UpdateVmss] Updating configuration of VMSS 'quotavmssdevbn'
2023-06-14T21:25:04+00 [Information]:[UpdateVmss] Waiting for job (id: '5') updating VMSS 'quotavmssdevbn' to complete...
...
2023-06-14T23:10:50+00 [Information]:[UpdateVmss] Waiting for job (id: '5') updating VMSS 'quotavmssdevbn' to complete...
InvalidOperation: Long running operation failed with status 'Failed'. Additional Info:'Provisioning of VM extension **KvVmExtension** has timed out. Extension provisioning has taken too long to complete. The extension did not report a message. More information on troubleshooting is available at https://aka.ms/vmextensionwindowstroubleshoot'
ErrorCode: VMExtensionProvisioningTimeout
ErrorMessage: Provisioning of VM extension KvVmExtension has timed out. Extension provisioning has taken too long to complete. The extension did not report a message. More information on troubleshooting is available at https://aka.ms/vmextensionwindowstroubleshoot
ErrorTarget: 0
StartTime: 6/14/2023 9:24:52 PM
EndTime: 6/14/2023 11:10:27 PM
OperationID: 85ee53b5-9ce3-4458-9edd-f46e8c7baf02
Status: Failed
Write-Error: 2023-06-14T23:10:50+00 [Error]:[InboundNatPoolsMigration] An error occured when attempting to update VMSS network config on the new Standard LB backend pool membership. To recover address
the following error, and try again specifying the -FailedMigrationRetryFilePath parameter and Basic Load Balancer backup State file located either in this directory or the directory
specified with -RecoveryBackupPath
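
For a sense of how long the VMSS update ran before the extension timeout was reported, the StartTime and EndTime fields in the error above can be subtracted. The following is a minimal Python sketch, not part of the module; the helper name `elapsed` and the timestamp format string are assumptions made for illustration, and the two timestamps are copied from the error output (assumed to be in the same time zone).

```python
from datetime import datetime, timedelta

# Timestamps copied verbatim from the StartTime / EndTime fields above.
START_TIME = "6/14/2023 9:24:52 PM"
END_TIME = "6/14/2023 11:10:27 PM"

# Assumed format: month/day/year with a 12-hour clock and AM/PM marker.
TIMESTAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two timestamps in TIMESTAMP_FORMAT."""
    return datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(
        start, TIMESTAMP_FORMAT
    )


duration = elapsed(START_TIME, END_TIME)
print(duration)  # 1:45:35
```

So the update operation ran for roughly an hour and 45 minutes before it was marked as failed, which matches the repeated "Waiting for job (id: '5')" entries between 21:25 and 23:10 in the log.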

To Reproduce

Steps to reproduce the behavior:

  1. VMSS
  2. Public LB
  3. KvVmExtension (in this case, the extension adds a certificate to the local store and auto-upgrade is disabled)
  4. Service Fabric cluster (not yet confirmed as a required factor)

Additional context - please include:

See log

mbrat2005 commented 1 year ago

This issue is reportedly intermittent; still working to reproduce it.

mbrat2005 commented 6 months ago

Closing due to lack of activity and reproducibility

AndrewCS149 commented 3 weeks ago

@mbrat2005 I'm experiencing the same issue. Did you ever find a solution?

mbrat2005 commented 3 weeks ago

Hi Andrew,

I haven't made progress on this one, since I couldn't seem to repro it. Would you be able to share your upgrade log for details? Also, are you upgrading a basic LB for a Service Fabric Cluster?

Thanks! Matthew


mbrat2005 commented 2 weeks ago

@AndrewCS149 I haven't made progress on this one, since I couldn't seem to repro it. Would you be able to share your upgrade log for details? Also, are you upgrading a basic LB for a Service Fabric Cluster?