F5Networks / f5-cloud-failover-extension

F5 Cloud Failover Extension
Apache License 2.0

Azure: CFE deletes ipconfigs but fails to create ipconfigs if Public IPs exist in another Resource Group #42

Closed: mikeoleary closed this issue 4 years ago

mikeoleary commented 4 years ago

Issue Description

My customer found that ipconfigs were successfully deleted from Device1 but could not be created on Device2 because of permission errors. The CFE logs are below.

The customer deployed a supported ARM template, then created additional VIPs using public IP addresses from a different resource group (RG).

Since the Managed Identity for the VMs is granted permissions only on the RG in which the BIG-IP is deployed, it did not have write permissions over the public IPs.

I noticed that issue #31 references AUTOSDK-376, which targets a feature that might check for missing dependencies. My main questions here are:

a) could we have some kind of permissions check prior to, or at the time of failover, so that we can avoid hitting permission errors after the ipconfigs have been successfully deleted but before they are created on the other device, or

b) could we document the requirement that public IPs must be in the appropriate RGs, or have appropriate permissions in place, in order to fail over correctly? Currently there is a reference to RGs and permissions in the FAQ, but it may help to call out public IPs, as these may get created much later than the BIG-IPs, potentially by different teams.
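
To illustrate what a check like the one in (a) could look at, here is a minimal Python sketch. It is not how CFE works today and it makes no Azure calls. It assumes you have already collected the resource IDs that failover will touch and the scopes where the Managed Identity holds a role assignment. The function names `is_covered` and `uncovered_resources` are hypothetical. It relies only on the fact that an Azure role assigned at a scope is inherited by resources below that scope.

```python
from typing import Iterable, List


def is_covered(resource_id: str, scopes: Iterable[str]) -> bool:
    """Return True if resource_id equals, or sits below, one of the scopes."""
    rid = resource_id.rstrip("/").lower()  # Azure resource IDs are case-insensitive
    for scope in scopes:
        s = scope.rstrip("/").lower()
        # Require a "/" boundary so "bigip-rg" does not match "bigip-rg2".
        if rid == s or rid.startswith(s + "/"):
            return True
    return False


def uncovered_resources(resource_ids: Iterable[str],
                        scopes: Iterable[str]) -> List[str]:
    """List the resource IDs that no role-assignment scope covers."""
    scope_list = list(scopes)  # materialize once; it is iterated per resource
    return [rid for rid in resource_ids if not is_covered(rid, scope_list)]
```

A non-empty result before failover would flag public IPs (or other linked resources) outside the RGs where the identity has rights. Note that this only checks scope coverage; it does not confirm that the assigned role actually includes the required action.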

Workaround

  1. I asked the customer to delete the public IP and recreate it in an RG in which the Managed Identity is a Contributor.
  2. I also suggested that the customer could give the Managed Identity Contributor permissions in all other RGs where public IPs may get created.
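
As a rough sketch of the second workaround, the Python helper below builds the Azure CLI command that grants a role on another RG. The helper name `role_assignment_command` and all IDs in the usage note are placeholders, not values from this issue. The `az role assignment create` flags used (`--assignee`, `--role`, `--scope`) are standard Azure CLI options, but check them against your CLI version before running anything.

```python
from typing import List


def role_assignment_command(principal_id: str,
                            subscription_id: str,
                            resource_group: str,
                            role: str = "Contributor") -> List[str]:
    """Build an `az role assignment create` command scoped to one resource group."""
    scope = f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    return [
        "az", "role", "assignment", "create",
        "--assignee", principal_id,  # object ID of the VM's Managed Identity (placeholder)
        "--role", role,
        "--scope", scope,
    ]
```

For example, `role_assignment_command("<principal-id>", "<subscription-id>", "public-ip-rg")` returns the argument list for one RG; it can be printed for review or passed to `subprocess.run(..., check=True)` once per RG that holds public IPs.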

Steps to Recreate

  1. Deploy the supported HA ARM template (failover via API).
  2. After deployment, create a new VIP. Allocate a private IP and then associate a Public IP that already exists in a different RG from where the BIG-IP was deployed.

CFE log file

I can provide the full log file, but I have copied the lines from this failover event below and manually removed any GUIDs and object names:

Fri, 31 Jul 2020 04:25:57 GMT - finest: socket 206 opened
Fri, 31 Jul 2020 04:26:02 GMT - info: [f5-cloud-failover] Performing failover - execute
Fri, 31 Jul 2020 04:26:05 GMT - info: [f5-cloud-failover] Performing Failover - discovery
Fri, 31 Jul 2020 04:26:05 GMT - info: [f5-cloud-failover] Discover Address operations using localAddresses {"0":"x.x.x.x","1":"x.x.x.x"} failoverAddresses {"0":"x.x.x.x","1":"x.x.x.x"} to discover
Fri, 31 Jul 2020 04:26:05 GMT - info: [f5-cloud-failover] Performing Failover - update
Fri, 31 Jul 2020 04:26:57 GMT - finest: socket 206 closed
Fri, 31 Jul 2020 04:26:58 GMT - finest: socket 207 opened
Fri, 31 Jul 2020 04:27:04 GMT - finest: socket 207 closed
Fri, 31 Jul 2020 04:30:37 GMT - info: [f5-cloud-failover] Disassociate NICs successful.
Fri, 31 Jul 2020 04:34:01 GMT - severe: [f5-cloud-failover] The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope '[ResourceId of NIC]'; however, it does not have permission to perform action 'Microsoft.Network/publicIPAddresses/join/action' on the linked scope(s) '[ResourceId of PublicIP1],[ResourceId of PublicIP2]' or the linked scope(s) are invalid. Error: The client 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' with object id 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' has permission to perform action 'Microsoft.Network/networkInterfaces/write' on scope '[ResourceId of NIC]'; however, it does not have permission to perform action 'Microsoft.Network/publicIPAddresses/join/action' on the linked scope(s) '[ResourceId of PublicIP1],[ResourceId of PublicIP2]' or the linked scope(s) are invalid.
    at client.pipeline (/var/config/rest/iapps/f5-cloud-failover/node_modules/azure-arm-network/lib/operations/networkInterfaces.js:2037:19)
    at retryCallback (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/systemErrorRetryPolicyFilter.js:89:9)
    at retryCallback (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/exponentialRetryPolicyFilter.js:140:9)
    at /var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/rpRegistrationFilter.js:59:14
    at handleRedirect (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/redirectFilter.js:39:9)
    at /var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/filters/formDataFilter.js:23:14
    at Request.defaultRequest [as _callback] (/var/config/rest/iapps/f5-cloud-failover/node_modules/ms-rest-azure/node_modules/ms-rest/lib/requestPipeline.js:125:16)
    at Request.self.callback (/var/config/rest/iapps/f5-cloud-failover/node_modules/request/request.js:185:22)
    at emitTwo (events.js:126:13)
    at Request.emit (events.js:214:7)
    at Request.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/request/request.js:1154:10)
    at emitOne (events.js:121:20)
    at Request.emit (events.js:211:7)
    at IncomingMessage.<anonymous> (/var/config/rest/iapps/f5-cloud-failover/node_modules/request/request.js:1076:12)
    at Object.onceWrapper (events.js:313:30)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:138:11)
    at process._tickCallback (internal/process/next_tick.js:180:9)
Fri, 31 Jul 2020 19:18:24 GMT - finest: socket 208 opened
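
The "severe" line above names both the missing action and the linked resources it was denied on. The Python sketch below shows one way to pull those two pieces out of such a message when triaging logs. The regular expression is written against the wording in this log only, so treat it as an assumption that other Azure error messages may not match; the function name `parse_linked_scope_denial` is hypothetical.

```python
import re
from typing import Dict, List, Optional, Union

# Matches the "linked scope(s)" denial wording seen in the log above.
DENIED = re.compile(
    r"does not have permission to perform action '(?P<action>[^']+)' "
    r"on the linked scope\(s\) '(?P<scopes>[^']+)'"
)


def parse_linked_scope_denial(
        message: str) -> Optional[Dict[str, Union[str, List[str]]]]:
    """Return the denied action and linked scopes, or None if nothing matches."""
    match = DENIED.search(message)
    if match is None:
        return None
    return {
        "action": match.group("action"),
        # Linked scopes are comma-separated inside one quoted string.
        "scopes": [s.strip() for s in match.group("scopes").split(",")],
    }
```

Applied to the message above, this yields the action 'Microsoft.Network/publicIPAddresses/join/action' and the two public IP scopes, which are the resources the Managed Identity needs rights on.
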

KrithikaChidambaram commented 4 years ago

Hi, AUTOSDK-431 has been created to track this.

KrithikaChidambaram commented 4 years ago

Hi Mike,

For your questions:

a) We are tracking the check or dry-run RFE with internal ID AUTOSDK-376.

b) The documentation states "Certain resources such as the virtual network are commonly deployed in a seperate resource group, ensure the correct scopes are applied to all applicable resource groups." https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/azure.html#rbac-role-definition

If you don't have any other concerns, I will close this item.

mikeoleary commented 4 years ago

Hi Krithika, OK, thanks, I appreciate that and closing this issue sounds good to me. Mike

jmcalalang commented 4 years ago

Did this MSI get updated? I think I'm running into the same scenario, but with a UDR that exists in another resource group.

jmcalalang commented 4 years ago

The answer is yes: the same permissions problem occurs with a UDR in another resource group. As a workaround, give the MSI created for the Virtual Machine access to the other RG so that it can update the UDR.

For SEO: Route Table updates in different Resource Groups