F5Networks / f5-cloud-failover-extension

F5 Cloud Failover Extension
Apache License 2.0

CFE error when failing over EIP's between AZ's: Cannot read property 'NetworkInterfaceId' of undefined #116

Closed mikeoleary closed 1 year ago

mikeoleary commented 1 year ago

Do you already have an issue opened with F5 support?

The customer has been working closely with me and their SE on this for over a month.

Description

I have tried everything I can think of but cannot get around this error. Customer has the following requirements

Route table updates work when they are the only thing configured in the CFE config. But when we add the section for Address failover, we get this error:

Wed, 08 Feb 2023 20:33:29 GMT - fine: [f5-cloud-failover] Generated Address Operations {"disassociate":[],"associate":[]}
Wed, 08 Feb 2023 20:33:29 GMT - finest: [f5-cloud-failover] Next hop address: 10.61.161.4
Wed, 08 Feb 2023 20:33:29 GMT - fine: [f5-cloud-failover] Using IPv4 filtering for to get nics 10.61.161.4
Wed, 08 Feb 2023 20:33:29 GMT - finest: [f5-cloud-failover] Next hop address: 10.61.161.4
Wed, 08 Feb 2023 20:33:29 GMT - fine: [f5-cloud-failover] Using IPv4 filtering for to get nics 10.61.161.4
Wed, 08 Feb 2023 20:33:29 GMT - severe: [f5-cloud-failover] error in _getFailoverDiscovery: TypeError: Cannot read property 'NetworkInterfaceId' of undefined
Wed, 08 Feb 2023 20:33:29 GMT - severe: [f5-cloud-failover] Cannot read property 'NetworkInterfaceId' of undefined TypeError: Cannot read property 'NetworkInterfaceId' of undefined
    at _listNics.then (/var/config/rest/iapps/f5-cloud-failover/nodejs/providers/aws/cloud.js:718:53)
    at tryCatcher (/usr/share/rest/node/node_modules/bluebird/js/release/util.js:16:23)
    at Promise._settlePromiseFromHandler (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:512:31)
    at Promise._settlePromise (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:569:18)
    at Promise._settlePromise0 (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:614:10)
    at Promise._settlePromises (/usr/share/rest/node/node_modules/bluebird/js/release/promise.js:693:18)
    at Async._drainQueue (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:133:16)
    at Async._drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:143:10)
    at Immediate.Async.drainQueues (/usr/share/rest/node/node_modules/bluebird/js/release/async.js:17:14)
    at runCallback (timers.js:794:20)
    at tryOnImmediate (timers.js:752:5)
    at processImmediate [as _immediateCallback] (timers.js:729:5)
Wed, 08 Feb 2023 20:33:29 GMT - finest: [f5-cloud-failover] Uploading data to: f5cloudfailover/f5cloudfailoverstate.json
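
For readers unfamiliar with this class of error: in JavaScript it is typically raised when code reads a property from the first element of an empty array. The sketch below is not the CFE source (cloud.js is not reproduced in this issue). `listNicsByAddress`, `unguardedNicId`, and the sample NIC data are hypothetical. It only illustrates how a NIC lookup for next hop address 10.61.161.4 that returns no results could end in the TypeError shown in the trace above.

```javascript
// Hypothetical stand-in for a NIC lookup; NOT the CFE implementation.
// Sample data: the only NIC the lookup is able to "see".
const visibleNics = [
  { NetworkInterfaceId: 'eni-0aaa', PrivateIpAddress: '10.61.160.4' },
];

// Returns every NIC whose primary private IP equals the given address.
function listNicsByAddress(nics, address) {
  return nics.filter((nic) => nic.PrivateIpAddress === address);
}

// Unguarded access: an empty result makes [0] undefined, so reading
// .NetworkInterfaceId from it throws a TypeError.
function unguardedNicId(nics, address) {
  return listNicsByAddress(nics, address)[0].NetworkInterfaceId;
}

let caught = null;
try {
  unguardedNicId(visibleNics, '10.61.161.4');
} catch (err) {
  caught = err;
}
console.log(caught instanceof TypeError); // true
```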

Environment information

CFE version: 1.13
AWS environment: East-US-2, 2x AZs
BIG-IP: 16.x

Severity Level

2 (High). I have been working with this customer for over a month and they are now up against a deadline to have address failover working. I will need some support with this.

G-gonzalezjimenez commented 1 year ago

Hi Michael, looking into this. We will keep you posted.

mikeoleary commented 1 year ago

@G-gonzalezjimenez, thank you. Shyawn also reached out via Teams and gave me an idea to troubleshoot with the customer. I will try that before I come to you again. I only submitted this issue as a last resort and don't want to waste your time, so let me get back to you on this.

mikeoleary commented 1 year ago

@G-gonzalezjimenez, update below and a couple of ideas for PM from the customer before we close.

FYI: the customer had added a 4th ENI to both devices after the CFT deployment and had configured an AWS route to fail over between these 2 ENIs. They had not tagged these ENIs correctly. Once they were tagged correctly, the issue was resolved. But this raised a couple of interesting ideas from the customer.

Explanation: this was missed because when failing over Routes only between 2 ENIs, the CFE failover worked fine even when the ENIs were not tagged. Also, when failing over EIPs only between 2 ENIs, the CFE failover still worked fine, because the external ENIs that were in scope were correctly configured.

It was only when the declaration included both Routes and EIPs that the customer saw unsuccessful failovers. We now understand this is because the tags on the additional 4th ENIs are required for Address failover, even if the Address failover is done on the external ENIs, because the IP addresses of these 4th ENIs are referenced in the Route failover section of the CFE config.
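
The dependency described above can be checked before a failover is attempted. The following is a minimal sketch and not part of CFE. It takes data shaped like the EC2 DescribeNetworkInterfaces response (`NetworkInterfaceId`, `PrivateIpAddresses`, `TagSet`) and reports which route next hop addresses belong to ENIs that lack the expected tags. The function name `findUntaggedNextHops`, the sample ENI IDs and tag values, and the tag keys `f5_cloud_failover_label` and `f5_cloud_failover_nic_map` are assumptions here; confirm the tag keys your CFE version requires against the CFE documentation.

```javascript
// Assumed tag keys; verify against the CFE docs for your version.
const REQUIRED_TAG_KEYS = ['f5_cloud_failover_label', 'f5_cloud_failover_nic_map'];

// Returns one entry per next hop address that matches no ENI, or whose
// ENI is missing at least one of the required tag keys.
function findUntaggedNextHops(networkInterfaces, nextHopAddresses, requiredKeys = REQUIRED_TAG_KEYS) {
  return nextHopAddresses
    .map((address) => {
      const eni = networkInterfaces.find((nic) =>
        (nic.PrivateIpAddresses || []).some((ip) => ip.PrivateIpAddress === address));
      if (!eni) {
        return { address, eniId: null, missingTags: requiredKeys };
      }
      const present = new Set((eni.TagSet || []).map((tag) => tag.Key));
      const missingTags = requiredKeys.filter((key) => !present.has(key));
      return { address, eniId: eni.NetworkInterfaceId, missingTags };
    })
    .filter((result) => result.missingTags.length > 0);
}

// Hypothetical sample: the external ENI is tagged, the added 4th ENI is not.
const sampleEnis = [
  {
    NetworkInterfaceId: 'eni-external',
    PrivateIpAddresses: [{ PrivateIpAddress: '10.61.160.4' }],
    TagSet: [
      { Key: 'f5_cloud_failover_label', Value: 'mydeployment' },
      { Key: 'f5_cloud_failover_nic_map', Value: 'external' },
    ],
  },
  {
    NetworkInterfaceId: 'eni-fourth',
    PrivateIpAddresses: [{ PrivateIpAddress: '10.61.161.4' }],
    TagSet: [],
  },
];

const problems = findUntaggedNextHops(sampleEnis, ['10.61.160.4', '10.61.161.4']);
console.log(JSON.stringify(problems, null, 2));
```

In practice, the `networkInterfaces` argument would come from the AWS CLI or SDK, and the next hop addresses from the Route failover section of the CFE config. An empty result means every referenced ENI carries the assumed tags.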

Ideas for PM from customer

  1. Please note that the logging section of the docs mentions 6 levels of verbosity but only describes 3 of them. It's unclear which is the most verbose.
  2. Customer mentioned it could be helpful to have instructions for adding a 4th ENI after deploying the CFT.
  3. Customer mentioned it could be useful to know that the default IAM role created by the CFT includes a condition for Route failover that requires a tag to exist. For new routes or other resources added later, a customer will likely receive a permissions error if they do not update this IAM role. Example condition here

Many thanks to PM guys btw, I reached out as a last resort but should have found this myself. Mike.

shyawnkarim commented 1 year ago

Closing.