datacenter / ACI-Pre-Upgrade-Validation-Script

A script to run validations to detect potential issues that may cause an ACI fabric upgrade to fail
https://datacenter.github.io/ACI-Pre-Upgrade-Validation-Script/
Apache License 2.0
42 stars 27 forks

NewValidation: Check if AEP includes a VMM and physical domain #127

Open markrakozy opened 5 months ago

markrakozy commented 5 months ago

(use upvote :thumbsup: for attention)

Validation Type

[x] - Fault (F0467 port-configured-as-l2)

[ ] - Config

[ ] - Bug

[ ] - Other

What needs to be validated

The script already detects F0467 port-configured-as-l2. However, it appears to focus only on cases where a static binding (with a physical domain) has been configured on a port that is used for an L3Out. We have observed at least two cases where ACI admins placed a VMM domain in the same AEP as the L3 Domain. If this is done AND the customer has inadvertently bound the VMM domain to an EPG with resolution immediacy = "Pre-provision", the L2 config can be programmed onto the port and block the L3 configuration. This appears to be a matter of timing, so the problem can be intermittent and may only manifest after a reboot. If ACI detects it, an F0467 fault is indeed generated, but by then the damage is done.

For a config validation, describe the exact configuration to be validated

Test for an AEP that includes both an L3 Domain (access policy) and a VMM domain. Even better, also check whether there are EPGs bound to that VMM domain with resolution immediacy = Pre-provision.
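The requested check could be sketched roughly as follows. This is only an illustration, not code from the script: the function names and the input shapes are hypothetical, and it assumes the domain DNs have already been collected per AEP (for example from `infraRsDomP` relations) and that L3 domains and VMM domains can be recognized by the DN prefixes `uni/l3dom-` and `uni/vmmp-`. The `"pre-provision"` string for resolution immediacy is likewise an assumption about how the value appears on the EPG-to-domain binding.

```python
# Hypothetical sketch: inputs are plain Python structures, not live APIC queries.

L3_DOMAIN_PREFIX = "uni/l3dom-"   # assumed DN prefix for L3 domains
VMM_DOMAIN_PREFIX = "uni/vmmp-"   # assumed DN prefix for VMM domains


def find_mixed_aeps(aep_domains):
    """Return AEPs that contain both an L3 domain and a VMM domain.

    aep_domains: dict mapping AEP name -> list of domain DNs attached to it.
    """
    mixed = {}
    for aep, domain_dns in aep_domains.items():
        l3 = sorted(dn for dn in domain_dns if dn.startswith(L3_DOMAIN_PREFIX))
        vmm = sorted(dn for dn in domain_dns if dn.startswith(VMM_DOMAIN_PREFIX))
        if l3 and vmm:
            mixed[aep] = {"l3": l3, "vmm": vmm}
    return mixed


def find_preprovision_bindings(aep_domains, epg_bindings):
    """Return (aep, epg_dn, vmm_domain_dn) for pre-provision bindings in mixed AEPs.

    epg_bindings: list of (epg_dn, domain_dn, resolution_immediacy) tuples.
    """
    findings = []
    for aep, domains in find_mixed_aeps(aep_domains).items():
        for epg_dn, domain_dn, resolution in epg_bindings:
            if resolution == "pre-provision" and domain_dn in domains["vmm"]:
                findings.append((aep, epg_dn, domain_dn))
    return findings
```

Keeping the two steps separate means the AEP-level result can be reported on its own, with the EPG-level result used to narrow it down.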

Why it needs to be validated

The best practice, of course, is that the VMM domain(s) is/are in a separate AEP.


monrog2 commented 5 months ago

@markrakozy As of v2.0.0, the script checks for both port-configured-as-l2 and port-configured-as-l3 F0467 faults via two separate checks.
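As a rough illustration of what distinguishing the two fault flavors involves (not the script's actual implementation), the sketch below groups already-fetched fault records by qualifier. It assumes each record is a dict with `code`, `changeSet`, and `dn` keys, and that the qualifier string appears inside `changeSet`; the function name is hypothetical.

```python
# Hypothetical sketch: splits F0467 fault records into the two qualifiers
# discussed in this thread. Field names are assumptions about the record shape.

QUALIFIERS = ("port-configured-as-l2", "port-configured-as-l3")


def split_f0467_faults(faults):
    """Group the DNs of F0467 faults by the qualifier found in changeSet."""
    groups = {qualifier: [] for qualifier in QUALIFIERS}
    for fault in faults:
        if fault.get("code") != "F0467":
            continue  # ignore unrelated fault codes
        change_set = fault.get("changeSet", "")
        for qualifier in QUALIFIERS:
            if qualifier in change_set:
                groups[qualifier].append(fault.get("dn", ""))
    return groups
```

A check built this way only sees faults that exist at the time it runs, so it would not catch a fault that first appears after a reload.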

Clarifications Needed

  1. In your problem scenario where the L3 config is first and VMM is pushed after, were there not any port-configured-as-l3 faults?
  2. In that same scenario, the port-configured-as-l2 fault was only seen after the clean reload/upgrade allowed the VMM config to be pushed before the L3out?
  3. Do you have a setup/config export that has the config you mention that does not flag the port-configured-as-l3 fault where the L3 config made it to the leaf before the VMM one did, but puts the fabric into the problem scenario you mention?
  4. Were the L3 ports routed ports or SVIs?

Comment

A given Policy Group can only be tied to a single AEP, and we cannot hard enforce that VMM must always be in a separate AEP from L3 Domains. This is especially true for the Floating L3out Configuration using Virtual Routers via VMM Domains:

"Use of an L3Out domain and a VMM domain for a floating L3Out below illustrates an example of an AEP that has an L3Out domain and a VMM domain for a floating L3Out."

markrakozy commented 5 months ago

"In your problem scenario where the L3 config is first and VMM is pushed after, were there not any port-configured-as-l3 faults?" No. The pre-upgrade script was run and no faults were present. It appears that the fault can emerge later, say at reboot.

"In that same scenario, the port-configured-as-l2 fault was only seen after the clean reload/upgrade allowed the VMM config to be pushed before the L3out?" Apparently, yes.

"Do you have a setup/config export that has the config you mention that does not flag the port-configured-as-l3 fault where the L3 config made it to the leaf before the VMM one did, but puts the fabric into the problem scenario you mention?" The issue is that the VMM config apparently made it to the leaf first and thus prevented the L3Out programming. I do have access to the config, but I cannot share it as it contains customer information.

"Were the L3 ports routed ports or SVIs?" L3 Routed

Comment: "A given Policy Group can only be tied to a single AEP, and we cannot hard enforce that VMM must always be in a separate AEP from L3 Domains. This is especially true for the Floating L3out Configuration using Virtual Routers via VMM Domains:"

"Use of an L3Out domain and a VMM domain for a floating L3Out below illustrates an example of an AEP that has an L3Out domain and a VMM domain for a floating L3Out."

I agree. However, I work with 40-50 ACI customers and have yet to encounter one who uses the floating VMM feature. I believe it is an elegant feature, but it has limited customer adoption. For most customers, it still makes sense to separate out the VMM AEP. We could provide a warning/message if we see the L3Out and VMM domains mapped to the same AEP. Or, we could flag only those situations where there is also an EPG with resolution immediacy = pre-provision.
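The two options could be expressed as a single decision with a selectable mode, sketched below. The function name, parameter names, and mode strings are hypothetical and only meant to make the trade-off concrete.

```python
# Hypothetical sketch of the two flagging options proposed above.

def should_flag(has_l3_domain, has_vmm_domain, has_preprovision_epg,
                mode="any-mixed"):
    """Decide whether an AEP should be reported.

    mode="any-mixed":         report any AEP holding both an L3 and a VMM domain.
    mode="preprovision-only": report only when a pre-provision EPG also exists.
    """
    mixed = has_l3_domain and has_vmm_domain
    if mode == "any-mixed":
        return mixed
    if mode == "preprovision-only":
        return mixed and has_preprovision_epg
    raise ValueError("unknown mode: %s" % mode)
```

The first mode would also report floating L3Out designs that intentionally share an AEP, while the second narrows the result to the combination described in this issue.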