CiscoDevNet / terraform-provider-iosxe

Terraform Cisco IOS-XE Provider
https://registry.terraform.io/providers/CiscoDevNet/iosxe
Mozilla Public License 2.0
59 stars 22 forks source link

Resource iosxe_access_list_extended - not implementing changes to sequence order and inaccurate state #156

Open yamabay opened 6 months ago

yamabay commented 6 months ago

Overview

The iosxe_access_list_extended resource does not configure an ACL correctly once modifications to the ACL sequence numbers are made. When the sequence of statements are reordered, Terraform reflect these changes in the Terraform plan. However, once the changes are applied, the ACL will be missing statements. Subsequent runs of terraform plan will not detect this drift and will report that no changes need to made. Terraform believes the live configurations are accurately matching the declarative state which is not accurate.

A terraform plan -refresh-only also does not detect the mismatch between the live configurations and the state.

Test Case

This bug can be replicated by following the steps list below.

  1. First, we deploy a basic extended ACL with three statements:
resource "iosxe_access_list_extended" "bug_test" {
    name = "bug-test-acl"
    entries = [
    {
      sequence                    = 10
      ace_rule_action             = "permit"
      ace_rule_protocol           = "tcp"
      source_prefix               = "10.0.0.0"
      source_prefix_mask          = "0.0.0.255"
      destination_prefix          = "10.0.0.0"
      destination_prefix_mask     = "0.0.0.255"
      destination_port_equal      = "22"
    },
    {
      sequence                    = 20
      ace_rule_action             = "permit"
      ace_rule_protocol           = "tcp"
      source_prefix               = "11.0.0.0"
      source_prefix_mask          = "0.0.0.255"
      destination_prefix          = "10.0.0.0"
      destination_prefix_mask     = "0.0.0.255"
      destination_port_equal      = "22"
    },
    {
      sequence                    = 30
      ace_rule_action             = "permit"
      ace_rule_protocol           = "tcp"
      source_prefix               = "12.0.0.0"
      source_prefix_mask          = "0.0.0.255"
      destination_prefix          = "10.0.0.0"
      destination_prefix_mask     = "0.0.0.255"
      destination_port_equal      = "22"
    }
    ]
}

Once applied, the bug-test-acl ACL should now exist on the router:

bug-test-rtr#show ip access-list | s bug-test
Extended IP access list bug-test-acl
    10 permit tcp 10.0.0.0 0.0.0.255 10.0.0.0 0.0.0.255 eq 22
    20 permit tcp 11.0.0.0 0.0.0.255 10.0.0.0 0.0.0.255 eq 22
    30 permit tcp 12.0.0.0 0.0.0.255 10.0.0.0 0.0.0.255 eq 22
  1. The easiest way to simulate the bug is to change the sequence numbers between statements as follows:
    resource "iosxe_access_list_extended" "bug_test" {
    name = "bug-test-acl"
    entries = [
    {
      sequence                    = 10
      ace_rule_action             = "permit"
      ace_rule_protocol           = "tcp"
      source_prefix               = "10.0.0.0"
      source_prefix_mask          = "0.0.0.255"
      destination_prefix          = "10.0.0.0"
      destination_prefix_mask     = "0.0.0.255"
      destination_port_equal      = "22"
    },
    {
      sequence                    = 30
      ace_rule_action             = "permit"
      ace_rule_protocol           = "tcp"
      source_prefix               = "11.0.0.0"
      source_prefix_mask          = "0.0.0.255"
      destination_prefix          = "10.0.0.0"
      destination_prefix_mask     = "0.0.0.255"
      destination_port_equal      = "22"
    },
    {
      sequence                    = 20
      ace_rule_action             = "permit"
      ace_rule_protocol           = "tcp"
      source_prefix               = "12.0.0.0"
      source_prefix_mask          = "0.0.0.255"
      destination_prefix          = "10.0.0.0"
      destination_prefix_mask     = "0.0.0.255"
      destination_port_equal      = "22"
    }
    ]
    }

    Note: The sequence number for the second and third statement have been swapped.

When the terraform plan is ran, the only change detected is the sequence number. However, these statements have different configurations so the entire statement should be recognized as a change.

Here is an example output of the terraform plan:

  # iosxe_access_list_extended.bug_test will be updated in-place
  ~ resource "iosxe_access_list_extended" "bug_test" {
      ~ entries = [
          ~ {
              ~ sequence                = 20 -> 30
                # (7 unchanged attributes hidden)
            },
          ~ {
              ~ sequence                = 30 -> 20
                # (7 unchanged attributes hidden)
            },
            # (1 unchanged element hidden)
        ]
        id      = "Cisco-IOS-XE-native:native/ip/access-list/Cisco-IOS-XE-acl:extended=bug-test-acl"
        name    = "bug-test-acl"
    }

Once this is applied, the live configuration of the ACL on the router looks like this:

bug-test-rtr#show ip access-list | s bug-test
Extended IP access list bug-test-acl
    10 permit tcp 10.0.0.0 0.0.0.255 10.0.0.0 0.0.0.255 eq 22
    30 permit tcp 11.0.0.0 0.0.0.255 10.0.0.0 0.0.0.255 eq 22

The debug of the terraform apply shows the payload of the PATCH has the appropriate configurations that reflect the desired configurations in the resource:

DEBUG] HTTP Request: PATCH, https://{IP-REMOVED}/restconf/data/Cisco-IOS-XE-native:native/ip/access-list/Cisco-IOS-XE-acl:extended, {{"Cisco-IOS-XE-acl:extended":{"name":"bug-test-acl","access-list-seq-rule":[{"sequence":"10","ace-rule":{"action":"permit","protocol":"tcp","ipv4-address":"10.0.0.0","mask":"0.0.0.255","dest-ipv4-address":"10.0.0.0","dest-mask":"0.0.0.255","dst-eq":"22"}},{"sequence":"30","ace-rule":{"action":"permit","protocol":"tcp","ipv4-address":"11.0.0.0","mask":"0.0.0.255","dest-ipv4-address":"10.0.0.0","dest-mask":"0.0.0.255","dst-eq":"22"}},{"sequence":"20","ace-rule":{"action":"permit","protocol":"tcp","ipv4-address":"12.0.0.0","mask":"0.0.0.255","dest-ipv4-address":"10.0.0.0","dest-mask":"0.0.0.255","dst-eq":"22"}}]}}}

Note: The resource can be removed and recreated without any issues even though the internal configurations do not match the configurations in the state.

Other Scenarios

I have tested adding and removing statements to an ACL and that appears to work successfully. The bug appears to occur only when sequences are changes for existing statements.

yamabay commented 6 months ago

Complex Test Case

I wanted to provide better context of the exact code I am trying to implement to assist in determining if this is a bug or simply bad code I am writing!

I am using a YAML file that define the ACL. The YAML file is converted to a Terraform object using the yamldecode function. Once this is done, a for loop is used within the entries argument of the extended ACL resource to loop through each statement within the YAML file for the ACL.

An example YAML file looks like this:

acls:
  YAML_ACL:
   - action: "permit"
     protocol: "udp"
     source: "any"
     destination: "1.1.1.1/32"
     dest_port: "53"

   - action: "permit"
     protocol: "udp"
     source: "any"
     destination: "1.1.1.2/32"
     dest_port: "53"

   - action: "permit"
     protocol: "udp"
     source: "any"
     destination: "1.1.1.3/32"
     dest_port: "53"

Within the locals block, Terraform will render the YAML file:

locals {
    acls = yamldecode(file("acls.yaml"))
}

The statements within the YAML config are then looped through within the entries attribute as shown in this resource:

resource "iosxe_access_list_extended" "test" {
    name = "bug-test-complex"
    entries = [
        for rule in local.acls["acls"]["YAML_ACL"]:
        {
            sequence                = index(local.acls["acls"]["YAML_ACL"], rule) + 1
            ace_rule_action         = rule.action
            ace_rule_protocol       = rule.protocol

            source_any              = rule.source == "any" ? true : null
            source_prefix           = rule.source == "any" ? null : element(split("/", rule.source), 0)
            source_prefix_mask      = rule.source == "any" ? null : local.prefix_masks[element(split("/", rule.source), 1)]
            source_port_equal       = lookup(rule, "source_port", null)

            destination_any         = rule.destination == "any" ? true : null 
            destination_prefix      = rule.destination == "any" ? null : element(split("/", rule.destination), 0)
            destination_prefix_mask = rule.destination == "any" ? null : local.prefix_masks[element(split("/", rule.destination), 1)]
            destination_port_equal  = lookup(rule, "dest_port", null)
        }
    ]
}

Note: Not shown for brevity, but there is a local.prefix_masks map that will convert a subnet CIDR notation to its associated wildcard mask.

Once this Terraform code is applied, the router will have an initial ACL of:

bug-test-rtr#show ip access-list | s bug-test-complex
Extended IP access list bug-test-complex
    1 permit udp any host 1.1.1.1 eq domain
    2 permit udp any host 1.1.1.2 eq domain
    3 permit udp any host 1.1.1.3 eq domain

If an ACL entry is added to the YAML configuration, this will work as expected. Here is the new YAML structure to show the rule change:

acls:
  YAML_ACL:
   - action: "permit"
     protocol: "udp"
     source: "any"
     destination: "1.1.1.1/32"
     dest_port: "53"

   - action: "permit"
     protocol: "udp"
     source: "any"
     destination: "1.1.1.2/32"
     dest_port: "53"

   - action: "permit"
     protocol: "udp"
     source: "any"
     destination: "8.8.8.8/32"
     dest_port: "53"

   - action: "permit"
     protocol: "udp"
     source: "any"
     destination: "1.1.1.3/32"
     dest_port: "53"

Note: A new entry was added as the third entry in the list.

This is accurately reflected on the router once the Terraform code is applied:

bug-test-rtr#show ip access-list | s bug-test-complex
Extended IP access list bug-test-complex
    1 permit udp any host 1.1.1.1 eq domain
    2 permit udp any host 1.1.1.2 eq domain
    3 permit udp any host 8.8.8.8 eq domain
    4 permit udp any host 1.1.1.3 eq domain

The bug starts once that entry is removed (or any other entry). If the new entry is removed from the YAML configuration and the Terraform code is applied, the router will now show the ACL as having only two entries:

bug-test-rtr#show ip access-list | s bug-test-complex
Extended IP access list bug-test-complex
    1 permit udp any host 1.1.1.1 eq domain
    2 permit udp any host 1.1.1.2 eq domain

The current fix for this is to completely destroy the ACL before recreating it, but the provider does not currently recreate the resource if a change to the entries attribute is detected, so this is done by removing the ACL from the configs before recreating it.

I am trying to better understand if this issue is related to the provider, the underlying YANG module, or other Terraform behavior that may indicate this is expected behavior.

Thank you for any assistance you can provide with this!

yamabay commented 6 months ago

I believe I see why this behavior is occurring, but am unsure how to prevent this.

Based on the complex test case described in the previous comment, when the 3rd ACL statement is removed, the list of entries is "shifted" and the old 4th entry is now the 3rd entry.

This is visible by viewing the expected actions Terraform will perform when that entry is removed:

  # iosxe_access_list_extended.test will be updated in-place
  ~ resource "iosxe_access_list_extended" "test" {
      ~ entries = [
          ~ {
              ~ destination_prefix      = "8.8.8.8" -> "1.1.1.3"
                # (6 unchanged attributes hidden)
            },
          - {
              - ace_rule_action         = "permit" -> null
              - ace_rule_protocol       = "udp" -> null
              - destination_port_equal  = "53" -> null
              - destination_prefix      = "1.1.1.3" -> null
              - destination_prefix_mask = "0.0.0.0" -> null
              - sequence                = 4 -> null
              - source_any              = true -> null
            },
            # (2 unchanged elements hidden)
        ]
        id      = "Cisco-IOS-XE-native:native/ip/access-list/Cisco-IOS-XE-acl:extended=bug-test-complex"
        name    = "bug-test-complex"
    }

The destination_prefix for the 3rd entry that was removed gets replaced by the 4th entry. The 4th entry now no longer exists in the list (it is the third item in the list).

The debug of the terraform apply shows the correct payload being sent to the router. The sequence numbers are accurate (the old 4th rule would have had its sequence changed from 4 to 3). However, Terraform is then sending a DELETE request for the old 4th sequence number:

[DEBUG] HTTP Request: PATCH, https://{IP-REMOVED}/restconf/data/Cisco-IOS-XE-native:native/ip/access-list/Cisco-IOS-XE-acl:extended, {{"Cisco-IOS-XE-acl:extended":{"name":"bug-test-complex","access-list-seq-rule":[{"sequence":"1","ace-rule":{"action":"permit","protocol":"udp","any":{},"dest-ipv4-address":"1.1.1.1","dest-mask":"0.0.0.0","dst-eq":"53"}},{"sequence":"2","ace-rule":{"action":"permit","protocol":"udp","any":{},"dest-ipv4-address":"1.1.1.2","dest-mask":"0.0.0.0","dst-eq":"53"}},{"sequence":"3","ace-rule":{"action":"permit","protocol":"udp","any":{},"dest-ipv4-address":"1.1.1.3","dest-mask":"0.0.0.0","dst-eq":"53"}}]}}}
[DEBUG] HTTP Response: 
[DEBUG] Exit from Do method
[DEBUG] HTTP Request: DELETE, https://{IP-REMOVED}/restconf/data/Cisco-IOS-XE-native:native/ip/access-list/Cisco-IOS-XE-acl:extended=bug-test-complex/access-list-seq-rule=4, {}
[DEBUG] HTTP Response: 
[DEBUG] Exit from Do method

It appears that this should possibly be a PUT instead of a PATCH to ensure any changes to the ACL configuration are reflected even when a statement's sequence number is changed and nothing else? From what this shows, it appears the desired sequence is removed, and then Terraform is also removing the new 3rd sequence because it was previously the 4th item in the list of entries which now no longer exists (and Terraform wants to destroy).

Sorry if this adds confusion or is not explained properly. I hope this helps provide better insight on the behavior being seen.

danischm commented 6 months ago

It looks like this is a bug in the RESTCONF implementation of XE as the last DELETE should not delete anything if an entry with a sequence number of "4" does not exist.

yamabay commented 6 months ago

Hey @danischm, Thanks for providing insight on this.

I have not tested this on the exact platform and IOS versions listed in the provider documentation. Are you aware if this bug is also present on those versions?

If this is a persistent bug across multiple IOS versions, what would be the way to handle requesting getting this fixed?

danischm commented 6 months ago

I haven't tested this specific use case, but it might be worth checking with the latest XE version to make sure this isn't something that has already been addressed in the meantime.

yamabay commented 5 months ago

I was able to test this on 17.9.4 (Cisco IOS XE Software, Version 17.09.04a) and the same symptoms occur. It is not the most current latest release, but fairly close. It is the highest I can go in my current environment, but it is at least up to the same version the provider was tested on.