Juniper / terraform-provider-apstra

Apstra Terraform Provider
Apache License 2.0

Handling dependency-related destroy operation issues #680

Open coterv opened 5 months ago

coterv commented 5 months ago

I've been conducting some tests with other resources and have encountered similar failure scenarios when attempting to remove a resource that is referenced by another resource. This situation echoes what we observed on May 16th with VNs and CTs. Here's a breakdown of the scenarios:

  1. Removing a virtual network that is applied to a connectivity template primitive
  2. Removing a routing policy that is applied to a routing zone or to a connectivity template primitive
  3. Removing a routing zone that is applied to a connectivity template IP Link primitive or assigned to a virtual network

In all these cases, our tests attempt to remove both the "bold" resource (the one being removed, named first in each item) and its invocation within the "italic" resources (the ones that reference it) with a single terraform apply.

As an example, here’s the plan that fails for case 1:

Plan: 1 to add, 29 to change, 2 to destroy.

[…]

module.blueprints.apstra_datacenter_virtual_network.virtual_networks["DEMO.bar"]: Destroying... [id=LTtGudj0iFpM0fOifxY]
╷
│ Error: error deleting virtual network
│
│ {"api_response":null,"config_blueprint_version":0,"errors":{"virtual_network_id":"Deleting virtual networks with VN endpoints not
│ allowed"},"error_code":422} - http response '' at
│ 'http://apstra-df89c2ca-d2dc-4870-b92f-50d2a9453bc5.aws.apstra.com/api/blueprints/960b66c6-8472-470e-bfb3-b0e77c3d007a/virtual-networks/LTtGudj0iFpM0fOifxY?async=full&async=full'

If I understand correctly, all the cases revolve around the order of the “destroy” operations, which does not appear to be influenced by the dependencies. This means that updates to the "italic" resources (the ones holding the reference) are not guaranteed to occur before the removal of the "bold" resources.
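To make the ordering concern concrete, here is a minimal Python sketch. It is a toy model only, not Terraform's actual planner, and the names `vn` and `ct` are hypothetical stand-ins for a virtual network and the connectivity template that references it. It builds an order in which every referencing resource is updated before the resource it references is destroyed, and checks a given order against that rule.

```python
from graphlib import TopologicalSorter


def safe_order(references):
    """Return an operation order in which each referencing resource is
    updated before the resource it references is destroyed.

    `references` maps a referencing resource to the set of resources it
    references, e.g. {"ct": {"vn"}} (hypothetical names).
    """
    ts = TopologicalSorter()
    for referrer, targets in references.items():
        for target in targets:
            # "destroy target" must come after "update referrer".
            ts.add(("destroy", target), ("update", referrer))
    return list(ts.static_order())


def is_safe(order, references):
    """Check that no destroy step precedes the update of a referrer."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[("update", referrer)] < position[("destroy", target)]
        for referrer, targets in references.items()
        for target in targets
    )


if __name__ == "__main__":
    refs = {"ct": {"vn"}}
    print(safe_order(refs))
    # The failing plan above corresponds to the reversed order.
    print(is_safe([("destroy", "vn"), ("update", "ct")], refs))
```

In this model, the failure reported above corresponds to an order for which `is_safe` returns `False`: the virtual network is deleted while the CT still references it.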

Before delving into the plan to address case 1 (splitting the current single CT creation resource into several per-root-primitive-type resources, as you had in mind in our last conversation), it may be worthwhile to explore solutions for cases 2 and 3 first, as case 1 would likely follow similar principles. What do you think?

Thanks in advance.

chrismarget-j commented 5 months ago

Hi @coterv,

Thank you for opening this issue.

We believe that many of the "order of delete" problems you've experienced can be traced to the presence of a data source (representing a CT primitive) in the dependency path between a depended-on resource (like a virtual network) and the depending resources (a CT and application of that CT).
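As an illustration of the "data source in the dependency path" shape described above, here is a small Python sketch. The graph and its node names are hypothetical and only mirror the resources named in this thread; it does not reproduce how Terraform or the provider track dependencies. It lists the data-source nodes that sit strictly between a depending resource and the resource it ultimately depends on.

```python
# Hypothetical dependency graph: each key depends on the nodes listed.
# The "resource." / "data." prefixes are for illustration only.
DEPENDS_ON = {
    "resource.ct_assignment": ["resource.ct"],
    "resource.ct": ["data.ct_primitive"],
    "data.ct_primitive": ["resource.virtual_network"],
    "resource.virtual_network": [],
}


def paths(graph, start, goal, prefix=()):
    """Yield every dependency path from start to goal (graph assumed acyclic)."""
    prefix = prefix + (start,)
    if start == goal:
        yield prefix
        return
    for nxt in graph.get(start, []):
        yield from paths(graph, nxt, goal, prefix)


def data_sources_between(graph, start, goal):
    """Return the data-source nodes lying strictly between start and goal."""
    found = set()
    for path in paths(graph, start, goal):
        found.update(n for n in path[1:-1] if n.startswith("data."))
    return found


if __name__ == "__main__":
    print(data_sources_between(
        DEPENDS_ON, "resource.ct", "resource.virtual_network"))
```

A non-empty result marks a path like the one described here, where the link between the CT and the virtual network runs through a primitive data source rather than directly between two resources.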

We hope to soon introduce a handful of new CT resources which won't require the primitive data sources, ensuring that destroy-time planning follows the implicit dependencies.

They'll be type-specific CTs based on the type of application points which can accept them. Each CT will have native support for the primitives supported by that type of application point.

In the simplest case there will be a system CT type with a single primitive attribute: custom_static_routes (set of static route objects).

The interface CT type will be a bit more complicated with three primitives:

Each of those primitives will support its own attributes, plus sets of child primitives (IP link supports BGP sessions, BGP sessions support routing policies, etc.).

We intend to introduce 4 type-specific CT resources:

CTs for the other application point types (ip_link, protocol_endpoint, system, vn_endpoint) will not be supported because they exist only as child primitives to the first four types and do not need to stand on their own (I'm moderately confident about this).

Thoughts?

Thanks!

coterv commented 5 months ago

Hi @chrismarget-j,

If I understand correctly, you expect that removing the primitive data sources from the dependency path between a depended-on resource (like a virtual network) and the depending resources (a CT and its application) will ensure that destroy-time planning follows the implicit dependencies, thereby resolving the first scenario:

1- Removing a virtual network that is applied to a connectivity template primitive.

What about scenarios 2 and 3? Have you also identified the presence of data sources in their dependency paths as the root cause preventing the destroy-time planning from following the implicit dependencies?

2- Removing a routing policy that is applied to a routing zone or a connectivity template primitive.

3- Removing a routing zone that is applied to a connectivity template IP Link primitive or assigned to a virtual network.

Thanks in advance!

chrismarget-j commented 5 months ago

Hi @coterv,

Unless I'm misunderstanding, it looks like all 3 scenarios involve a Connectivity Template, so the "data source in the middle" concern seems to apply equally to all three.

I don't anticipate any problems with "Removing a routing policy that is applied to a routing zone" - the usual Terraform implicit dependency stuff should suffice.

coterv commented 5 months ago

Hi @chrismarget-j,

  1. Removing a virtual network applied to a connectivity template primitive.
  2. Removing a routing policy applied to a routing zone or a connectivity template primitive.

2.1. Removing a routing policy applied to a routing zone.

2.2. Removing a routing policy applied to a connectivity template primitive.

  3. Removing a routing zone applied to a connectivity template IP Link primitive or assigned to a virtual network.
    • This scenario includes two sub-scenarios:

3.1. Removing a routing zone applied to a connectivity template IP Link primitive.

3.2. Removing a routing zone assigned to a virtual network.

In brief, scenario 2.1 does not involve Connectivity Templates.