Azure / terraform-azurerm-caf-enterprise-scale

Azure landing zones Terraform module
https://aka.ms/alz/tf
MIT License
791 stars 514 forks source link

can't configure bgp for VPN #230

Closed davidkarlsen closed 2 years ago

davidkarlsen commented 2 years ago

Community Note

Versions

v1.1.0

terraform: Terraform v1.0.11

azure provider: version = "~> 2.87"

module: erraform-azurerm-caf-enterprise-scale

Description

we're attempting to setup the connectivity with the following config:


module "caf-enterprise-scale" {
  source  = "Azure/caf-enterprise-scale/azurerm"
  version = "1.1.0"
  providers = {
    azurerm              = azurerm
    azurerm.connectivity = azurerm
    azurerm.management   = azurerm
  }

  root_parent_id                = data.azurerm_client_config.core.tenant_id
  subscription_id_management    = "ourMgmtSub"
  subscription_id_connectivity  = "ourCommsSub"
  deploy_core_landing_zones     = false
  deploy_connectivity_resources = true
  default_location              = local.default_location
  configure_connectivity_resources = {

    advanced = {
      custom_settings_by_resource_type = {
        azurerm_virtual_network_gateway = {
          enable_bgp = true <-- this is not picked up?
        }
      }
    }

    settings = {
      hub_networks = [
        {
          enabled = true
          config = {
            address_space                = ["fullCidr", ]
            location                     = local.default_location
            link_to_ddos_protection_plan = false
            dns_servers                  = []
            bgp_community                = ""
            subnets                      = []
            virtual_network_gateway = {
              enabled = true
              config = {
                address_prefix           = "gwCidr"
                gateway_sku_expressroute = ""
                gateway_sku_vpn          = "Basic"
              }
            }
            azure_firewall = {
              enabled = false
              config = {
                address_prefix   = "fwCidr"
                enable_dns_proxy = true
                availability_zones = {
                  zone_1 = true
                  zone_2 = true
                  zone_3 = true
                }
              }
            }
            spoke_virtual_network_resource_ids      = []
            enable_outbound_virtual_network_peering = true
          }
        }
      ]
      vwan_hub_networks = []
      ddos_protection_plan = {
        enabled = false
        config = {
          location = "northeurope"
        }
      }
      dns = {
        enabled = true
        config = {
          location = null
          enable_private_link_by_service = {
            azure_automation_webhook             = false
            azure_automation_dscandhybridworker  = false
            azure_sql_database_sqlserver         = false
            azure_synapse_analytics_sqlserver    = false
            azure_synapse_analytics_sql          = false
            storage_account_blob                 = false
            storage_account_table                = false
            storage_account_queue                = false
            storage_account_file                 = false
            storage_account_web                  = false
            azure_data_lake_file_system_gen2     = false
            azure_cosmos_db_sql                  = false
            azure_cosmos_db_mongodb              = false
            azure_cosmos_db_cassandra            = false
            azure_cosmos_db_gremlin              = false
            azure_cosmos_db_table                = false
            azure_database_for_postgresql_server = false
            azure_database_for_mysql_server      = false
            azure_database_for_mariadb_server    = false
            azure_key_vault                      = false
            azure_kubernetes_service_management  = false
            azure_search_service                 = false
            azure_container_registry             = false
            azure_app_configuration_stores       = false
            azure_backup                         = false
            azure_site_recovery                  = false
            azure_event_hubs_namespace           = false
            azure_service_bus_namespace          = false
            azure_iot_hub                        = false
            azure_relay_namespace                = false
            azure_event_grid_topic               = false
            azure_event_grid_domain              = false
            azure_web_apps_sites                 = false
            azure_machine_learning_workspace     = false
            signalr                              = false
            azure_monitor                        = false
            cognitive_services_account           = false
            azure_file_sync                      = false
            azure_data_factory                   = false
            azure_data_factory_portal            = false
            azure_cache_for_redis                = false
          }
          private_link_locations                                 = []
          public_dns_zones                                       = []
          private_dns_zones                                      = []
          enable_private_dns_zone_virtual_network_link_on_hubs   = true
          enable_private_dns_zone_virtual_network_link_on_spokes = true
        }
      }
    }

    location = local.default_location
    tags = {
      terraform = "true"
    }
  }
}

Describe the bug

We can't see that advanced.custom_settings_by_resource_type.azurerm_virtual_network_gateway.enable_bgp is picked up for the VPN gateway.

Steps to Reproduce

  1. use config as described
  2. observe no bgp_enabled flag on vpn gw

Screenshots

N/A Additional context

N/A

krowlandson commented 2 years ago

Thank you for raising this question @davidkarlsen

It looks like you were pretty close with the input, but not quite there. If you look at the line of code where this value is set, you will see that you also need to provide additional parts to the object to target the correct scope, as below:

image

So for your specific example, please try with the following:

    advanced = {
      custom_settings_by_resource_type = {
        azurerm_virtual_network_gateway = {
          connectivity = {
            vpngw= {
              (local.default_location) = { # <-- not the parenthesis around this local to ensure it's correctly interpreted as a key
                enable_bgp = true # <-- this should now be picked up :-)
              }
            }
          }
        }
      }
    }

Hope this helps, but please let us know how you get on!

davidkarlsen commented 2 years ago

Awesome - that solved it - thanks a lot! It was a bit convoluted to backtrack - maybe a few examples of a bit more real-world/complex setups could be worthwhile in order to ease adoption?

I'm now looking into azurerm_local_network_gateway + azurerm_virtual_network_gateway_connection and it seems these needs to be managed outside of the module? This gives some friction as the id's are used in a loop inside of the caf module, so you get the infamous "known only after apply" issue, and need to target certain modules before running the caf one. I understand that one does not want to shoe-horn all config into the caf module and keep it lean, as there are many ways to setup comms, but making it easy to plug together would be great.

Could you also shed some light on reading outputs, specifically I need to read the virtual_network_gateway_id and resource-group name for the comms RG.

For now I could just hack the latter with: comms_resource_group_name = "es-connectivity-${local.default_location}"

I can read the virtual_network_gateway_id through a datasource but it feels a bit hackish as it leaks abstractions from the caf module into other resources, where one has to rely on the [current] naming-conventions in order to read them.

krowlandson commented 2 years ago

@davidkarlsen we understand your frustration with the resource IDs, but we had to find a way which would allow us to handle these in a consistent and meaningful way. As I'm sure you've discovered through your work with the advanced settings, each resource has a different set of criteria to identify it as "unique". This made it challenging to build a data model which would be meaningful to anyone, and programmatically friendly.

We might have been able to make this a bit cleaner by using a more traditional approach of module nesting for resources, but this would make it much harder to enable some of the key features of the module. For example, enabling deployments using multiple pipelines (one for core resources, another for management resources, etc.) without breaking the integration which ensures resources created by the module are compliant with the corresponding policies. This was made harder by the fact that we need to ensure literally anything and everything can be customised for those edge cases which customers are experts at identifying 😄

For now (hacky or not), our recommendation is to use the data resource approach you mention, as this is actually a HashiCorp recommended approach.

Moreover, we recommend that you start to think about breaking up your deployment across pipelines in alignment with your operational processes. This allows you to simplify scenarios such as change management, as the scope of what's being changed is more specific, and therefore less impactful. e.g. firewall rule changes shouldn't risk impacting existence of the firewall itself, or the route tables which ensure traffic is directly to the firewall.

I'm going to close this issue as we've answered the original question but if you have any further questions please feel free to post them here or on a new issue.