Azure / azure-policy

Repository for Azure Resource Policy built-in definitions and samples
MIT License

Request: DINE policy for private endpoint -> private DNS zone linking with static webapps #1223

Open rybal06 opened 11 months ago

rybal06 commented 11 months ago

Details of the scenario you tried and the problem that is occurring

We are utilizing this Azure "best practice" architecture at scale with dozens of resource types without any issues.

https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/private-link-and-dns-integration-at-scale

This works great for every resource type we've encountered, except for static websites. To use Azure webapps as an example:

  1. Platform user deploys Azure private endpoint which connects to a WebApp; and leaves private dns zone configuration empty.
  2. Azure policy finds this configuration using DINE policy and adds the private DNS zone configuration to a central private DNS zone for the resource type.

Why? Platform users don't need to be concerned with DNS records, DNS servers, or DNS management. They don't require any permissions on DNS zones which are shared by many application workloads.

The problem with static sites

The problem is making this work with static websites. Upon creation, static webapps put a DNS Zone partition ID in the domain name, i.e. white-flower-048d2aa10.privatelink.3.azurestaticapps.net

This means that the domain names for static web apps vary, for example: 1.azurestaticapps.net, 2.azurestaticapps.net, 3.azurestaticapps.net, 4.azurestaticapps.net, 5.azurestaticapps.net, 6.azurestaticapps.net, ...

The way the other DINE policies work is by matching upon these properties, for example:

                "allOf": [
                  {
                    "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].privateLinkServiceId",
                    "contains": "Microsoft.Web/sites"
                  },
                  {
                    "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].groupIds[*]",
                    "equals": "sites"
                  }
                ]

ref: https://github.com/Azure/azure-policy/blob/master/built-in-policies/policyDefinitions/App%20Service/AppService_PrivateZoneGroup_DINE.json

Azure policy then configures the private DNS zone on the private endpoint, which creates the DNS record.

The problem? Regardless of the private DNS zone partition (i.e. 1.azurestaticapps.net vs 2.azurestaticapps.net), static webapps all use the same private link service connection group ID, so it is not possible to map the group ID to the correct private DNS zone.

Nowhere on the private endpoint resource is the private DNS zone partition ID represented. The only way I have found to determine it is to look at the static website linked to the private endpoint, parse the "Default Domain Name" field, and extract the zone name from the URI, i.e. 1.azurestaticapps.net.

Suggested solution to the issue

Workaround

The workaround I am planning to test is asking platform users to populate a tag on private endpoints for Azure static webapps which contains the private dns zone name, i.e. StaticSitesDomainName: 9.azurestaticapps.net. We can have a custom policy then parse that value to determine which private dns zone to link the private endpoint into.

This isn't ideal as it is a one-off solution for a single Azure resource type and isn't our typical usage for tags.
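As a rough illustration of the mapping such a custom policy would have to perform, here is a minimal Python sketch. It is not policy code: the function name `zone_id_from_tag` and the resource group ID in the usage note are hypothetical, and it assumes the tag holds the bare zone suffix (for example `9.azurestaticapps.net`) and that the central zone is named with a `privatelink.` prefix.

```python
from typing import Mapping, Optional

# Tag name taken from the workaround described above.
TAG_NAME = "StaticSitesDomainName"


def zone_id_from_tag(tags: Mapping[str, str], zone_resource_group_id: str) -> Optional[str]:
    """Build the private DNS zone resource ID from the tag on a private endpoint.

    Returns None when the tag is missing or empty, in which case the
    policy would have nothing to link against.
    """
    domain = tags.get(TAG_NAME, "").strip()
    if not domain:
        return None
    # Assumption: the zone is named 'privatelink.<tag value>'.
    zone_name = f"privatelink.{domain}"
    return f"{zone_resource_group_id}/providers/Microsoft.Network/privateDnsZones/{zone_name}"
```

Called with `{"StaticSitesDomainName": "9.azurestaticapps.net"}` and a placeholder resource group ID, this yields an ID ending in `privateDnsZones/privatelink.9.azurestaticapps.net`. The obvious weakness is that the result is only as good as the tag value the platform user typed in.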

Better Solution

It would be great to see the Azure policy team collaborate with the Azure static websites (and potentially DNS teams) to come up with a more permanent solution.

All Azure resource types we've encountered work perfectly with the recommended architecture https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/azure-best-practices/private-link-and-dns-integration-at-scale except for Static Sites which introduces challenges for our platform users.

A few ideas here:

rybal06 commented 11 months ago

The workaround of also adding the domain name as a custom tag, and leveraging it in the Azure policy does work as expected. It is not super intuitive to our platform users and is different than every other Azure resource type, but it is working.

matsest commented 10 months ago

> Confirm my understanding that there is not any existing capability within Azure Policy to use a lookup function to find the private dns domain name for static sites.

@rybal06 It's not a capability of Azure Policy, but rather of ARM templates :) I have made a workaround for this. Since the resource IDs of the private link zones are deterministic, we can use the resource group ID that holds the private DNS zones as an input, together with a template function that looks up the hostname on the private endpoint resource, plus a fair amount of string manipulation. It's not pretty, but it removes the need for having this as a separate tag on the resource.

To simplify authoring, I wrote the template part of the DINE policy in Bicep, then build it and paste the output into the template part of the policy definition.

```bicep
param privateDnsZoneResourceGroupId string
param privateEndpointName string

resource pe 'Microsoft.Network/privateEndpoints@2022-07-01' existing = {
  name: privateEndpointName
}

// .2.azurestaticapps.net or .azurestaticapps.net
var fqdn = pe.properties.customDnsConfigs[0].fqdn
// [ '', '2', 'azurestaticapps', 'net']
var parts = split(fqdn, '.')
// '2' or 'azurestaticapps'
var prefix = parts[1]

// resolve name of private DNS Zone based on the prefix
var privateDnsZone = !(prefix == 'azurestaticapps') ? 'privatelink.${prefix}.azurestaticapps.net' : 'privatelink.azurestaticapps.net'

resource peDnsConfig 'Microsoft.Network/privateEndpoints/privateDnsZoneGroups@2022-07-01' = {
  name: 'deployedByPolicy'
  parent: pe
  properties: {
    privateDnsZoneConfigs: [
      {
        name: 'staticSites-privateDnsZone'
        properties: {
          privateDnsZoneId: '${privateDnsZoneResourceGroupId}/providers/Microsoft.Network/privateDnsZones/${privateDnsZone}'
        }
      }
    ]
  }
}

output privateDnsZoneId string = '${privateDnsZoneResourceGroupId}/providers/Microsoft.Network/privateDnsZones/${privateDnsZone}'
```

The policyRule for this would then be:

```json
{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Network/privateEndpoints"
      },
      {
        "count": {
          "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*]",
          "where": {
            "allOf": [
              {
                "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].privateLinkServiceId",
                "contains": "Microsoft.Web/staticSites"
              },
              {
                "field": "Microsoft.Network/privateEndpoints/privateLinkServiceConnections[*].groupIds[*]",
                "equals": "staticSites"
              }
            ]
          }
        },
        "greaterOrEquals": 1
      }
    ]
  },
  "then": {
    "effect": "[parameters('effect')]",
    "details": {
      "type": "Microsoft.Network/privateEndpoints/privateDnsZoneGroups",
      "roleDefinitionIds": [
        "/providers/Microsoft.Authorization/roleDefinitions/4d97b98b-1d4f-4787-a291-c67834d212e7"
      ],
      "existenceCondition": {
        "field": "name",
        "equals": "deployedByPolicy"
      },
      "deployment": {
        "properties": {
          "mode": "Incremental",
          "template": {
            "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
            "contentVersion": "1.0.0.0",
            "parameters": {
              "privateDnsZoneResourceGroupId": {
                "type": "string"
              },
              "privateEndpointName": {
                "type": "string"
              },
              "location": {
                "type": "string"
              }
            },
            "resources": [
              {
                "type": "Microsoft.Network/privateEndpoints/privateDnsZoneGroups",
                "apiVersion": "2022-07-01",
                "location": "[parameters('location')]",
                "name": "[format('{0}/{1}', parameters('privateEndpointName'), 'deployedByPolicy')]",
                "properties": {
                  "privateDnsZoneConfigs": [
                    {
                      "name": "staticSites-privateDnsZone",
                      "properties": {
                        "privateDnsZoneId": "[format('{0}/providers/Microsoft.Network/privateDnsZones/{1}', parameters('privateDnsZoneResourceGroupId'), if(not(equals(split(reference(resourceId('Microsoft.Network/privateEndpoints', parameters('privateEndpointName')), '2022-07-01').customDnsConfigs[0].fqdn, '.')[1], 'azurestaticapps')), format('privatelink.{0}.azurestaticapps.net', split(reference(resourceId('Microsoft.Network/privateEndpoints', parameters('privateEndpointName')), '2022-07-01').customDnsConfigs[0].fqdn, '.')[1]), 'privatelink.azurestaticapps.net'))]"
                      }
                    }
                  ]
                }
              }
            ]
          },
          "parameters": {
            "privateDnsZoneResourceGroupId": {
              "value": "[parameters('privateDnsZoneResourceGroupId')]"
            },
            "privateEndpointName": {
              "value": "[field('name')]"
            },
            "location": {
              "value": "[field('location')]"
            }
          }
        }
      }
    }
  }
}
```

It's not pretty, but it removes this special consideration from a user perspective and shifts the complexity onto those who manage the policies instead.

The only issue I'm facing now is that I'm not sure how many possible partition numbers there are to create private DNS zones for.
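For anyone who wants to sanity-check the string manipulation before deploying, the same branch can be mirrored in a few lines of Python. This is only an offline sketch of the expression used in the template, not part of the policy; the function name `private_dns_zone_name` is made up for illustration, and it assumes the `customDnsConfigs[0].fqdn` value has the app name as its first label and no `privatelink` label.

```python
def private_dns_zone_name(fqdn: str) -> str:
    """Mirror of the template logic: choose the zone from the second DNS label."""
    parts = fqdn.split(".")
    # '2' for a partitioned app, 'azurestaticapps' for an unpartitioned one.
    prefix = parts[1]
    if prefix != "azurestaticapps":
        return f"privatelink.{prefix}.azurestaticapps.net"
    return "privatelink.azurestaticapps.net"


if __name__ == "__main__":
    for sample in (
        "white-flower-048d2aa10.3.azurestaticapps.net",
        "white-flower-048d2aa10.azurestaticapps.net",
    ):
        print(sample, "->", private_dns_zone_name(sample))
```

Like the template, this relies on the position of the label rather than validating it, so an FQDN with an unexpected shape would silently produce a zone name that does not exist.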

rybal06 commented 10 months ago

@matsest Thank you so much for sharing! I likely won't get back to this in the next few weeks, but it seems like a much more elegant solution than my workaround. I had worked a support case with the Azure policy team previously and they didn't believe it was possible to do this type of lookup.

I also asked the collab support engineer from the Static App Team about the number of DNS zone partitions. They said that today Microsoft is using 0-3, but may add more in the future. I pre-populated 0-9 myself. This might be outdated as this was six months ago, but I hope it helps.
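For reference, the set of zones I pre-populated can be enumerated with a short Python sketch like the one below. The function name `static_app_private_zones` is hypothetical, the 0-9 range is my own choice rather than anything Microsoft has documented, and the unpartitioned `privatelink.azurestaticapps.net` zone is included only because the template above falls back to it.

```python
from typing import Iterable, List


def static_app_private_zones(
    partitions: Iterable[int] = range(0, 10),
    include_unpartitioned: bool = True,
) -> List[str]:
    """List the private DNS zone names to create for static web apps."""
    zones = [f"privatelink.{p}.azurestaticapps.net" for p in partitions]
    if include_unpartitioned:
        # Fallback zone for hostnames without a partition number.
        zones.insert(0, "privatelink.azurestaticapps.net")
    return zones


if __name__ == "__main__":
    for zone in static_app_private_zones():
        print(zone)
```

If Microsoft adds partitions beyond 9, the range has to be widened and the new zones created before the policy can link endpoints into them.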

It would be great if the team maintaining this repo could merge your approach into a supported policy.

rybal06 commented 10 months ago

I did some more research. What is interesting is that this policy alias doesn't seem to work; otherwise this would be so much simpler!

Microsoft.Network/privateEndpoints/customDnsConfigs[*].fqdn

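The value is in the policy alias, but it doesn't seem to work. I've tried:

  • Matching upon it using both a wildcard with like *
  • Matching using equals and the exact value
  • Adding a 1 minute evaluation delay (theory was that maybe the value isn't populated immediately upon provisioning?)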

If the Alias worked, we wouldn't need to do the lookup using the ARM/bicep template function at all as the information we are grabbing is already present.

A quick internet search on Microsoft.Network/privateEndpoints/customDnsConfigs[*].fqdn doesn't turn up many results; however at least one other person has tried using it and failed: https://stackoverflow.com/questions/75889272/azure-custom-deployifnotexists-policy-logic

Does anyone have any insights as to why this Alias doesn't work in policy evaluation? Is this a bug in the alias?

MJ-Coder commented 10 months ago

We've run into the exact same bug that @rybal06 described here. We are also trying to use the policy alias "Microsoft.Network/privateEndpoints/customDnsConfigs[*].fqdn", in our case to evaluate whether an Azure App Service app is hosted in an App Service Environment by looking for the domain suffix appserviceenvironment.net.

Microsoft please fix this bug!

MJ-Coder commented 10 months ago

I've reported this as a bug in a service request, I hope they will pass it to their policy product team(s).

oc159 commented 4 months ago

We're also experiencing this issue with the policy alias Microsoft.Network/privateEndpoints/customDnsConfigs[*].fqdn. I'm keen to limit the application of the policy if the IP of the PE does not fit within our vWAN CIDR block. It seems cumbersome, but I'm looking to avoid issues down the road.

Hopefully MS has provided an update?