Azure / arm-template-whatif

A repository to track issues related to what-if noise suppression
MIT License
91 stars 14 forks source link

Resources in nested template not shown if reference is used in the parameter values #157

Open J0F3 opened 3 years ago

J0F3 commented 3 years ago

Describe the bug Probably related to #83 but not sure so I though it might worth it to post it here.

To Reproduce An ARM Template with a nested deployment. The parameter value of the nested deployment uses the reference function. When deploying the template with Whatif it shows only the vNet created in the parent template and nothing about the Subnet created in the nested template. On a re-deployment it gets even more weird. Whatif shows that the subnet gets deleted which is actually not what appends. So very confusing.

When the parameter value is replaced by a literal string the Whatif output is shown the changes correctly. So assume it is also related to the use of the reference function.

ARM Template:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "functions": [],
  "resources": [
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2019-10-01",
      "name": "vnetmodule",
      "properties": {
        "expressionEvaluationOptions": {
          "scope": "inner"
        },
        "mode": "Incremental",
        "parameters": {
          "location": {
            "value": "[resourceGroup().location]"
          }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "parameters": {
            "location": {
              "type": "string"
            }
          },
          "functions": [],
          "resources": [
            {
              "type": "Microsoft.Network/virtualNetworks",
              "apiVersion": "2020-05-01",
              "name": "vNet1",
              "location": "[parameters('location')]",
              "properties": {
                "addressSpace": {
                  "addressPrefixes": [
                    "10.0.0.0/16"
                  ]
                }
              }
            }
          ],
          "outputs": {
            "resourceName": {
              "type": "string",
              "value": "vNet1"
            },
            "resourceId": {
              "type": "string",
              "value": "[resourceId('Microsoft.Network/virtualNetworks', 'vNet1')]"
            }
          }
        }
      }
    },
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2019-10-01",
      "name": "subnetModule",
      "properties": {
        "expressionEvaluationOptions": {
          "scope": "inner"
        },
        "mode": "Incremental",
        "parameters": {
          "vNetName": {
            "value": "[reference(extensionResourceId(resourceGroup().id, 'Microsoft.Resources/deployments', 'vnetmodule'), '2019-10-01').outputs.resourceName.value]"
          }
        },
        "template": {
          "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
          "contentVersion": "1.0.0.0",
          "parameters": {
            "vNetName": {
              "type": "string"
            }
          },
          "functions": [],
          "resources": [
            {
              "type": "Microsoft.Network/virtualNetworks/subnets",
              "apiVersion": "2020-05-01",
              "name": "[format('{0}/subnet1', parameters('vNetName'))]",
              "properties": {
                "addressPrefix": "10.0.0.0/24"
              }
            }
          ]
        }
      },
      "dependsOn": [
        "[extensionResourceId(resourceGroup().id, 'Microsoft.Resources/deployments', 'vnetmodule')]"
      ]
    }
  ]
}

Expected behavior I would except that the vnet and subnet is shown as 'create' on the first deployment and as 'no change' in subsequent deployment.

Screenshots First deployment (with reference in parameter value). Subnet is missing in the output: image

Re-deployment (with the same template). Subnet is shown as 'delete': image

First deployment when the value of the 'vNetName' parameter is stet to a literal string. Both the vnet and the subnet is shown correctly in the output: image

Re-deployment (with the same template) show both resource correctly (beside except some noise probably not related to this issue): image

Client PowerShell

Additional context I noticed quite some of such strange outputs of Whatif during playing with Biceps and Biceps modules as this results in templates with nested deployment where often also the reference function is used. So I thought it will be worth to place it here. But maybe your are already very aver of that? 😊

alex-frankel commented 3 years ago

Basically, what-if short circuits too aggressively when we see a reference() function, but we should still be able to calculate all of the resource IDs required for what-if without evaluating the reference() call. So this should be doable.

Thanks for reporting!

JustinGrote commented 3 years ago

@alex-frankel I ran into this one when using modules in bicep. Anything in a module (which translates to a nested deployment) doesn't show up in my whatif result.

JFolberth commented 3 years ago

One comment to expand on running with cli 2.2 and bicep the plan actually ignores the child resources rather than delete like the original issue described:

image

alex-frankel commented 3 years ago

We discussed this today and the root cause of the issue is essentially that as soon as there is one parameter that uses a runtime function in a nested template, we have no good way of ignoring it/throwing it away and evaluating the rest of the template. The reason it is difficult for us because the way we evaluate expressions in an ARM Template is...fragile :) We have some work ongoing to refactor this code to make this easier for us, but no good ETA I can provide.

@J0F3 - for your case in particular, you can probably refactor the code to avoid the reference() call entirely. You can set the vnet name to be a variable in the parent template, then pass that variable to the nested template as a parameter.

dani3lheidemann commented 3 years ago

Hi @alex-frankel, I recently opened #3237 and can confirm that it's the same issue.

I also can confirm that when I'm leaving out any references, that validation shows up all resources I actually want to deploy. And of course, I'm able to set additional variables to "convert" my input parameters, but I also have to waive using ARM/Bicep module outputs as input parameters for other modules, because using outputs is also defined as a reference in ARM.

May you have an update when this is gonna be fixed?

Thank you very much!

alex-frankel commented 3 years ago

We are working on this now with the hopes of having it resolved by the end of the calendar year. This is a bit of a gnarly, low level issue so please be patient :)

alex-frankel commented 3 years ago

FYI - the fix for this has been checked in. Should be deployed sometime in December.

samhughes91 commented 3 years ago

Very happy to hear this news. Thanks for the update Alex

J0F3 commented 3 years ago

Awesome news! Thank you @alex-frankel!

nbwdk commented 2 years ago

@alex-frankel - any updates on this release? I've updated the az.resources module to 5.2.0, but at first glance it is not fixed.

alex-frankel commented 2 years ago

The feature is now released behind a feature flag. We are working to remove the feature flag, which should be done by end of Jan. Sorry about the extra delay on this.

JFolberth commented 2 years ago

Would this cover if doing a single bicep deployment at the subscription level across multiple resource groups?

Have a bicep deployment deploying an Azure Function in it's own resource group and then creating the API endpoint in APIM which is housed in a separate resource group.

Running what-if it appears to not even recognize (not even showing APIM as ignore) the resources being updated in APIM.

Apologies ahead if this was mentioned in one of the multiple issues tied to this....

J0F3 commented 2 years ago

@JFolberth yes, I think you’re experiencing exactly what this issue is about. Modules in Biceps, wich you probably are using, get translated to nested templates and according what you describe it may be very good possible that the parameters of these nested templates are using a runtime function which leads then to this unexpected behavior.

@alex-frankel any news about the removal of the feature flag? 😊

alex-frankel commented 2 years ago

We gave a small update in the most recent community call. Latest ETA is before the end of Feb. We noticed a couple of bugs with what we had previously released, so we need to get the new updates pushed out.

jackbatzner commented 2 years ago

Thanks for all the hard work and great job communicating the timeline @alex-frankel ! I'm looking forward to using this on my teams.

fschmied commented 2 years ago

Is there a workaround for seeing whether my deployment specification matches the existing resources until this is released? I'm currently writing Bicep files for existing resources and need to make sure they exactly match what was created by hand.

alex-frankel commented 2 years ago

Is there a workaround for seeing whether my deployment specification matches the existing resources until this is released?

What-if, because of the noise issue captured here, is not 100% reliable for determining if the resource being deployed exactly matches an existing resource. Alternatively, we would recommend creating a new instance of the resource you are looking to match and migrate any relevant state over to it. That way, you can be sure your "as-code" definition is working as expected before migration without having to risk taking down/modifying an existing resource with an undesirable behavior.

fschmied commented 2 years ago

Even though I wrote "exactly" above, a somewhat reliable solution would already be very helpful in my endeavors (recreating and migrating would be a huge effort). At the moment, as the what-if feature doesn't work at all for my Bicep files, I'm thinking of deploying to another subscription, then generating ARM templates for both the old and the new resources and manually comparing those.

And I'm very much looking forward to the update 🙂 .

shenglol commented 2 years ago

Thanks for waiting, everyone. The feature has been fully deployed to all regions now!

alex-frankel commented 2 years ago

Woohoo! Going to close this, but please report back if anyone runs into any issues.

shenglol commented 2 years ago

Btw, I'd like to add some notes about the limitations:

JFolberth commented 2 years ago

Willing to chalk this up to my lack of understanding of the issue; however, my bicep files are still not detecting any changes post upgrade of both bicep and az cli.

The structure I have is the main.bicep file is set to deploy at the subscription level. Within that file it will create a resource group via module and then reference that resource group for use in additional modules called from the main.bicep file. The output still reflects all the modules being ignored.

rellis-of-rhindleton commented 2 years ago

Same here — nothing from any module in the what-if output. It’s like they don’t exist.

I’m not sure what @shenglol ’s second note means in practical terms. My setup references existing resources, hopefully that isn’t an obstacle.

shenglol commented 2 years ago

@JFolberth Could you share the contents of the Bicep files? Based on the description it looks like it should be supported.

@rellis-of-rhindleton Do you mind sharing the templates you used as well? Referencing existing resources is not supported by What-If, but it's hard to tell if that's the root cause without analyzing the templates.

rellis-of-rhindleton commented 2 years ago

app-test.zip

Stripped these down to something small but still reproducing the problem. In this arrangement I get what-if information for the resource group, application insights, and app service plan, but not the app service or the key vault access policy.

As an experiment I combined all the app service related items (basically anything that was going into the new resource group) into a single module. Couldn't do that with the key vault though, since it is a preexisting resource in a different group. With that arrangement, the app service started showing up, but the key vault access policy still didn't.

shenglol commented 2 years ago

@rellis-of-rhindleton Thanks for sharing the templates! The reason why What-If doesn't work in this case is that some module outputs reference read-only run-time properties, such as appService.identity.principalId and appi.properties.InstrumentationKey. These properties are generated by the resource providers upon construction of the corresponding resources, so there's no way to predict their values without actually deploying the template. Unfortunately, this is another limitation of What-If.

JFolberth commented 2 years ago

@shenglol Thanks, that answers mine as I have the same types of issues. App Service outputting MSI for Key Vault RBAC or App Insights outputting for App Service.

This seems like a major limitation for modulation. I get it's not known at initial creation; however, is there a way to at least detect if the value is changed?

fschmied commented 2 years ago

I wonder whether what-if could somehow just interpret these non-resolvable values as "unknown" (and show them as possibly changed) rather than completely ignoring the modules?

miqm commented 2 years ago

Terraform plan also sometimes can result with lots of "known after apply" entries even in resource names. The better part is that in TF it uses symbolic names to show that particular resource will be created, while in ARM there are no symbolic names to use. In bicep this could be possible but I assume not in a short term.

Anyway, I think it'd be better to show that some resource is going to be deployed even if we can't tell it's name (but we can tell the type and module it's in) rather than ignoring the entire module.

rellis-of-rhindleton commented 2 years ago

@shenglol - I don't want to do a lot of comparing Bicep to Terraform here, but this is an area where Terraform's approach seems much more useful, at least from the user's point of view. Just throwing this out there to make sure it has been said.

TF's "plan" step reports any resource that will be touched/created. For properties where you won't know the value until after actual deployment is completed, it simply indicates that. That's usually a lot of properties with "not known until complete" -- but it is still helpful. The fact that a resource is going to be created or updated is relevant, whether we know all of the details in advance or not. Just reporting what is known and can be predicted is highly valuable.

As the person running the deployment, I want to know what will get touched. It tells me if I've accidentally left something in or out of the plan, whether I made a mistake in a name, etc. As things are now, I feel like I'm flying blind. It makes it hard for me to start relying on Bicep.

I'm mindful of the difficulty that must be involved here, and I'm impressed at Bicep so far. I just want to make sure that the team knows how useful and important this kind of thing is to the user.

miqm commented 2 years ago

@rellis-of-rhindleton - just to add a small note: TF plan shows what he thinks will happen. if a resource exists in Azure but does not exist in your state - TF will still show as "to be created" and eventually fail.

as ARM/bicep deployment does not have a state to maintain what-if compares with what actually is in Azure, hence the difficulty.

rellis-of-rhindleton commented 2 years ago

@miqm Duly noted, and again I don't want to imply the two systems work similarly, or minimize what must be an extremely complicated process.

I'm not looking for an exactly correct picture of the final result. I just want to see what ARM thinks is going to happen. Best guess is good enough.

alex-frankel commented 2 years ago

I'm going to re-open this. We agree with the general sentiment that a non-deterministic "computed" property should not result in the set of resources in the module being "skipped" by what-if.

My understanding is that this is an even trickier problem than what we just implemented, but we will have to figure out how to get it done.

Apologies that this initial improvement did not have the level of coverage that folks were expecting. We will continue working on this, but can't provide any ETAs yet. We will discuss next week and I will keep the thread updated.

fschmied commented 2 years ago

@alex-frankel Thank you, that's a really cool reaction to community feedback! ❤️

rellis-of-rhindleton commented 2 years ago

Good to hear @alex-frankel

Maybe it's a long term goal, maybe it turns out it's just not an option and then so be it. But I do think it's worth seriously exploring.

BTW I love the language service in VS Code. Not relevant here but I felt like I should say something positive 🥇

shenglol commented 2 years ago

Update: we found a bug due to the code changes in backend that we failed to catch during the preview of this feature. It is a very rare edge case, but because other functionalities like /validate and /deploy may also get affected, we are temporarily disabling reference function evaluation in What-If. We should be able to turn it on again in two weeks. Sorry for the inconvenience.

b4icurmt commented 2 years ago

Is there any update on this? It has been two weeks, and it seems it is still an issue.

shenglol commented 2 years ago

Another fix has been checked in to re-enable the feature two weeks ago, but it's still awaiting deployment (unfortunately the What-If feature is bundled with other ARM core features, so we don't have control over the deployment plan). Hopefully it can get rolled out by end of this month.

justinmchase commented 2 years ago

@shenglol Will this require a cli upgrade, or is this fix purely in azure backend?

shenglol commented 2 years ago

This is purely fixed in the backend.

mcalnd70 commented 2 years ago

If it helps I had a working Bicep example in an issue here that was closed in favour of this one as a duplicate

https://github.com/Azure/arm-template-whatif/issues/258

b4icurmt commented 2 years ago

When can we expect the fix to be deployed?

shenglol commented 2 years ago

The fix has been fully deployed earlier today.

joelnotified commented 2 years ago

I'm going to re-open this. We agree with the general sentiment that a non-deterministic "computed" property should not result in the set of resources in the module being "skipped" by what-if.

My understanding is that this is an even trickier problem than what we just implemented, but we will have to figure out how to get it done.

Apologies that this initial improvement did not have the level of coverage that folks were expecting. We will continue working on this, but can't provide any ETAs yet. We will discuss next week and I will keep the thread updated.

@shenglol Nice! But I guess it's not a fix for this ☝️ right? Ping @alex-frankel

shenglol commented 2 years ago

That's right. the fix only works for deterministic property references that can be resolved at deployment time.

JFolberth commented 2 years ago

Curious if this is still being investigated?

alex-frankel commented 2 years ago

No meaningful updates. It's on our list of things to take care of, but we will likely not have an update in the next 3-6 months.

JustinGrote commented 2 years ago

Should there at least be a warning like "MODULENAME: This module contains deterministic properties that cannot be resolved at deploy time and there will be no what-if output for the module: . Consider adjusting all parameters to be deterministic if possible: "?

Without a proper way to preview all changes, Bicep is relegated to net-new deployments and is not mature enough to be a true IaC tool. If I cannot modify a VM that uses a VM module and preview that the change I made will occur as I expected, then every change is a blind hope that it will perform correctly, which is unacceptable in production environments when Terraform/Pulumi/etc. can preview all changes down to the API call.

JustinGrote commented 2 years ago

Another gotcha while trying to refactor away from this problem: If one of your parameters is an object, referencing properties from that object in the module parameters for whatever reason seems to be considered non-deterministic. e.g.

param tags object = {}
module vm 'vmtemplate.bicep' = {
  ...
  tags = {
    mycustomtag = tags.mycustomtag
  }
}
// No whatif output

if you make tags a var instead and make the individual tags string params that pass into the var, then its fine, but this prevents you from supplying arbitrary tags to the module and still be able to get whatif output. Since the object is known at deploy time, seems like this shouldn't be an exclusion.

param tag1 string
var tags = {
  tag1 = tag1
}

module vm 'vmtemplate.bicep' = {
  ...
  tags = tags
}
// Whatif Output OK
maskati commented 2 years ago

Is the core challenge here that whatif is a feature of ARM, and is calculated by ARM without involvement of resource providers (other than possibly performing a GET)? Would it make sense to implement and propagate the whatif to resource providers? Because in the end only the resource provider knows what the actual result of a particular deployment is.