Azure / azure-dev

A developer CLI that reduces the time it takes for you to get started on Azure. The Azure Developer CLI (azd) provides a set of developer-friendly commands that map to key stages in your workflow - code, build, deploy, monitor, repeat.
https://aka.ms/azd
MIT License

[Issue] azd up should consider whether the resource group exists before opting not to create it #3771

Open pamelafox opened 2 months ago

pamelafox commented 2 months ago

Output from azd version

fastapi-azure-function-apim % azd version
azd version 1.8.0 (commit 8246323c2472148288be4b3cbc3c424bd046b985)

Describe the bug

I ran azd up for an environment whose resource group I had probably deleted at some point. I got this error:

fastapi-azure-function-apim % azd env select fastapi-azf-apim
fastapi-azure-function-apim % azd up

Packaging services (azd package)

  (✓) Done: Packaging service api
  - Package Output: /var/folders/nq/2sq7jzp17tl97sgxzbptr2d80000gn/T/fastapi-azure-function-apim-api-azddeploy-1713804018.zip

SUCCESS: Your application was packaged for Azure in 5 seconds.

Provisioning Azure resources (azd provision)
Provisioning Azure resources can take some time.

Subscription: ca-pamelafox-demo-test (32ea8a26-5b40-4838-b6cb-be5c89a57c16)
Location: East US

  (-) Skipped: Didn't find new changes.

SUCCESS: There are no changes to provision for your application.

Deploying services (azd deploy)

  (x) Failed: Deploying service api

ERROR: getting target resource: getting default resource groups for environment: fastapi-azf-apim: resource not found: 0 resource groups with prefix or suffix with value: 'fastapi-azf-apim'

ERROR: error executing step command 'deploy --all': getting target resource: getting default resource groups for environment: fastapi-azf-apim: resource not found: 0 resource groups with prefix or suffix with value: 'fastapi-azf-apim'

To Reproduce

You probably need to run azd up, then delete the resource group in the Portal, then run azd up again (with no Bicep change).

Expected behavior

It should have provisioned.

Environment

Information on your environment: Mac OSX M1, Terminal

Additional context

As a workaround, I will run azd provision --no-state.

ellismg commented 2 months ago

I wonder what sort of semantics we want to apply here. The deployment state caching feature was built around the idea that folks wouldn't go mucking with resources managed by their azd deployment in the portal, so we could trust the deployment history as a way to detect if we can skip things or not.

We certainly could augment the heuristic to say something like "it is safe to skip the ARM deployment if the most recent deployment is successful, the template hash of it matches our current template hash AND the resource groups impacted by the deployment still exist." That would address this issue. But imagine now that you delete some individual resource in that resource group. Do we expect that azd provision should detect this and force a full deployment? What about changing a property on a resource using the portal? Is the expectation that that would be detected, and provision would be forced to run? In the limit we would be rebuilding what the ARM deployment engine does, and that doesn't seem right.
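To make the augmented heuristic concrete, here is a minimal sketch of the skip decision. It is not azd's actual implementation: the DeploymentRecord type, the can_skip_provision function, and the resource_group_exists callback are all hypothetical names introduced for illustration, and a real check would have to query Azure for each group.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class DeploymentRecord:
    """Hypothetical summary of the most recent ARM deployment."""
    succeeded: bool
    template_hash: str
    resource_groups: Tuple[str, ...]  # groups the deployment touched


def can_skip_provision(
    last: Optional[DeploymentRecord],
    current_template_hash: str,
    resource_group_exists: Callable[[str], bool],
) -> bool:
    """Return True only if all three conditions of the heuristic hold."""
    # 1. There is a previous deployment and it succeeded.
    if last is None or not last.succeeded:
        return False
    # 2. The template has not changed since that deployment.
    if last.template_hash != current_template_hash:
        return False
    # 3. Every resource group the deployment touched still exists.
    return all(resource_group_exists(rg) for rg in last.resource_groups)


if __name__ == "__main__":
    # Assumed scenario: the group was deleted out of band, so it is
    # missing from the set of groups that currently exist.
    existing = {"rg-something-else"}
    last = DeploymentRecord(True, "abc123", ("rg-fastapi-azf-apim",))
    print(can_skip_provision(last, "abc123", lambda rg: rg in existing))
```

Under these assumptions the check only covers resource groups, so deleting an individual resource or editing a property in the portal would still go undetected, which is exactly the open question raised above.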

We added this check because we found in practice doing it was faster than submitting the deployment template and waiting for that no-op deployment to complete. This worked when you promised not to go behind azd's back and muck around with your infrastructure. Deleting resources out of band is breaking that promise.

Maybe we should instead focus on improving the error behavior here, to advise the customer to run azd provision --no-state in this case?
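As a rough illustration of the error-message idea, the sketch below appends a hint when a deploy fails to find its resource group after provisioning was skipped. The error type, the function name, and the hint wording are all hypothetical; only the azd provision --no-state command comes from this thread.

```python
class ResourceGroupNotFoundError(Exception):
    """Hypothetical stand-in for azd's 'resource not found' failure."""


def with_no_state_hint(err: Exception, provision_was_skipped: bool) -> str:
    """Format an error, adding a hint when skipped provisioning is the likely cause."""
    message = f"ERROR: {err}"
    # Only suggest --no-state when provisioning was skipped AND the
    # failure is the missing-resource-group case described in this issue.
    if provision_was_skipped and isinstance(err, ResourceGroupNotFoundError):
        message += (
            "\n\nProvisioning was skipped because no template changes were "
            "detected, but the expected resource group was not found. If it "
            "was deleted outside of azd, run 'azd provision --no-state' to "
            "force a full deployment."
        )
    return message


if __name__ == "__main__":
    err = ResourceGroupNotFoundError(
        "0 resource groups with prefix or suffix with value: 'fastapi-azf-apim'"
    )
    print(with_no_state_hint(err, provision_was_skipped=True))
```

Gating the hint on both conditions keeps it from appearing on unrelated failures, where suggesting a full redeployment would be misleading.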

Or maybe there's a middle ground where we just ensure all the resources touched by the deployment still exist (or maybe just the RGs) but do not look at individual property values when deciding if we can skip the deployment or not. I am worried that in practice this is going to make the end to end much slower, and then maybe we arrive at a place where just submitting the deployment (and then working with the ARM team to figure out how we can improve the speed of these no-op deployments) is the right call?

pamelafox commented 2 months ago

As a quick measure, perhaps you could mention azd provision --no-state in the output when it skips deployment. I think I had to dig around for it.

I did end up using --no-state frequently today, as I felt like it wasn't deploying my Bicep changes (between azd env select calls), but I may have just been seeing things.