Azure / azure-dev

A developer CLI that reduces the time it takes for you to get started on Azure. The Azure Developer CLI (azd) provides a set of developer-friendly commands that map to key stages in your workflow - code, build, deploy, monitor, repeat.
https://aka.ms/azd
MIT License

Transient `azd deploy` error caused failed deployment #602

Closed. KSchlobohm closed this issue 2 years ago.

KSchlobohm commented 2 years ago

Describe the bug
We run a nightly deployment with azd to validate our Bicep templates. As part of that validation we also deploy the code, and we plan to add some integration tests later.

The issue is that the azd deploy step failed with what appears to be a transient error from Azure.

Error: deploying service: deploying service api package: deploying service api: failed running az deployment source config-zip: exit code: 1, stdout: , stderr: WARNING: Getting scm site credentials for zip deployment
WARNING: Starting zip deployment. This operation can take a while to complete ...
WARNING: Deployment endpoint responded with status code 502
ERROR: An error occured during deployment. Status Code: 502, Details: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
...

To Reproduce
I have not been able to reproduce the issue.

Expected behavior
It's unclear how to handle this error in my pipeline; the error appears to come from a call that azd makes to Azure. It would be great if azd could detect this 502 status and perform up to 3 retry attempts before failing the step.


Additional context
There may be other transient errors that would be appropriate to retry. While it is probably a larger scope of work, it would also be nice if failed deployments could be retried, since some deployment failures may also be transient.

KSchlobohm commented 2 years ago

@karolz-ms This surfaced again and the response code was HTTP 202, so I looked at the message more closely. It seems the error log was captured in Kudu:

{
  "Message": "An error has occurred.",
  "ExceptionMessage": "No log found for 'latest'.",
  "ExceptionType": "System.IO.FileNotFoundException",
  "StackTrace": "   at Kudu.Core.Deployment.DeploymentManager.GetLogEntries(String id) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Core\\Deployment\\DeploymentManager.cs:line 98\r\n   at Kudu.Services.Deployment.DeploymentController.GetLogEntry(String id) in C:\\Kudu Files\\Private\\src\\master\\Kudu.Services\\Deployment\\DeploymentController.cs:line 376"
}

karolz-ms commented 2 years ago

@KSchlobohm 202 is a success code... are you saying azd treated it as an error?

KSchlobohm commented 2 years ago

Sorry for the confusion. When I found the HTTP 202 code, I thought it was another instance of the same error. The 202 status does not seem to be related to the Kudu error above.

This is an instance of a different error from the same `azd deploy` operation.

karolz-ms commented 2 years ago

Hmm. I have searched Kudu issues but haven't found anything that might be relevant.

@suwatch Any ideas why Ken could intermittently get a ZipDeploy failure with the "No log found for 'latest'" error?

rajeshkamal5050 commented 2 years ago

Issues tracked here:

  1. Bubbling up underlying deployment errors. This should have been fixed as part of #786.
  2. Auto-retries in azd for transient errors before giving up.

rajeshkamal5050 commented 2 years ago

@KSchlobohm's offline update on issue 1 above: from the logs, it looks like `azd deploy` is failing because the Azure App Service we are deploying to is not ready, or is unhealthy and recovering. The recommended step is to re-run the `azd deploy` command.

weikanglim commented 2 years ago

Looking at the `az` code more closely, the error seems to indicate a problem with the POST request that starts the zip deployment. I don't think `az` retries that request, and we should consider adding retries in the new azd changes.

Petermarcu commented 2 years ago

If you use Azure Core (`azcore`) from the Azure SDK for Go, you can take advantage of its request pipeline, which has built-in patterns for retries, exponential backoff, and more.

Petermarcu commented 2 years ago

Also, if we can get an OpenAPI spec for the service endpoint we're calling, we could generate a client library that has these retries built in.

weikanglim commented 2 years ago

Since this issue was filed, azd has stopped depending on `az`, and we increased the retries for zipdeploy submission in #1051, which should let deployments recover from temporary hiccups in App Service. Closing this.