Azure / deployment-stacks

Contains Deployment Stacks CLI scripts and releases
MIT License
89 stars 7 forks source link

Deployment stacks started failing this week without any error details - "Operation returned an invalid status 'OK' " #161

Closed hallgeir-osterbo-visma closed 5 months ago

hallgeir-osterbo-visma commented 5 months ago

Describe the bug A clear and concise description of what the bug is.

This week, a couple of our templates started failing to deploy when deploying through deployment stacks. It seems fine when deploying without deployment stacks.

Not much to go on - in the portal, the deployment itself is green, but the deployment stack has status "Failed", without any more info. Not even the regular "show raw JSON" for the error, or a correlation ID. The output of the az CLI is simply just: "Operation returned an invalid status 'OK'"

I ran az CLI with --debug to get some more info, and there it's obvious that the deployment stack failed, but not why.

Here is the last few requests before it failed (I redacted the subscription ID and the "c" query parameter which I was unsure if it was sensitive or not -- looked like a certificate). If you need the subscription ID please provide a place where I can send it, and I'll be happy to share it (just don't want it on a public forum).

14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request URL: 'https://management.azure.com/subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Resources/locations/northeurope/deploymentStackOperationStatus/f242d9d3-2507-4932-8951-823024ebe695?api-version=2022-08-01-preview&t=638489531791140783&c=REDACTED&s=VAm6H2CIbMMnfYSetiEYh1_UniZRB5Wubz5LyY6BzHwnrGOh-7uMUeXNhveQSalnywt_XJ1Kw3BJ3c-fZWtTht0kOO6ukwkSnpi3_CRSBr1dkZQ3WVzO1KfZeNKTsPj12j8Vyy4kwcMtigr8xmBfkIwgcLCnldAGpCEZOste1cPCMgZpasaiZbvlNLIedydj_en85HsE80SpFutIja5kiaOtyBGk098qjBvDTEbAvuWuc2yqvTIjp2E15Dxalz2FYuiQP0rxptJKvAxFW52Yg7vhZfZN9CrKMfsr2zHUCl94fi0mbfo_sB9MIfXtI5SFtAlJeZxVGNdKzFJVUzPs-Q&h=0KS-bHSK_Z5X2mufjmxdpveUJDWaG6c3JC4e1LoG3NI'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request method: 'GET'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request headers:
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-client-request-id': 'bc9a0e96-fcb4-11ee-9a90-0050568f37d0'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'CommandName': 'stack group create'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'ParameterSetName': '--resource-group --name --template-file --parameters --parameters --yes --deny-settings-mode --action-on-unmanage --debug'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'User-Agent': 'AZURECLI/2.59.0 (MSI) azsdk-python-core/1.28.0 Python/3.11.8 (Windows-10-10.0.17763-SP0)'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Authorization': '*****'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request body:
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: This request has no body
14:23:44   Error    |       DEBUG: urllib3.connectionpool: https://management.azure.com:443 "GET /subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Resources/locations/northeurope/deploymentStackOperationStatus/f242d9d3-2507-4932-8951-823024ebe695?api-version=2022-08-01-preview&t=638489531791140783&c=REDACTED&s=VAm6H2CIbMMnfYSetiEYh1_UniZRB5Wubz5LyY6BzHwnrGOh-7uMUeXNhveQSalnywt_XJ1Kw3BJ3c-fZWtTht0kOO6ukwkSnpi3_CRSBr1dkZQ3WVzO1KfZeNKTsPj12j8Vyy4kwcMtigr8xmBfkIwgcLCnldAGpCEZOste1cPCMgZpasaiZbvlNLIedydj_en85HsE80SpFutIja5kiaOtyBGk098qjBvDTEbAvuWuc2yqvTIjp2E15Dxalz2FYuiQP0rxptJKvAxFW52Yg7vhZfZN9CrKMfsr2zHUCl94fi0mbfo_sB9MIfXtI5SFtAlJeZxVGNdKzFJVUzPs-Q&h=0KS-bHSK_Z5X2mufjmxdpveUJDWaG6c3JC4e1LoG3NI HTTP/1.1" 200 264
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Response status: 200
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Response headers:
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Cache-Control': 'no-cache'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Pragma': 'no-cache'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Content-Length': '264'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Content-Type': 'application/json; charset=utf-8'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Expires': '-1'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-request-id': '562afc99-b696-41a1-afa0-026c0d624aed'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Server': 'Microsoft-HTTPAPI/2.0'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-ratelimit-remaining-subscription-reads': '11990'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-correlation-request-id': '703f00f4-b6ac-4f40-858f-f397d09ac73b'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-routing-request-id': 'NORWAYEAST:20240417T122340Z:703f00f4-b6ac-4f40-858f-f397d09ac73b'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'X-Content-Type-Options': 'nosniff'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Date': 'Wed, 17 Apr 2024 12:23:39 GMT'
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Response content:
14:23:44   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: {
14:23:44   Error    |       "id": "/subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Resources/locations/northeurope/deploymentStackOperationStatus/f242d9d3-2507-4932-8951-823024ebe695",
14:23:44   Error    |       "name": "f242d9d3-2507-4932-8951-823024ebe695",
14:23:44   Error    |       "status": "deploying"
14:23:44   Error    |       }
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request URL: 'https://management.azure.com/subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Resources/locations/northeurope/deploymentStackOperationStatus/f242d9d3-2507-4932-8951-823024ebe695?api-version=2022-08-01-preview&t=638489531791140783&c=REDACTED&s=VAm6H2CIbMMnfYSetiEYh1_UniZRB5Wubz5LyY6BzHwnrGOh-7uMUeXNhveQSalnywt_XJ1Kw3BJ3c-fZWtTht0kOO6ukwkSnpi3_CRSBr1dkZQ3WVzO1KfZeNKTsPj12j8Vyy4kwcMtigr8xmBfkIwgcLCnldAGpCEZOste1cPCMgZpasaiZbvlNLIedydj_en85HsE80SpFutIja5kiaOtyBGk098qjBvDTEbAvuWuc2yqvTIjp2E15Dxalz2FYuiQP0rxptJKvAxFW52Yg7vhZfZN9CrKMfsr2zHUCl94fi0mbfo_sB9MIfXtI5SFtAlJeZxVGNdKzFJVUzPs-Q&h=0KS-bHSK_Z5X2mufjmxdpveUJDWaG6c3JC4e1LoG3NI'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request method: 'GET'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request headers:
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-client-request-id': 'bc9a0e96-fcb4-11ee-9a90-0050568f37d0'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'CommandName': 'stack group create'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'ParameterSetName': '--resource-group --name --template-file --parameters --parameters --yes --deny-settings-mode --action-on-unmanage --debug'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'User-Agent': 'AZURECLI/2.59.0 (MSI) azsdk-python-core/1.28.0 Python/3.11.8 (Windows-10-10.0.17763-SP0)'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Authorization': '*****'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Request body:
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: This request has no body
14:24:14   Error    |       DEBUG: urllib3.connectionpool: https://management.azure.com:443 "GET /subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Resources/locations/northeurope/deploymentStackOperationStatus/f242d9d3-2507-4932-8951-823024ebe695?api-version=2022-08-01-preview&t=638489531791140783&c=REDACTED&s=VAm6H2CIbMMnfYSetiEYh1_UniZRB5Wubz5LyY6BzHwnrGOh-7uMUeXNhveQSalnywt_XJ1Kw3BJ3c-fZWtTht0kOO6ukwkSnpi3_CRSBr1dkZQ3WVzO1KfZeNKTsPj12j8Vyy4kwcMtigr8xmBfkIwgcLCnldAGpCEZOste1cPCMgZpasaiZbvlNLIedydj_en85HsE80SpFutIja5kiaOtyBGk098qjBvDTEbAvuWuc2yqvTIjp2E15Dxalz2FYuiQP0rxptJKvAxFW52Yg7vhZfZN9CrKMfsr2zHUCl94fi0mbfo_sB9MIfXtI5SFtAlJeZxVGNdKzFJVUzPs-Q&h=0KS-bHSK_Z5X2mufjmxdpveUJDWaG6c3JC4e1LoG3NI HTTP/1.1" 200 261
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Response status: 200
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Response headers:
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Cache-Control': 'no-cache'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Pragma': 'no-cache'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Content-Length': '261'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Content-Type': 'application/json; charset=utf-8'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Expires': '-1'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-request-id': '9498a081-57df-44b5-9af1-fe92c97a11e9'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Server': 'Microsoft-HTTPAPI/2.0'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-ratelimit-remaining-subscription-reads': '11989'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-correlation-request-id': '356c866a-0984-4ba4-9267-fdfce966a06a'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'x-ms-routing-request-id': 'NORWAYEAST:20240417T122410Z:356c866a-0984-4ba4-9267-fdfce966a06a'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'X-Content-Type-Options': 'nosniff'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies:     'Date': 'Wed, 17 Apr 2024 12:24:10 GMT'
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: Response content:
14:24:14   Error    |       DEBUG: cli.azure.cli.core.sdk.policies: {
14:24:14   Error    |       "id": "/subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Resources/locations/northeurope/deploymentStackOperationStatus/f242d9d3-2507-4932-8951-823024ebe695",
14:24:14   Error    |       "name": "f242d9d3-2507-4932-8951-823024ebe695",
14:24:14   Error    |       "status": "failed"
14:24:14   Error    |       }
14:24:14   Error    |       DEBUG: cli.azure.cli.core.azclierror: Traceback (most recent call last):
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/core/polling/base_polling.py", line 730, in run
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/core/polling/base_polling.py", line 762, in _poll
14:24:14   Error    |       azure.core.polling.base_polling.OperationFailed: Operation failed or canceled
14:24:14   Error    |       The above exception was the direct cause of the following exception:
14:24:14   Error    |       Traceback (most recent call last):
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\knack/cli.py", line 233, in invoke
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 664, in execute
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 731, in _run_jobs_serially
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 712, in _run_job
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 1048, in __call__
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/cli/core/commands/__init__.py", line 1035, in __call__
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/core/polling/_poller.py", line 242, in result
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/core/tracing/decorator.py", line 76, in wrapper_use_tracer
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/core/polling/_poller.py", line 261, in wait
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/core/polling/_poller.py", line 176, in _start
14:24:14   Error    |       File "D:\a\_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\azure/core/polling/base_polling.py", line 745, in run
14:24:14   Error    |       azure.core.exceptions.HttpResponseError: Operation returned an invalid status 'OK'
14:24:14   Error    |       Content: {
14:24:14   Error    |       "id": "/subscriptions/SUBSCRIPTION_ID/providers/Microsoft.Resources/locations/northeurope/deploymentStackOperationStatus/f242d9d3-2507-4932-8951-823024ebe695",
14:24:14   Error    |       "name": "f242d9d3-2507-4932-8951-823024ebe695",
14:24:14   Error    |       "status": "failed"
14:24:14   Error    |       }
14:24:14   Error    |       ERROR: cli.azure.cli.core.azclierror: Operation returned an invalid status 'OK'
14:24:14   Error    |       ERROR: az_command_data_logger: Operation returned an invalid status 'OK'
14:24:14   Error    |       DEBUG: cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x00000283F0EFF1A0>]
14:24:14   Error    |       INFO: az_command_data_logger: exit code: 1
14:24:14   Error    |       INFO: cli.__main__: Command ran in 283.449 seconds (init: 0.726, invoke: 282.724)
14:24:14   Error    |       INFO: telemetry.main: Begin splitting cli events and extra events, total events: 1
14:24:14   Error    |       INFO: telemetry.client: Accumulated 0 events. Flush the clients.
14:24:14   Error    |       INFO: telemetry.main: Finish splitting cli events and extra events, cli events: 1
14:24:14   Error    |       INFO: telemetry.save: Save telemetry record of length 3870 in cache
14:24:14   Error    |       INFO: telemetry.main: Begin creating telemetry upload process.
14:24:14   Error    |       INFO: telemetry.process: Creating upload process: "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe C:\Program Files\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\cli\telemetry\__init__.pyc C:\Octopus\Work\20240417121826-420456-7890\azure-cli"
14:24:14   Error    |       INFO: telemetry.process: Return from creating process
14:24:14   Error    |       INFO: telemetry.main: Finish creating telemetry upload process.
14:24:14   Error    |       OperationStopped: Failed with exit code 1

To Reproduce Steps to reproduce the behavior:

  1. Attempt to deploy with az CLI. My deployment stack that is failing is in region north europe.
  2. Wait and see if it fails after the deployment itself is done.

Expected behavior A clear and concise description of what you expected to happen.

I expected at least some error message of WHAT failed instead of just the deploy stack getting status "Failed" without any info (not even correlation ID).

Screenshots If applicable, add screenshots to help explain your problem.

Not sure how helpful it is - but a screenshot from the deployment stack overview page. No failed resources or failed deletions (at least that's reported). Just a "Failed" status, without any more info.

image

Repro Environment Host OS: Powershell Version:

Server Debugging Information Correlation ID: Have not been able to get it anywhere. Tenant ID: c166b9c4-5053-4eec-9665-aba0782d0804 Timestamp of issue (please include time zone): 14:24:14 UTC+2 Data Center (eg, West Central US, West Europe): North Europe (for the resource group/deployment stack). North Europe + West Europe (for the resources being deployed).

Additional context Add any other context about the problem here.

snarkywolverine commented 5 months ago

Hi @hallgeir-osterbo-visma!

I was able to find the error in our logs, but need to do additional digging to figure out what is going on. I'll keep you posted on what I find.

hallgeir-osterbo-visma commented 5 months ago

Things look brighter today - so far today, all deployments we've done has gone through. Was anything fixed during Friday or the weekend?

azcloudfarmer commented 5 months ago

Hello @hallgeir-osterbo-visma this appears to have been a transient error. Closing this issue, but please feel free to reopen if you see this again.