Azure / azure-cli

Azure Command-Line Interface
MIT License

az deployment randomly crashes with: ERROR: 'bytes' object has no attribute 'get' #19743

Closed allanmo79 closed 2 years ago

allanmo79 commented 3 years ago

Describe the bug: When running az deployments on GitHub runners (azure-cli v2.28.0), the command randomly crashes with ERROR: 'bytes' object has no attribute 'get'. With --debug, I also get the following error description:

DEBUG: cli.azure.cli.core.sdk.policies: Response content:
DEBUG: cli.azure.cli.core.sdk.policies: ***"error":***"code":"InternalServerError","message":"Encountered internal server error. Diagnostic information: timestamp '20210929T131134Z', subscription id '', tracking id 'xxxxxxxxxxxxxx', request correlation id 'xxxxxxxxxxxxxxxxxxx'."***
DEBUG: cli.azure.cli.core.sdk.policies: Request URL: 'https://management.azure.com/providers/Microsoft.Management/managementGroups/ESD-ConnectedS/providers/Microsoft.Resources/deployments/xxxxxxxxxxxxxxxea?api-version=2021-04-01'

Python errors:
................... custom.py", line 407, in on_request
AttributeError: 'bytes' object has no attribute 'get'
.................. in handle_template_based_exception
knack.util.CLIError: 'bytes' object has no attribute 'get'
.................
ERROR: az_command_data_logger: 'bytes' object has no attribute 'get'
DEBUG: cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x03FF4460>]

To Reproduce:

az deployment sub create --subscription 'xxxxxxxxxxxxxxxxxxxx' --name (new-guid).guid --location westeurope --template-file '............\main.bicep' --parameters '...............\myparams.parameters.json' --debug

Expected behavior: No errors.

Environment summary:

az { "azure-cli": "2.28.0", "azure-cli-core": "2.28.0", "azure-cli-telemetry": "1.0.6", "extensions": {} }
'User-Agent': 'AZURECLI/2.28.0 (MSI) azsdk-python-azure-mgmt-resource/19.0.0 Python/3.8.9 (Windows-10-10.0.17763-SP0) GITHUBACTIONS_AzurePowerShellAction_175xxxxxxxxxxxxxxxx6bdb0dd'

Also, I am using Bicep, not ARM!

Additional context: I tried downgrading the GitHub runner's az version to azure-cli v2.27.2, with the same result. The same error also happens on a local VM with the same configuration.

yonzhan commented 3 years ago

ARM

allanmo79 commented 3 years ago

ARM

No, not ARM. Bicep :)

allanmo79 commented 3 years ago

Any news on this?

zhoxing-ms commented 3 years ago

DEBUG: cli.azure.cli.core.sdk.policies: ***"error":***"code":"InternalServerError","message":"Encountered internal server error. Diagnostic information: timestamp '20210929T131134Z', subscription id '', tracking id 'xxxxxxxxxxxxxx', request correlation id 'xxxxxxxxxxxxxxxxxxx'."***

@allanmo79 The root cause here may be a service-side issue. Could you please send me the debug log by email? My email address is Zhou.Xing@microsoft.com, thanks~

alon-z commented 2 years ago

We encountered this issue too, and it looks like updating the CLI to version 2.29.0 fixed it. The changelog does mention a change to az deployment group.

zhoxing-ms commented 2 years ago

@alonikomax Hi, the service side may in fact have fixed this issue; the CLI change is unrelated.

az deployment group create: Fix incorrect scope in the example of creating deployment from template-spec

The PR behind that changelog entry only modifies a help example: https://github.com/Azure/azure-cli/pull/19563/files . But thank you for pointing it out~

@allanmo79 Can this issue still be reproduced? If not, we will close it~

alon-z commented 2 years ago

Can you elaborate on the service-side fix? What was changed, and where can we track it (for example, an issue tracker or status page)?

ghost commented 2 years ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @armleads-azure.

Issue Details
Author: allanmo79
Assignees: zhoxing-ms
Labels: `Service Attention`, `ARM`, `customer-reported`
Milestone: Backlog

zhoxing-ms commented 2 years ago

@wwendyc Could you please take a look at this issue? Or do you know who on the service team we should contact to investigate this ARM deployment problem?

alex-frankel commented 2 years ago

@zhoxing-ms -- if this is a result of a service side issue, it seems like CLI is not handling the error response from the request properly. Is there a way to determine how this code path might be hit and work backwards from there?
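One plausible way to reach this code path is sketched below, as an assumption rather than a diagnosis: an error handler expects the response body to be an already-parsed dict, but on a 500 it receives the raw bytes instead. The function name `read_error_code` is hypothetical and is not taken from the CLI source.

```python
def read_error_code(body):
    """Hypothetical handler that assumes `body` is an already-parsed dict."""
    return body.get("error", {}).get("code")


# A parsed body works as expected.
parsed = {"error": {"code": "InternalServerError"}}
print(read_error_code(parsed))

# A raw, unparsed body reproduces the message reported in this issue.
raw = b'{"error":{"code":"InternalServerError"}}'
try:
    read_error_code(raw)
except AttributeError as exc:
    print(exc)  # 'bytes' object has no attribute 'get'
```

If the real handler follows a similar pattern, the place to look would be wherever the response body is handed over without being decoded and parsed first.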

zhoxing-ms commented 2 years ago

@alex-frankel I haven't received the customer's debug log yet, so I don't know the specific error path. My guess is that it is related to JsonCTemplatePolicy. @allanmo79 @alonikomax Could you please send me the debug logs by email? My email address is Zhou.Xing@microsoft.com

miqm commented 2 years ago

@zhoxing-ms We hit this error 2 days ago (CorrelationId: 15bfc7b3-7a8c-4655-ba79-fb6cbffc0b62). The deployment failed with a very long message, 11130 characters in total. There was also another error on deployment validate that didn't show up in the Activity Log. @filizt found that it was a 500 error, with correlationId b94a5f62-daf5-478f-92ce-45221ff90446.

Since the two situations are quite different, perhaps the common factor is that the error response message is too long?

zhoxing-ms commented 2 years ago

@miqm Do you mean that the CLI throws the same error, 'bytes' object has no attribute 'get', for two different service errors? If so, could you please send me the debug log by email?

miqm commented 2 years ago

@zhoxing-ms - yes, but unfortunately I don't have debug log - we observed it on our CD system and by default we run without --debug switch, so I've only the message.

anthony-c-martin commented 2 years ago

@zhoxing-ms I was able to repro this - see attached for my logs with --verbose & --debug: debug.txt

zhoxing-ms commented 2 years ago

The REST service returned an internal error, which needs investigation by the service team @alex-frankel

{"error":{"code":"InternalServerError","message":"Encountered internal server error. Diagnostic information: timestamp '20220201T190049Z', subscription id 'd08e1a72-8180-4ed3-8125-9dff7376b0bd', tracking id '27c33452-f84c-4f23-8525-9cb71f82c137', request correlation id '27c33452-f84c-4f23-8525-9cb71f82c137'."}}

correlation id: '27c33452-f84c-4f23-8525-9cb71f82c137'
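For anyone who needs to pass these identifiers on to support, here is a minimal sketch of pulling them out of the response body. It assumes the message keeps the `key 'value'` layout shown above; the function name `parse_diagnostics` is hypothetical.

```python
import json
import re

# Matches the "key 'value'" pairs in the diagnostic message.
_DIAG_FIELD = re.compile(
    r"(timestamp|subscription id|tracking id|request correlation id) '([^']*)'"
)


def parse_diagnostics(body: str) -> dict:
    """Extract the diagnostic fields from an ARM-style error body."""
    message = json.loads(body)["error"]["message"]
    return dict(_DIAG_FIELD.findall(message))


# Sample body rebuilt from the response quoted above.
body = json.dumps({
    "error": {
        "code": "InternalServerError",
        "message": (
            "Encountered internal server error. Diagnostic information: "
            "timestamp '20220201T190049Z', "
            "subscription id 'd08e1a72-8180-4ed3-8125-9dff7376b0bd', "
            "tracking id '27c33452-f84c-4f23-8525-9cb71f82c137', "
            "request correlation id '27c33452-f84c-4f23-8525-9cb71f82c137'."
        ),
    }
})

diag = parse_diagnostics(body)
print(diag["request correlation id"])
```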

anthony-c-martin commented 2 years ago

@zhoxing-ms - Alex & I can follow up on the internal server error.

Regardless, the Azure CLI issue is still important to fix - the CLI should communicate this error information to the user rather than returning ERROR: 'bytes' object has no attribute 'get'. This would allow the user to understand that there is a service error, and provide them with the necessary info to troubleshoot or follow up with support channels.
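To illustrate the behavior requested here, a minimal sketch of defensive error extraction follows. It is not the CLI's implementation; `describe_error` is a hypothetical helper that accepts a body as bytes, str, or dict and always returns something readable instead of raising.

```python
import json


def describe_error(body) -> str:
    """Return a readable error string for a body that may be bytes, str, or dict."""
    if isinstance(body, (bytes, bytearray)):
        body = body.decode("utf-8", errors="replace")
    if isinstance(body, str):
        try:
            body = json.loads(body)
        except ValueError:
            return body  # Not JSON: surface the raw text rather than crash.
    if isinstance(body, dict):
        error = body.get("error")
        if isinstance(error, dict):
            code = error.get("code", "UnknownError")
            message = error.get("message", "")
            return f"{code}: {message}"
    return str(body)


print(describe_error(b'{"error":{"code":"InternalServerError","message":"boom"}}'))
print(describe_error(b"plain text failure"))
```

The point of the design is that every branch ends in a string the user can read or forward to support, so a malformed or unexpected body never masks the underlying service error.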

zhoxing-ms commented 2 years ago

@anthony-c-martin OK, we will plan a fix for the CLI error handling

zhoxing-ms commented 2 years ago

@anthony-c-martin Could you please share the template file and parameter file with me? That would make it easier for me to reproduce and investigate the error handling issue

anthony-c-martin commented 2 years ago

Template file (main.json):

{
    "$schema": "https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "metadata": {
        "_generator": {
            "name": "bicep",
            "version": "dev",
            "templateHash": "6486513756060996114"
        }
    },
    "resources": [
        {
            "type": "Microsoft.Resources/resourceGroups",
            "apiVersion": "2019-05-01",
            "name": "rg-bicep",
            "location": "eastus",
            "resources": [
                null
            ]
        }
    ]
}

Command to repro:

az deployment sub validate --location westus --template-file main.json

@zhoxing-ms, note - there's only a limited amount of time this will be repro-able with this template file. We have a fix checked in for the server-side cause of this particular issue, which will probably be deployed over the next few weeks.
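The thread does not say which part of the template triggers the server error. Assuming it is the `null` entry in the nested `resources` array, a local pre-flight check could look like the sketch below. `find_null_resources` is a hypothetical helper, and the template is a trimmed copy of the one above.

```python
import json


def find_null_resources(node, path="$"):
    """Yield JSON paths of null entries inside any `resources` array."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "resources" and isinstance(value, list):
                for index, item in enumerate(value):
                    if item is None:
                        yield f"{child}[{index}]"
                    else:
                        yield from find_null_resources(item, f"{child}[{index}]")
            else:
                yield from find_null_resources(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_null_resources(item, f"{path}[{index}]")


# Trimmed copy of main.json from this comment.
template = json.loads("""
{
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "type": "Microsoft.Resources/resourceGroups",
            "apiVersion": "2019-05-01",
            "name": "rg-bicep",
            "location": "eastus",
            "resources": [null]
        }
    ]
}
""")

print(list(find_null_resources(template)))  # ['$.resources[0].resources[0]']
```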

zhoxing-ms commented 2 years ago

@anthony-c-martin I have submitted PR #21220 to fix the error handling issue

navba-MSFT commented 2 years ago

@zhoxing-ms Thanks for raising the PR. I see that it has been merged now. If this issue is addressed, can we proceed with archiving this GitHub thread?