Azure / bicep

Bicep is a declarative language for describing and deploying Azure resources
MIT License

DNS resolution error when downloading Bicep in Bitbucket #9774

Status: Closed (ausy-tk closed this issue 1 year ago)

ausy-tk commented 1 year ago

Bicep version: v0.14.6 (from az bicep version)

Describe the bug: When deploying Docker containers to our Azure Container Apps environment with the Azure CLI using a Bicep template, our deployments randomly fail while attempting to retrieve the latest Bicep version from downloads.bicep.azure.com, with the following error:

ERROR: az_command_data_logger: Error while attempting to retrieve the latest Bicep version: HTTPSConnectionPool(host='downloads.bicep.azure.com', port=443): Max retries exceeded with url: /releases/latest (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fae4f82d660>: Failed to establish a new connection: [Errno -2] Name does not resolve')).

To reproduce: Deploy a Docker container to an Azure Container Apps environment with the Azure CLI using a Bicep template. The failure is intermittent.

Additional context: We know about the workaround described in issue #3689 (deploying the ARM JSON as the template instead), but this should probably be analyzed in more depth if teams like ours want to standardize on Bicep templates.

We don't experience any other internet connection issues within our CI/CD environment.

alex-frankel commented 1 year ago

@davidcho23 -- can you take a look at this one? Is this error coming from the CDN?

davidcho23 commented 1 year ago

The CDN endpoint is working fine. I am able to list the available Bicep versions and install the latest one using az bicep list-versions and az bicep install.

@majastrz do you have an idea of what the issue might be?

majastrz commented 1 year ago

The "Name does not resolve" part suggests that the Az CLI is unable to resolve the downloads.bicep.azure.com DNS name. We haven't made any DNS changes here in several weeks.
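For reference, the lookup that fails is a plain getaddrinfo call (it shows up as socket.getaddrinfo in the --debug traceback later in this thread), which raises socket.gaierror; errno -2 is EAI_NONAME on Linux. A minimal Python sketch, assuming Python is available in the CI container; check_resolves is a hypothetical helper, not part of the Azure CLI:

```python
import socket
from typing import Callable, List, Optional, Tuple


# Hypothetical helper for diagnosing the lookup; not part of the Azure CLI.
def check_resolves(
    host: str,
    port: int = 443,
    resolver: Callable[..., list] = socket.getaddrinfo,
) -> Tuple[bool, List[str], Optional[int]]:
    """Resolve host with the same getaddrinfo call shape seen in the traceback.

    Returns (ok, addresses, gai_errno). On failure, gai_errno is the
    socket.gaierror code, e.g. socket.EAI_NONAME for "Name does not resolve".
    """
    try:
        # Positional args: host, port, family, type (SOCK_STREAM, as urllib3 uses).
        infos = resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, [], exc.errno
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return True, sorted({info[4][0] for info in infos}), None
```

Calling check_resolves("downloads.bicep.azure.com") in a failing pipeline step would show whether the container's resolver, rather than the Bicep CDN, is the component that fails.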

Locally, the name resolves for me successfully as well:

❯ nslookup downloads.bicep.azure.com
Server:  router
Address:  192.168.1.1

Non-authoritative answer:
Name:    part-0041.t-0009.fdv2-t-msedge.net
Addresses:  2620:1ec:4f:1::69
          2620:1ec:4e:1::69
          13.107.237.69
          13.107.238.69
Aliases:  downloads.bicep.azure.com
          bicep-downloads-prod.azureedge.net
          bicep-downloads-prod.afd.azureedge.net
          star-azureedge-prod.trafficmanager.net
          shed.dual-low.part-0041.t-0009.fdv2-t-msedge.net

The random nature of this suggests a DNS resolution issue in the CI/CD environment rather than a configuration issue on our end. @ausy-tk can you share any details about the CI/CD environment that is executing the Az CLI commands?
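To put a number on "random", one option is to resolve the name repeatedly from inside the CI/CD container and count the failures. A hedged sketch: sample_resolution and its defaults (20 attempts, 0.5 s apart) are illustrative assumptions, and the resolver and sleep parameters are injectable so it can be exercised without network access:

```python
import socket
import time
from typing import Callable


# Hypothetical diagnostic helper; not part of the Azure CLI.
def sample_resolution(
    host: str,
    attempts: int = 20,
    delay_s: float = 0.5,
    resolver: Callable[..., list] = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Resolve host `attempts` times and return the fraction of failed lookups."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    failures = 0
    for i in range(attempts):
        try:
            resolver(host, 443, socket.AF_UNSPEC, socket.SOCK_STREAM)
        except socket.gaierror:
            failures += 1
        if i < attempts - 1:
            sleep(delay_s)  # space the lookups out so transient failures can show up
    return failures / attempts
```

A non-zero result from sample_resolution("downloads.bicep.azure.com") in the pipeline, with other hosts resolving fine, would point at the environment's DNS path.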

ausy-tk commented 1 year ago

Hi @majastrz, unfortunately the CI/CD environment is not under our direct control, so I can't provide any details. I have opened a support ticket with the provider, as the issue does not seem to be on your side.

danielmackay commented 1 year ago

I am also getting this error when trying to use az bicep from a Bitbucket pipeline.

This error seems to be intermittent.

OlivierTD commented 1 year ago

I'm getting the same problem on my end with Bitbucket Pipelines.

If I run az deployment group create with the --debug flag, I get this stack trace:

DEBUG: cli.azure.cli.command_modules.resource._bicep: Bicep CLI installation path: /root/.azure/bin/bicep
DEBUG: cli.azure.cli.command_modules.resource._bicep: Bicep CLI installed: False.
DEBUG: urllib3.connectionpool: Starting new HTTPS connection (1): aka.ms:443
DEBUG: urllib3.connectionpool: https://aka.ms:443 "GET /BicepLatestRelease HTTP/1.1" 301 0
DEBUG: urllib3.connectionpool: Starting new HTTPS connection (1): downloads.bicep.azure.com:443
DEBUG: cli.azure.cli.core.azclierror: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/local/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name does not resolve
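The traceback confirms that the connection to downloads.bicep.azure.com fails inside socket.getaddrinfo, before any TCP connection is attempted. If you control a wrapper script, one possible mitigation (not something the Azure CLI itself does, as far as this thread shows) is to retry the lookup with exponential backoff before invoking az. A minimal sketch; resolve_with_retry, retries=5 and base_delay_s=1.0 are hypothetical choices:

```python
import socket
import time
from typing import Callable, List


# Hypothetical pre-check helper; not part of the Azure CLI.
def resolve_with_retry(
    host: str,
    port: int = 443,
    retries: int = 5,
    base_delay_s: float = 1.0,
    resolver: Callable[..., list] = socket.getaddrinfo,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Return resolved IPs, retrying socket.gaierror with exponential backoff.

    Waits base_delay_s, 2*base_delay_s, 4*base_delay_s, ... between attempts
    and re-raises the last gaierror once all retries are exhausted.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error = None
    for attempt in range(retries + 1):
        try:
            infos = resolver(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM)
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            last_error = exc
            if attempt < retries:
                sleep(base_delay_s * (2 ** attempt))
    raise last_error
```

A successful pre-check only makes it more likely that az's own lookup a moment later succeeds; it does not guarantee it, which is why caching (below) is the sturdier fix.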

@danielmackay Since it is intermittent, caching the Azure CLI directory (/root/.azure/) in my pipeline fixed the issue for me. This way I avoid downloading Bicep on every run.

In my bitbucket-pipelines.yml file:

definitions:
  caches:
    azure: /root/.azure/

After that I just needed one run to succeed. Now it always retrieves my cache and I no longer have this issue!
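The --debug output above shows the CLI looking for Bicep at /root/.azure/bin/bicep and reporting "Bicep CLI installed: False" before it goes to the network, which plausibly explains why caching /root/.azure/ helps: once one run populates the cache, later runs find the binary. Note that in Bitbucket a custom cache declared under definitions must also be listed under the step's caches key to be used. A small sketch of the same presence check, assuming the default /root/.azure location from the log; cached_bicep_path is a hypothetical helper:

```python
import os
from pathlib import Path
from typing import Optional


# Hypothetical helper; mirrors the "Bicep CLI installation path" in the debug log.
def cached_bicep_path(azure_dir: str = "/root/.azure") -> Optional[Path]:
    """Return the Bicep binary path if a cached, executable copy exists, else None."""
    candidate = Path(azure_dir) / "bin" / "bicep"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None
```

Logging the result at the start of the step makes it easy to tell whether a given run used the cache or had to download Bicep.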

majastrz commented 1 year ago

We are unable to reproduce the issue ourselves without more information. Would anyone seeing this problem be able to provide any additional environment or networking details that could help us investigate this?

danielmackay commented 1 year ago

@majastrz - This is not specific to ACA; it seems to affect any pipeline that tries to use Bicep. You can reproduce it by running a Bitbucket pipeline that has a step like this:

    - step:
        name: Deploy Infrastructure
        image: mcr.microsoft.com/azure-cli
        script:
          - az login --service-principal -u $AZURE_APP_ID -p $AZURE_PASSWORD --tenant $AZURE_TENANT_ID     
          - az deployment group create --resource-group $RG_NAME --template-file ./deploy/main.bicep

alex-frankel commented 1 year ago

And is the issue intermittent, or does this happen every time you try to deploy with Bitbucket?

majastrz commented 1 year ago

And do you have any pipelines that are not using Bitbucket that exhibit the issue?

danielmackay commented 1 year ago

@alex-frankel @majastrz - from my testing this happens every time. We are only using Bitbucket.

We also see this behavior when running, for example, az bicep upgrade within our Bitbucket pipeline:

az bicep upgrade
ERROR: Error while attempting to retrieve the latest Bicep version: HTTPSConnectionPool(host='downloads.bicep.azure.com', port=443): Max retries exceeded with url: /releases/latest (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f343fe94c70>: Failed to establish a new connection: [Errno -2] Name does not resolve')).

ausy-tk commented 1 year ago

Replying to: "And do you have any pipelines that exhibit that issue that are not using bitbucket?"

@alex-frankel @majastrz We're using GitLab, and there the issue occurred intermittently. Since we now use the workaround in all our pipelines, I cannot tell whether it would now occur every time, but it should be reproducible by running a pipeline that uses the Azure CLI with Bicep templates.

alex-frankel commented 1 year ago

We are going to try to repro on our end, but is there any chance you can file a support case with Bitbucket? It's very odd that it fails only with that tool, and it seems most likely that the issue is on their end.

brwilkinson commented 1 year ago

Can you please test either of the following two workarounds?

image: mcr.microsoft.com/azure-cli:2.34.1   # workaround 1
# image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - az --version
          - az bicep --help
          - az bicep install

[screenshot]

With newer versions of the Azure CLI, I see the following:

az bicep --help
az: 'bicep' is not in the 'az' command group. See 'az --help'. If the command is from an extension, please make sure the corresponding extension is installed. To learn more about extensions, please visit https://docs.microsoft.com/en-us/cli/azure/azure-cli-extensions-overview

It's possible the regression is related to these new config options.

[screenshot]

I see the following during the install:

[screenshot]

The install does appear to complete, despite the exception.

[screenshot]

Another possible workaround is as follows:

image: mcr.microsoft.com/azure-cli:latest 

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - az --version
          - az config set bicep.use_binary_from_path=False   #workaround 2
          - az bicep --help
          - az bicep install

[screenshot]

Adding the text log below for follow-up on the Azure CLI side once we confirm the above workaround.

+ az bicep install
ERROR: The command failed with an unexpected error. Here is the traceback:
ERROR: No section: 'bicep'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/knack/cli.py", line 233, in invoke
    cmd_result = self.invocation.execute(args)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 663, in execute
    raise ex
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 726, in _run_jobs_serially
    results.append(self._run_job(expanded_arg, cmd_copy))
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 697, in _run_job
    result = cmd_copy(params)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/__init__.py", line 333, in __call__
    return self.handler(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/core/commands/command_operation.py", line 121, in handler
    return op(**command_args)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/command_modules/resource/custom.py", line 3648, in install_bicep_cli
    ensure_bicep_installation(cmd.cli_ctx, release_tag=version, target_platform=target_platform)
  File "/usr/local/lib/python3.10/site-packages/azure/cli/command_modules/resource/_bicep.py", line 141, in ensure_bicep_installation
    use_binary_from_path = cli_ctx.config.get("bicep", "use_binary_from_path").lower()
  File "/usr/local/lib/python3.10/site-packages/knack/config.py", line 99, in get
    raise last_ex  # pylint:disable=raising-bad-type
  File "/usr/local/lib/python3.10/site-packages/knack/config.py", line 94, in get
    return config.get(section, option)
  File "/usr/local/lib/python3.10/site-packages/knack/config.py", line 208, in get
    return self.config_parser.get(section, option)
  File "/usr/local/lib/python3.10/configparser.py", line 783, in get
    d = self._unify_values(section, vars)
  File "/usr/local/lib/python3.10/configparser.py", line 1154, in _unify_values
    raise NoSectionError(section) from None
configparser.NoSectionError: No section: 'bicep'
To check existing issues, please visit: https://github.com/Azure/azure-cli/issues
To open a new issue, please run `az feedback`
Installing Bicep CLI v0.15.31...
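The root of this traceback is standard configparser behavior: get(section, option) raises NoSectionError when the [bicep] section does not exist in the CLI config. The sketch below uses only the standard library to show the failure mode and a fallback-based read; the helper name and the "if_found_in_ci" default are assumptions for illustration, not the Azure CLI's actual code. It also suggests why workaround 2 helps: az config set bicep.use_binary_from_path=False presumably writes a [bicep] section into the CLI config, so the lookup no longer hits a missing section.

```python
import configparser


# Hypothetical helper; the default value is an assumption, not confirmed CLI behavior.
def read_use_binary_from_path(config_text: str, default: str = "if_found_in_ci") -> str:
    """Read [bicep] use_binary_from_path without failing when the section is missing.

    parser.get("bicep", ...) without fallback= raises NoSectionError when [bicep]
    is absent (the failure in the log above); fallback= returns default instead.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("bicep", "use_binary_from_path", fallback=default).lower()
```

For example, read_use_binary_from_path("") returns the default, while a config containing [bicep] with use_binary_from_path = False returns "false".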
brwilkinson commented 1 year ago

Tagging @ausy-tk @danielmackay @OlivierTD

Can you please test the above workarounds?

brwilkinson commented 1 year ago

I believe this has been reported and a fix is rolling out:

brwilkinson commented 1 year ago

I see @danielmackay commented on that other thread that the workaround was successful.

alex-frankel commented 1 year ago

I'm going to close this one, since it looks like Ben is right about the root cause and the issue is being tracked on the Azure CLI side. We can re-open if needed.

az-core commented 1 year ago

We have run into the above DNS issue intermittently. Below are my observations so far:

We also encountered the issue "Error with Azure CLI 2.46.0 and Bicep if no bicep configuration exists". Applying the fix (and temporary workaround) for #25710 resolves the Bicep configuration problem, but not the intermittent DNS problem.

We are using the latest mcr.microsoft.com/azure-cli image (2.46.0), where Bicep was installed with az bicep install. We also tried other commands, such as az bicep list-versions and installing a specific version of Bicep. All of these intermittently resulted in the DNS error.

For troubleshooting, I switched to mcr.microsoft.com/azure-functions/dotnet:4-dotnet6-core-tools instead, and the DNS issue seems to be resolved. I haven't tried any other image with the Azure CLI yet. Switching back to the azure-cli image brings the intermittent problem back.

krizskp commented 1 year ago

I also use Bitbucket Pipelines, and this DNS issue was happening even before the https://github.com/Azure/azure-cli/issues/25710 issue started. Workaround 2 fixed the Bicep config issue, but the DNS issue remained. Recently, the DNS issue seems to have become more frequent; on one day, Bicep only got installed after 11 runs of the pipeline.
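Because two independent failures are now mixed in this thread (the missing [bicep] config section addressed by workaround 2, and the intermittent DNS error), a retry step in a pipeline script should only retry the latter. A hedged Python sketch: classify_az_error and run_with_dns_retry are hypothetical helpers, the marker strings are copied from the error output quoted in this thread, and the attempt count and delay are arbitrary assumptions:

```python
import subprocess
import time
from typing import Callable, Sequence

# Substrings copied from the error output quoted in this thread.
DNS_MARKER = "Name does not resolve"
CONFIG_MARKER = "No section: 'bicep'"


# Hypothetical helper: label a failed az command by its stderr text.
def classify_az_error(stderr: str) -> str:
    """Return 'dns', 'bicep_config', or 'other'."""
    if DNS_MARKER in stderr:
        return "dns"
    if CONFIG_MARKER in stderr:
        return "bicep_config"
    return "other"


# Hypothetical helper: rerun a command only while it fails with the DNS error.
def run_with_dns_retry(
    cmd: Sequence[str],
    attempts: int = 5,
    delay_s: float = 10.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        result = runner(list(cmd), capture_output=True, text=True)
        if result.returncode == 0 or classify_az_error(result.stderr) != "dns":
            return result  # success, or a failure that retrying will not fix
        if attempt < attempts:
            sleep(delay_s)
    return result
```

For example, run_with_dns_retry(["az", "bicep", "install"]) would retry the 11-run scenario above automatically, while a config error would surface immediately instead of being retried.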

So in agreement with az-core's comment above, this is still not fixed.

brwilkinson commented 1 year ago

@krizskp can you please test using the older azure-cli image?

i.e. workaround 1:

image: mcr.microsoft.com/azure-cli:2.34.1   # workaround 1

krizskp commented 1 year ago

Replying to: "can you please test using the older azure-cli image? i.e. workaround 1 (image: mcr.microsoft.com/azure-cli:2.34.1)"

No, this didn't work for me.

brwilkinson commented 1 year ago

@krizskp what did you mean by "didn't work"?

Are you using image: mcr.microsoft.com/azure-cli:latest at the moment?

Can you please provide more info, or logs of the error messages related to the DNS error?

Are you using hosted runners, or which workspace runners are you using?

krizskp commented 1 year ago

Replying to: what did you mean by "didn't work"? Are you using image: mcr.microsoft.com/azure-cli:latest at the moment?

With workaround 1 (azure-cli version set to 2.34.1), I still get:

ERROR: Error while attempting to retrieve the latest Bicep version: HTTPSConnectionPool(host='downloads.bicep.azure.com', port=443): Max retries exceeded with url: /releases/latest (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f8f2f3706d0>: Failed to establish a new connection: [Errno -2] Name does not resolve')).

brwilkinson commented 1 year ago

@krizskp Thank you ✅ Will re-open.

brwilkinson commented 1 year ago

@krizskp are you using hosted runners, or which workspace runners are you using?

brwilkinson commented 1 year ago

There appear to be other, more widespread reports of this, i.e. it is not specific to Bicep.

krizskp commented 1 year ago

Replying to: "are you using hosted runners or what Workspace runners are you using?"

@brwilkinson I'm running it on Bitbucket Pipelines.

It fails at this command: az deployment group create ...

brwilkinson commented 1 year ago

@krizskp was this intermittent for you?

Also, what region? Can you check which DNS servers you are using?

So far I am still not able to repro this in Bitbucket. If it is intermittent, I can set up a deployment schedule.

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - apk update 
          - apk add bind-tools
          - uname -a
          - cat /etc/resolv.conf
          - nslookup downloads.bicep.azure.com
          - dig downloads.bicep.azure.com
          - ping downloads.bicep.azure.com -4 -c 2

All of these resolve correctly to 13.107.237.69.
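The cat /etc/resolv.conf step is what reveals which DNS server the container actually queries. A small Python sketch that extracts the nameserver entries, so a pipeline could log them next to a failed lookup; parse_nameservers is a hypothetical helper, and it only handles the nameserver and comment lines of the resolv.conf format:

```python
from typing import List


# Hypothetical helper; reads only "nameserver" lines from resolv.conf content.
def parse_nameservers(resolv_conf: str) -> List[str]:
    """Return the nameserver addresses listed in resolv.conf content, in order."""
    servers = []
    for raw in resolv_conf.splitlines():
        parts = raw.split()
        # Blank lines and lines starting with '#' or ';' carry no settings.
        if not parts or parts[0].startswith(("#", ";")):
            continue
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers
```

In a pipeline step this could be called as parse_nameservers(open("/etc/resolv.conf").read()).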

[screenshot]

krizskp commented 1 year ago

@brwilkinson Yes, it was intermittent.

Here's the output:

[screenshots]

brwilkinson commented 1 year ago

Thank you @krizskp for the output...

It looks like you are using an internal IP address (ec2.local) for your DNS server. Are you familiar with this server?

Either way, it appears DNS is actually working correctly, yet the deployment fails.

So are you also applying the workaround?

az config set bicep.use_binary_from_path=False

Full steps for repro and testing:

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - apk update 
          - apk add bind-tools
          - uname -a
          - cat /etc/resolv.conf
          - nslookup downloads.bicep.azure.com
          - dig downloads.bicep.azure.com
          - ping downloads.bicep.azure.com -4 -c 2

          - az --version
          # - az config set bicep.version_check=True
          - az config set bicep.use_binary_from_path=False
          - az bicep --help
          - az bicep install

Output from the additional steps:

[screenshot]

krizskp commented 1 year ago

@brwilkinson I'm using Bitbucket Pipelines and don't configure DNS myself; I suppose it is managed by Bitbucket. ec2.local also comes with their pipeline Docker containers.

And yes, I'm using the az config set bicep.use_binary_from_path=False workaround, but that is for a different issue (https://github.com/Azure/azure-cli/issues/25710).

brwilkinson commented 1 year ago

Please test the complete config below and provide the logs.

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          - apk update 
          - apk add bind-tools
          - uname -a
          - cat /etc/resolv.conf
          - nslookup downloads.bicep.azure.com
          - dig downloads.bicep.azure.com
          - ping downloads.bicep.azure.com -4 -c 2

          - az --version
          # - az config set bicep.version_check=True
          - az config set bicep.use_binary_from_path=False
          - az bicep --help
          - az bicep install

brwilkinson commented 1 year ago

As mentioned further up in this thread...

We believe it's directly related, and many users have confirmed that the workaround was successful.

Until the fix is rolled out in the Azure CLI image, you just need to add the following before running your deployment:

az config set bicep.use_binary_from_path=False
az deployment group create --template-file xyx etc .....

krizskp commented 1 year ago

Replying to: "Please test the complete config below and provide the logs."

This is another message which sometimes pops up.

[screenshots]

brwilkinson commented 1 year ago

Thank you @krizskp ... interesting.

I don't see the warning showing 'use_binary_from_path=False'.

[screenshot]

Since the DNS looks good, can we try the following?

image: mcr.microsoft.com/azure-cli:latest

pipelines:
  default:
    - step:
        name: 'Deployment to Staging'
        deployment: staging
        script:
          # - apk update 
          # - apk add bind-tools
          # - uname -a
          # - cat /etc/resolv.conf
          # - nslookup downloads.bicep.azure.com
          # - dig downloads.bicep.azure.com
          # - ping downloads.bicep.azure.com -4 -c 2

          # - az --version
          # - az config set bicep.version_check=True
          - az config set bicep.use_binary_from_path=False
          #- az bicep --help
          - az config get
          - az bicep install

e.g.

[screenshot]

brwilkinson commented 1 year ago

It looks like the fix for the config has been merged.

I guess we will see a new version in the next week.

brwilkinson commented 1 year ago

Azure CLI 2.47.0 is now the latest; will close.

anthony-c-martin commented 1 year ago

@krizskp if you get a chance, some other commands that would be useful to help debug:

brwilkinson commented 1 year ago

Reopening to confirm that the DNS issue has been resolved.

brwilkinson commented 1 year ago

Hi @krizskp any additional status updates on the DNS failures?