dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3378] [Bug] "External connection exception occurred: not a gzip file" error when running dbt deps #9063

Open Visya opened 1 year ago

Visya commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

When running dbt deps behind a proxy (with the HTTP_PROXY, HTTPS_PROXY and FTP_PROXY variables set), I get an error:

External connection exception occurred: not a gzip file

I tried installing with git, which works, but as soon as I try to install any bigger package that depends on dbt_utils, it fails with "Found duplicate project "dbt_utils". This occurs when a dependency has the same project name as some other dependency."

Expected Behavior

Packages are able to be installed behind the proxy with dbt deps.

Steps To Reproduce

  1. Set HTTP_PROXY, HTTPS_PROXY environment variables.
  2. Add a package to packages.yml.
    packages:
      - package: dbt-labs/dbt_utils
        version: 1.1.1
  3. Run dbt deps.
  4. Observe error "External connection exception occurred: not a gzip file".

Relevant log output

11:17:28  Running with dbt=1.7.1
11:17:29  Updating lock file in file path: [...]code\test\src/package-lock.yml
11:17:30  Installing dbt-labs/dbt_utils
11:17:39  Encountered an error:
External connection exception occurred: not a gzip file

Environment

- OS: Windows 10 Enterprise
- Python: 3.11.5
- dbt: 1.7.1

Which database adapter are you using with dbt?

postgres

Additional Context

I have python-certifi-win32 installed, which resolved issues similar to this one.

jtcohen6 commented 1 year ago

Hey @Visya! I see you're using v1.7.1. Is this working for you on earlier versions of dbt-core? We need to understand whether this is a regression (due to some change in v1.7) or something that has never worked.

Visya commented 1 year ago

Hi @jtcohen6, I am setting up a new environment on a new system, so it has never worked for me. But I have colleagues on the same network who are able to download packages behind the proxy.

graciegoheen commented 1 year ago

@Visya do you happen to know which version of dbt your colleagues are using?

josezeta commented 7 months ago

We are currently running some workflows using dbt-core v1.7.4 in which this error is happening. It happens randomly, with no exact pattern.

josezeta commented 7 months ago

@Visya I do not think this is a bug in dbt-core. It could be that the concurrency of fetching these packages from GitHub is causing a crash in some cases.

jfo8001 commented 7 months ago

I saw this error for the first time today, after we recently upgraded our Snowflake connector. dbt-core has remained the same for the last 9 months.

dbt-core = "^1.5.2"
dbt-snowflake = "^1.5.2"
snowflake-connector-python = "^3.7.1"

Last week we upgraded from snowflake-connector-python = "^3.1.0".

dbeatty10 commented 7 months ago

@josezeta and @jfo8001 did each of you see this error for the first time today?

If so, was it intermittent, or did it happen every time you ran dbt deps? Are you using a proxy like the original poster, by any chance?

Also, could you share your environment details?

- OS:
- Python:
- dbt:

Examples:

- OS: Ubuntu 20.04
- Python: 3.9.12 (python3 --version)
- dbt-core: 1.1.1 (dbt --version)

josezeta commented 7 months ago

@dbeatty10 I'm encountering an intermittent error with dbt deps. It occurred once today, but subsequent runs (every 15 minutes) have been successful without code or environment changes. The error seems related to downloading a package's .gzip artifact. This suggests a possible transient issue with GitHub, which I believe is the artifact repository.

Environment

- OS: Debian, docker image https://hub.docker.com/layers/library/python/3.8.17/images/sha256-c293ab0afb856e1f378bd676d43010154827f3e228ba859f66e5ffa6c850427e
- Python: 3.8.17
- dbt-core: 1.7.4
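
For context on the message itself: "not a gzip file" is the text Python's standard-library tarfile module uses when it is asked to open a gzip-compressed tarball and the bytes turn out to be something else, such as an HTML error page or a truncated response. The sketch below is not dbt-core's code, and it is only an assumption that dbt deps fails at a comparable step. It reproduces the message with stdlib calls alone. The helper names (looks_like_gzip, try_open_tarball, make_tarball) and the sample HTML payload are hypothetical.

```python
import io
import tarfile

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(payload: bytes) -> bool:
    """Cheap pre-check: does the payload start with the gzip magic number?"""
    return payload[:2] == GZIP_MAGIC


def try_open_tarball(payload: bytes) -> str:
    """Try to open the payload as a .tar.gz and describe what happened."""
    try:
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as archive:
            return f"ok: {len(archive.getnames())} member(s)"
    except tarfile.ReadError as exc:
        return f"ReadError: {exc}"


def make_tarball() -> bytes:
    """Build a tiny, valid .tar.gz in memory for comparison (illustrative data)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        data = b"name: demo\n"
        info = tarfile.TarInfo("dbt_project.yml")
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical response body: an HTML page served in place of the tarball.
    html_page = b"<html><body>503 Service Unavailable</body></html>"
    print(looks_like_gzip(html_page), try_open_tarball(html_page))
    print(looks_like_gzip(make_tarball()), try_open_tarball(make_tarball()))
```

If this reading is right, saving the failing download and checking its first two bytes would show whether the response was a real tarball or an error page from a proxy or from the upstream host.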

dbeatty10 commented 7 months ago

A transient issue with GitHub sounds like a very possible explanation.

Agreed that this doesn't look like an issue in dbt-core; the fact that it is both rare and intermittent supports that judgment.

I'm going to leave this as awaiting_response in case anyone adds more information that we should consider.

jturner18 commented 7 months ago

I wanted to add that I am also dealing with the "External connection exception occurred: not a gzip file" error and, occasionally, an "External connection exception occurred: Compressed file ended before the end-of-stream marker was reached" error.

- OS: Microsoft Windows 10 Enterprise - 10.0.19045 Build 19045
- Python: 3.11.9
- DBT: 1.7.13
- Snowflake: 1.7.3

jturner18 commented 7 months ago

I think it is specifically a git issue with dbt-labs/dbt-utils (not all packages)

Even when I follow https://hub.getdbt.com/dbt-labs/dbt_utils/1.1.1/ and put this in packages.yml:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

I get the error: Version error for package dbt-labs/dbt_utils: Could not find a satisfactory version from options: ['=1.1.1', '>=0.8.0', '<2.0.0', '>=0.8.0', '<0.9.0', '>=0.8.1', '<0.9.0']

If I try this in packages.yml instead:

packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.8.0", "<0.10.0"]

I get the error: External connection exception occurred: not a gzip file

If I comment out only the dbt_utils reference, all other packages install fine, but package-lock.yml still gets dbt-labs/dbt-utils appended to the end of the file, and the install then errors on it.

If there is any advice available, let me know.
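
The version error above can be read as a plain constraint conflict: the options list combines the pin =1.1.1 with two <0.9.0 bounds, and no single version can meet both. Those extra bounds presumably come from another package in the project that depends on an older dbt_utils range, though the thread does not confirm this. The sketch below checks a candidate version against the reported options. It is a simplified illustration, not dbt's resolver: it handles only plain numeric versions, and the function names are hypothetical.

```python
import operator
import re

# Comparison operators that appear in the constraint strings from the error message.
OPERATORS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '1.1.1' into (1, 1, 1) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_constraint(spec):
    """Split '>=0.8.0' into a comparison function and a version tuple."""
    match = re.fullmatch(r"(>=|<=|=|>|<)?(\d+(?:\.\d+)*)", spec.strip())
    if match is None:
        raise ValueError(f"unsupported constraint: {spec!r}")
    return OPERATORS[match.group(1) or "="], parse_version(match.group(2))


def satisfies_all(version, specs):
    """Return True only if the version meets every constraint in specs."""
    candidate = parse_version(version)
    return all(compare(candidate, bound)
               for compare, bound in map(parse_constraint, specs))


def failing_constraints(version, specs):
    """List the constraints that the given version does not meet."""
    return [spec for spec in specs if not satisfies_all(version, [spec])]


if __name__ == "__main__":
    # The options list copied from the error message in the comment above.
    reported = ["=1.1.1", ">=0.8.0", "<2.0.0", ">=0.8.0", "<0.9.0", ">=0.8.1", "<0.9.0"]
    print(failing_constraints("1.1.1", reported))  # ['<0.9.0', '<0.9.0']
```

Under this reading, the second packages.yml gets past version resolution because a range such as [">=0.8.0", "<0.10.0"] overlaps with the <0.9.0 bounds, which is why that attempt reaches the download step and fails there instead.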

dbeatty10 commented 7 months ago

@jturner18 are you still experiencing this issue?

Both of the packages.yml files below worked for me when I ran this:

dbt deps --upgrade

# one version of packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

# another version of packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.8.0", "<0.10.0"]

IL-Jerry commented 3 months ago

I had the same issue with my dbt Cloud job, which runs every 15 minutes. It happened twice and hasn't happened since. The error message was the same:

------------------------------------------------------------
Invoke dbt Command
------------------------------------------------------------
dbt deps

23:15:14 Running with dbt=1.7.17
23:15:15 Installing dbt-labs/dbt_utils
dbt command failed
23:15:35 Encountered an error:
External connection exception occurred: not a gzip file

My config:

Can someone take a look at this please? Thanks.

dbeatty10 commented 3 months ago

@IL-Jerry Our best guess is that this is caused by a transient issue with GitHub.

IL-Jerry commented 3 months ago

Thanks @dbeatty10. I'm waiting for an official explanation from someone at dbt Labs that I can pass on to my clients. cc @jtcohen6

dbeatty10 commented 3 months ago

@IL-Jerry I'm one of the maintainers of dbt-core. This error message has been reported several times within this GitHub issue. Each time it has been intermittent, and we haven't been able to reproduce the issue ourselves.

https://github.com/dbt-labs/dbt-core/issues/4579 was opened by Scott Barber, who led the dbt Cloud support team for several years. He wrote that this type of error is "typically ... indicative of a transient problem with github itself".

If you'd like further support, you can reach out to the dbt Cloud Support team via the dbt Cloud web interface, using the navigation paths below. If so, please include a link to this issue when you create the support ticket.

Classic navigation for dbt Cloud:

Click question mark icon (?) > Create a support ticket

Newest navigation interface for dbt Cloud:

Help & Guides > Create a support ticket

More detail

Here is the source code that raises that exception. It will re-attempt the download up to 5 times, with a 1-second wait between attempts. After that, it will raise an exception that starts with "External connection exception occurred: " followed by the original exception message.
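
The retry behavior described above can be sketched roughly as follows. This is an illustration of the described pattern, not a copy of the dbt-core source. The names download_with_retries and ExternalConnectionError are hypothetical, the broad except clause is an assumption, and so is the reading of "up to 5 times" as five attempts in total.

```python
import time


class ExternalConnectionError(Exception):
    """Hypothetical stand-in for the exception that dbt reports to the user."""


def download_with_retries(fetch, max_attempts=5, wait_seconds=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is exhausted.

    fetch is any zero-argument callable that downloads and unpacks a package.
    sleep is injectable so the wait can be skipped or recorded in tests.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # assumption: real code likely catches a narrower set
            last_error = exc
            if attempt < max_attempts:
                sleep(wait_seconds)  # pause before the next attempt
    # All attempts failed: wrap the last error with the prefix quoted in the thread.
    raise ExternalConnectionError(
        f"External connection exception occurred: {last_error}"
    ) from last_error


if __name__ == "__main__":
    def always_fails():
        # Simulates a download that keeps returning a non-gzip payload.
        raise OSError("not a gzip file")

    try:
        download_with_retries(always_fails, wait_seconds=0.0)
    except ExternalConnectionError as exc:
        print(exc)
```

One consequence of this pattern is that the user sees the error only when every attempt in the window fails, which fits an upstream problem lasting longer than a few seconds.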

dbeatty10 commented 3 months ago

@IL-Jerry I connected with our dbt Cloud Support team.

They shared that GitHub had an outage on August 14, 2024 between 23:02 UTC and 23:38 UTC. So if your clients experienced this issue in that time frame, that would be consistent with the GitHub outage.

The GitHub outage affected dbt Cloud services, and we reported this on our status page as well.

If you have any other questions or feedback, please reach out to the dbt Cloud Support team, and they'd be glad to help.

IL-Jerry commented 3 months ago

Thanks @dbeatty10 for the detailed response.

pranjalbhatt commented 2 months ago

Hi,

I am receiving the same error ConnectionError: External connection exception occurred: not a gzip file while downloading via packages.yml file.

NOTE: The error only occurs while downloading the dbt-utils package, maybe because it is a dependency of elementary. I have not listed dbt-utils in my packages.yml file. Other packages install quickly.

My packages.yml file looks like this:

packages:
  - package: calogica/dbt_expectations
    version: 0.10.4
  - package: brooklyn-data/dbt_artifacts
    version: 2.6.4
  - package: elementary-data/elementary
    version: 0.16.1

dbeatty10 commented 2 months ago

@pranjalbhatt are you still seeing this error? Or did it resolve?

pranjalbhatt commented 2 months ago

Yes, I am still seeing this error. I am unable to download dbt-utils and elementary. I tried changing versions as well. Is there a fix?

dbeatty10 commented 2 months ago

@pranjalbhatt I am able to successfully install elementary version 0.16.1 using dbt deps with both dbt-core 1.7 and 1.8.

Can you share the output of dbt --version so I can see which version of dbt-core you are using?

pranjalbhatt commented 2 months ago

Yes, I am still unable to download elementary. I am using dbt-core.

Core:
  - installed: 1.8.1

As a temporary fix, I have used git clone to install the package.

dbeatty10 commented 2 months ago

~@pranjalbhatt I am not able to replicate the issue you are having, and I suspect it is distinct from the other reports in this issue. If you are continuing to experience this, could you open a new issue here?~

EDIT: Never mind! I see that you opened https://github.com/dbt-labs/dbt-utils/issues/950.