dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3378] [Bug] "External connection exception occurred: not a gzip file" error when running dbt deps #9063

Open Visya opened 8 months ago

Visya commented 8 months ago

Is this a new bug in dbt-core?

Current Behavior

When running dbt deps behind a proxy (with the HTTP_PROXY, HTTPS_PROXY and FTP_PROXY variables set), I get an error:

External connection exception occurred: not a gzip file

I tried installing with git, which works, but as soon as I try to install any larger package that depends on dbt_utils, it fails with "Found duplicate project "dbt_utils". This occurs when a dependency has the same project name as some other dependency."

Expected Behavior

Packages are able to be installed behind the proxy with dbt deps.

Steps To Reproduce

  1. Set HTTP_PROXY, HTTPS_PROXY environment variables.
  2. Add a package to packages.yml.
    packages:
      - package: dbt-labs/dbt_utils
        version: 1.1.1
  3. Run dbt deps.
  4. Observe error "External connection exception occurred: not a gzip file".

Relevant log output

11:17:28  Running with dbt=1.7.1
11:17:29  Updating lock file in file path: [...]code\test\src/package-lock.yml
11:17:30  Installing dbt-labs/dbt_utils
11:17:39  Encountered an error:
External connection exception occurred: not a gzip file

Environment

- OS: Windows 10 Enterprise
- Python: 3.11.5
- dbt: 1.7.1

Which database adapter are you using with dbt?

postgres

Additional Context

I have python-certifi-win32 installed, which resolved similar issues in the past.
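
The message "not a gzip file" looks like the one Python's tarfile module reports when a payload does not begin with the gzip magic bytes (0x1f 0x8b). One way to narrow this down is to save whatever the proxy actually returns for the package download and inspect its first bytes. The sketch below is a hypothetical diagnostic helper, not part of dbt-core; the names `looks_like_gzip` and `describe_payload` are made up for illustration, and the HTML check is only a rough heuristic for a proxy block or login page.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def looks_like_gzip(data: bytes) -> bool:
    """Return True if the payload starts with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


def describe_payload(data: bytes) -> str:
    """Give a rough guess at what a downloaded payload actually contains."""
    if looks_like_gzip(data):
        return "gzip"
    if not data:
        return "empty"
    # Heuristic only: many proxies answer blocked requests with an HTML page.
    head = data[:200].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html (possibly a proxy block or login page)"
    return "unknown"


# Example with in-memory payloads; in practice, read the saved download
# with open(path, "rb").read() and pass the bytes in.
print(describe_payload(gzip.compress(b"hello")))
print(describe_payload(b"<html><body>Access denied</body></html>"))
```

If the saved download turns out to be HTML or empty, that would point at the proxy rather than at dbt-core.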

jtcohen6 commented 8 months ago

Hey @Visya! I see you're using v1.7.1. Is this working for you on earlier versions of dbt-core? We need to understand whether this is a regression (due to some change in v1.7) or something that has never worked.

Visya commented 8 months ago

Hi @jtcohen6, I am setting up a new environment on a new system, so it has never worked for me. But I have colleagues on the same network who are able to download packages behind the proxy.

graciegoheen commented 8 months ago

@Visya do you happen to know which version of dbt your colleagues are using?

josezeta commented 3 months ago

We are currently running some workflows using dbt-core v1.7.4 in which this error is happening. It happens randomly, with no exact pattern.

josezeta commented 3 months ago

@Visya I do not think this is a bug in dbt-core. It may be that fetching these packages from GitHub concurrently causes a failure in some cases.

jfo8001 commented 3 months ago

I saw this error for the first time today, having recently upgraded our Snowflake connector. dbt-core has remained the same for the last 9 months.

dbt-core = "^1.5.2"
dbt-snowflake = "^1.5.2"
snowflake-connector-python = "^3.7.1"

Last week we upgraded from snowflake-connector-python = "^3.1.0".

dbeatty10 commented 3 months ago

@josezeta and @jfo8001 did each of you see this error for the first time today?

If so, was it intermittent, or did it happen every time you ran dbt deps? Are you using a proxy like the original poster, by any chance?

Also, could you share your environment details?

- OS:
- Python:
- dbt:

Example:

- OS: Ubuntu 20.04
- Python: 3.9.12 (python3 --version)
- dbt-core: 1.1.1 (dbt --version)

josezeta commented 3 months ago

@dbeatty10 I'm encountering an intermittent error with dbt deps. It occurred once today, but subsequent runs (every 15 minutes) have been successful without code or environment changes. The error seems related to downloading a package's .gzip artifact. This suggests a possible transient issue with GitHub, which I believe is the artifact repository.

Environment

- OS: Debian, Docker image https://hub.docker.com/layers/library/python/3.8.17/images/sha256-c293ab0afb856e1f378bd676d43010154827f3e228ba859f66e5ffa6c850427e
- Python: 3.8.17
- dbt-core: 1.7.4
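
If the failure really is transient, one possible workaround for scheduled workflows is to retry the command a few times before giving up. The following is a minimal sketch under that assumption; `run_with_retries` and its parameters are hypothetical names, and it only wraps the `dbt deps` command line rather than changing anything inside dbt-core. The `runner` and `sleep` arguments exist so the logic can be exercised without actually invoking dbt.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 3,
    delay_seconds: float = 5.0,
    runner: Callable = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run cmd until it exits with 0 or the attempts are used up.

    Returns the exit code of the last attempt.
    """
    last_code = 1
    for attempt in range(1, attempts + 1):
        result = runner(list(cmd))
        last_code = result.returncode
        if last_code == 0:
            return 0
        if attempt < attempts:
            # Linear backoff: wait a little longer after each failed attempt.
            sleep(delay_seconds * attempt)
    return last_code


# Usage (not executed here): exit_code = run_with_retries(["dbt", "deps"])
```

A retry only hides the symptom, so it is still worth logging each failed attempt to see how often the download breaks.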

dbeatty10 commented 3 months ago

A transient issue with GitHub sounds like a very possible explanation.

Agreed that this doesn't look like an issue in dbt-core; the fact that it is rare and intermittent supports that judgment.

I'm going to leave this as awaiting_response in case anyone adds more information that we should consider.

jturner18 commented 2 months ago

Wanted to add that I am also dealing with the "External connection exception occurred: not a gzip file" error and, occasionally, an "External connection exception occurred: Compressed file ended before the end-of-stream marker was reached" error.

- OS: Microsoft Windows 10 Enterprise - 10.0.19045 Build 19045
- Python: 3.11.9
- dbt: 1.7.13
- Snowflake: 1.7.3

jturner18 commented 2 months ago

I think it is specifically a git issue with dbt-labs/dbt_utils (not all packages).

Even if I follow the reference at https://hub.getdbt.com/dbt-labs/dbt_utils/1.1.1/ and put this code:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

in packages.yml, I get the error: Version error for package dbt-labs/dbt_utils: Could not find a satisfactory version from options: ['=1.1.1', '>=0.8.0', '<2.0.0', '>=0.8.0', '<0.9.0', '>=0.8.1', '<0.9.0']

If I try this code:

packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.8.0", "<0.10.0"]

in packages.yml, I get the error: External connection exception occurred: not a gzip file

If I comment out only the dbt_utils reference, all other packages install fine, but dbt-labs/dbt_utils is still added to the end of package-lock.yml, at which point it errors.

If anyone has advice, let me know.
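
For what it's worth, the option list in the version error above contains both '=1.1.1' and '<0.9.0', and no single version can satisfy both. The sketch below illustrates that with a deliberately simplified comparison of plain numeric versions; it is not dbt's actual resolver, ignores pre-release tags, and `satisfies` and `failed_specs` are hypothetical helper names.

```python
import operator
import re

_OPS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '1.1.1' into (1, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check one plain numeric version against one spec such as '>=0.8.0'."""
    match = re.fullmatch(r"(>=|<=|=|>|<)(\d+(?:\.\d+)*)", spec)
    if match is None:
        raise ValueError(f"unsupported spec: {spec!r}")
    op, bound = match.groups()
    return _OPS[op](parse_version(version), parse_version(bound))


def failed_specs(version, specs):
    """Return the specs that the given version does not satisfy."""
    return [spec for spec in specs if not satisfies(version, spec)]


# The options quoted in the error message above.
SPECS = ['=1.1.1', '>=0.8.0', '<2.0.0', '>=0.8.0', '<0.9.0', '>=0.8.1', '<0.9.0']

print(failed_specs("1.1.1", SPECS))  # ['<0.9.0', '<0.9.0']
print(failed_specs("0.8.6", SPECS))  # ['=1.1.1']
```

The '<0.9.0' entries presumably come from other packages in the same packages.yml that pin an older dbt_utils range, which would explain why the looser range in the second attempt gets past version resolution.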

dbeatty10 commented 2 months ago

@jturner18 are you still experiencing this issue?

Both of the packages.yml files below worked for me when I ran this:

dbt deps --upgrade

# one version of packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: 1.1.1

# another version of packages.yml

packages:
  - package: dbt-labs/dbt_utils
    version: [">=0.8.0", "<0.10.0"]