dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

`dbt-core` Dockerfile does not work for `dbt-spark` due to `PyHive` #975

Closed (dbeatty10 closed this issue 8 months ago)

dbeatty10 commented 8 months ago

Originally posted by @gschmutz in https://github.com/dbt-labs/dbt-spark/issues/873#issuecomment-1679177998

Just realized that the Docker build from the dbt-core project (https://github.com/dbt-labs/dbt-core/tree/main/docker) does not work for dbt-spark when using the `PyHive` extra (or the default `all`):

docker build --tag my-dbt-spark:1.6.0 --target dbt-spark --build-arg dbt_core_ref=dbt-core@v1.6.0 --build-arg dbt_spark_ref=dbt-spark@v1.6.0 --build-arg dbt_spark_version=PyHive .

produces an error:

...
107.2   note: This error originates from a subprocess, and is likely not a problem with pip.
107.2   ERROR: Failed building wheel for sasl
107.2   Running setup.py clean for sasl
107.4   Building wheel for future (setup.py): started
108.0   Building wheel for future (setup.py): finished with status 'done'
108.0   Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492023 sha256=0dced4fde8484b7cf07f3ca722cbe787880c6fcb8eb27af37c82213dd20b48b8
108.0   Stored in directory: /tmp/pip-ephem-wheel-cache-pgg_8qmj/wheels/da/19/ca/9d8c44cd311a955509d7e13da3f0bea42400c469ef825b580b
108.0   Building wheel for PyHive (setup.py): started
108.3   Building wheel for PyHive (setup.py): finished with status 'done'
108.3   Created wheel for PyHive: filename=PyHive-0.6.5-py3-none-any.whl size=51554 sha256=b78987c7c11b9d3a18704d5339f9d1caf6221976e1f4c572f609fac9dd9da102
108.3   Stored in directory: /tmp/pip-ephem-wheel-cache-pgg_8qmj/wheels/cc/b2/8d/74115da1b8e1ee44544ec7870783c9fbf1127b66d296f6c4be
108.3   Building wheel for pure-sasl (setup.py): started
108.6   Building wheel for pure-sasl (setup.py): finished with status 'done'
108.6   Created wheel for pure-sasl: filename=pure_sasl-0.6.2-py3-none-any.whl size=11423 sha256=ef452afe0aeb515f2ad15f63e0df15ea5c620fef4e4f7d4413de8ebdb05b064e
108.6   Stored in directory: /tmp/pip-ephem-wheel-cache-pgg_8qmj/wheels/be/bd/15/23761a50b737a712aacac51c718906ce3563705a336d2c4ffc
108.6 Successfully built pyspark thrift dbt-spark logbook minimal-snowplow-tracker future PyHive pure-sasl
108.6 Failed to build sasl
108.6 ERROR: Could not build wheels for sasl, which is required to install pyproject.toml-based projects
------
Dockerfile:104
--------------------
 102 |         /tmp/* \
 103 |         /var/tmp/*
 104 | >>> RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_spark_ref}#egg=dbt-spark[${dbt_spark_version}]"
 105 |
 106 |
--------------------
ERROR: failed to solve: process "/bin/sh -c python -m pip install --no-cache-dir \"git+https://github.com/dbt-labs/${dbt_spark_ref}#egg=dbt-spark[${dbt_spark_version}]\"" did not complete successfully: exit code: 1

It works fine when using the ODBC version instead.
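For reference, the working ODBC build differs from the failing command above only in the `dbt_spark_version` build arg (a sketch based on that command; the tag and refs are carried over unchanged):

```shell
# Same build as above, but selecting the ODBC extra instead of PyHive,
# which avoids pip compiling the sasl C extension.
docker build --tag my-dbt-spark:1.6.0 --target dbt-spark \
  --build-arg dbt_core_ref=dbt-core@v1.6.0 \
  --build-arg dbt_spark_ref=dbt-spark@v1.6.0 \
  --build-arg dbt_spark_version=ODBC .
```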

Update: I have the same problem locally (outside Docker) when I switch from Python 3.10 to 3.11, so the problem is related to https://github.com/dbt-labs/dbt-spark/issues/864
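Per the update above, Docker is not needed to reproduce this; the same sasl wheel failure can be triggered in a plain environment (a sketch, assuming a Python 3.11 interpreter is on the PATH):

```shell
# Assumption: Python 3.11 is installed as python3.11.
# The sasl wheel build fails here just as it does inside the
# dbt-core Dockerfile, while it succeeds under Python 3.10.
python3.11 -m pip install "dbt-spark[PyHive]==1.6.0"
```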

dbeatty10 commented 8 months ago

Proposed solution is described here: https://github.com/dbt-labs/dbt-spark/issues/873#issuecomment-1795150406

dbeatty10 commented 8 months ago

Closing as a duplicate of https://github.com/dbt-labs/dbt-spark/issues/864