Closed gschmutz closed 6 months ago
Just realized that the Docker build from the dbt-core project (https://github.com/dbt-labs/dbt-core/tree/main/docker) does not work for dbt-spark when using the PyHive version (or the default all):
docker build --tag my-dbt-spark:1.6.0 --target dbt-spark --build-arg dbt_core_ref=dbt-core@v1.6.0 --build-arg dbt_spark_ref=dbt-spark@v1.6.0 --build-arg dbt_spark_version=PyHive .
produces an error:
...
107.2 note: This error originates from a subprocess, and is likely not a problem with pip.
107.2 ERROR: Failed building wheel for sasl
107.2 Running setup.py clean for sasl
107.4 Building wheel for future (setup.py): started
108.0 Building wheel for future (setup.py): finished with status 'done'
108.0 Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492023 sha256=0dced4fde8484b7cf07f3ca722cbe787880c6fcb8eb27af37c82213dd20b48b8
108.0 Stored in directory: /tmp/pip-ephem-wheel-cache-pgg_8qmj/wheels/da/19/ca/9d8c44cd311a955509d7e13da3f0bea42400c469ef825b580b
108.0 Building wheel for PyHive (setup.py): started
108.3 Building wheel for PyHive (setup.py): finished with status 'done'
108.3 Created wheel for PyHive: filename=PyHive-0.6.5-py3-none-any.whl size=51554 sha256=b78987c7c11b9d3a18704d5339f9d1caf6221976e1f4c572f609fac9dd9da102
108.3 Stored in directory: /tmp/pip-ephem-wheel-cache-pgg_8qmj/wheels/cc/b2/8d/74115da1b8e1ee44544ec7870783c9fbf1127b66d296f6c4be
108.3 Building wheel for pure-sasl (setup.py): started
108.6 Building wheel for pure-sasl (setup.py): finished with status 'done'
108.6 Created wheel for pure-sasl: filename=pure_sasl-0.6.2-py3-none-any.whl size=11423 sha256=ef452afe0aeb515f2ad15f63e0df15ea5c620fef4e4f7d4413de8ebdb05b064e
108.6 Stored in directory: /tmp/pip-ephem-wheel-cache-pgg_8qmj/wheels/be/bd/15/23761a50b737a712aacac51c718906ce3563705a336d2c4ffc
108.6 Successfully built pyspark thrift dbt-spark logbook minimal-snowplow-tracker future PyHive pure-sasl
108.6 Failed to build sasl
108.6 ERROR: Could not build wheels for sasl, which is required to install pyproject.toml-based projects
------
Dockerfile:104
--------------------
102 | /tmp/* \
103 | /var/tmp/*
104 | >>> RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_spark_ref}#egg=dbt-spark[${dbt_spark_version}]"
105 |
106 |
--------------------
ERROR: failed to solve: process "/bin/sh -c python -m pip install --no-cache-dir \"git+https://github.com/dbt-labs/${dbt_spark_ref}#egg=dbt-spark[${dbt_spark_version}]\"" did not complete successfully: exit code: 1
It works fine if I'm using the ODBC spark version.
Update: I have the same problem locally (not in Docker) if I switch from Python 3.10 to 3.11, so the problem is related to https://github.com/dbt-labs/dbt-spark/issues/864
The issue here is related to the sasl package, which no longer builds on Python 3.11. To work around it, you need to install pyhive with the extra hive_pure_sasl, which uses the pure-sasl package instead of sasl. In other words, dbt-spark should depend on pyhive[hive_pure_sasl] instead of plain pyhive when installing dbt-spark[PyHive].
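As a rough sketch of the suggested change, the extras could be declared so that the PyHive extra resolves to the pure-sasl flavour. The dict below is illustrative only; the actual extra names and version pins in dbt-spark's setup.py will differ:

```python
# Hypothetical extras declaration for a setup.py -- illustrative only,
# not copied from dbt-spark's real packaging files.
extras_require = {
    # Pulls in pure-sasl (pure Python) instead of the C-based sasl
    # package, so installation also works on Python 3.11.
    "PyHive": ["pyhive[hive_pure_sasl]"],
    "ODBC": ["pyodbc"],
    "all": ["pyhive[hive_pure_sasl]", "pyodbc"],
}
```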
You can easily reproduce this issue by running pip install pyhive vs. pip install 'pyhive[hive_pure_sasl]' on a Python 3.11 installation.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this a new bug in dbt-spark?
Current Behavior
For all other dbt adapters, the latest versions are available as Docker images under Packages. For dbt-spark, Docker images exist, but the latest versions are missing.
Expected Behavior
Docker images for dbt-spark in versions 1.5.0 and 1.6.0 are available.
Steps To Reproduce
n.a.
Relevant log output
No response
Environment
Additional Context
No response