Open vakarisbk opened 12 months ago
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Vakaris. This is most likely caused by a git client misconfiguration; please make sure to:
git config --list | grep email
git config --global user.email email@example.com
Seeing as there is some recent activity on Issue #814, and knowing that at least a couple of people are actively using this fork, I've updated it. I'm looking forward to any insights on the implementation, as well as on the likelihood of this PR getting merged.
partially resolves #814 docs dbt-labs/docs.getdbt.com/#
Problem
dbt-spark has limited options for open-source Spark integrations. Currently, the only available method for running dbt with open-source Spark in production is a Thrift connection. However, a Thrift connection isn't suitable for all use cases; for instance, it doesn't support Thrift over HTTP. Also, the PyHive project, which the dbt Thrift method relies on, is unsupported (at least according to its GitHub page).
Solution
Propose introducing support for Spark Connect (for SQL models only).
Checklist
How to test locally?
./start-connect-server.sh --packages org.apache.spark:spark-connect_2.12:3.5.0 --conf spark.sql.catalogImplementation=hive
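Once the server is up, it can help to confirm it accepts connections before pointing dbt at it. The following is a minimal sketch, not part of this PR: it assumes PySpark 3.4+ with the Spark Connect client dependencies installed and the server listening on the default Spark Connect port 15002 on localhost. The helper names `connect_url` and `smoke_test` are hypothetical.

```python
def connect_url(host: str = "localhost", port: int = 15002) -> str:
    """Build a Spark Connect URL (15002 is the default Spark Connect port)."""
    return f"sc://{host}:{port}"


def smoke_test(url: str) -> int:
    """Open a Spark Connect session, run a trivial query, and return its result."""
    # Imported lazily so the URL helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote(url).getOrCreate()
    try:
        # A trivial SQL statement, mirroring the SQL-only scope of this PR.
        return spark.sql("SELECT 1 AS ok").collect()[0]["ok"]
    finally:
        spark.stop()


if __name__ == "__main__":
    # Assumes the server started by the command above is running locally.
    print(smoke_test(connect_url()))
```

If the script fails to connect, the server is probably not running or is bound to a different host or port.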
Known issues: #901