dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

Add server side parameters to session connection method #823

Closed · JCZuurmond closed this 1 year ago

JCZuurmond commented 1 year ago

resolves #690

Description

Pass the existing `server_side_parameters` from the profile to the session connection wrapper and use them to configure the `SparkSession`.
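
In other words, the session connection method now threads each entry of the profile's `server_side_parameters` dict into the `SparkSession` builder before the session is created. A minimal sketch of the idea, assuming a hypothetical `build_spark_session` helper (the actual change lives in the session connection wrapper, not a free function):

```python
# Sketch only: each server_side_parameters entry from the profile
# becomes a config entry on the SparkSession builder.
from typing import Dict

from pyspark.sql import SparkSession


def build_spark_session(server_side_parameters: Dict[str, str]) -> SparkSession:
    builder = SparkSession.builder.enableHiveSupport()
    for key, value in server_side_parameters.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```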

Checklist

JCZuurmond commented 1 year ago

We prefer this PR over #691. @alarocca-apixio: we took your commits and touched up the code so that it can be merged.

Fokko commented 1 year ago

Works on my end:

cat ~/.dbt/profiles.yml 
dbt_tabular:
  outputs:
    dev:
      method: session
      schema: dbt_tabular
      type: spark
      host: NA
      server_side_parameters:
        "spark.driver.memory": "2g"
  target: dev

I can see that this is being picked up by the process:

ps aux | grep -i spark                          
fokkodriesprong  11191 150.0  0.4 413414240 269024 s008  S+   11:14AM   0:01.99 /opt/homebrew/Cellar/openjdk@11/11.0.19/libexec/openjdk.jdk/Contents/Home/bin/java -cp /opt/homebrew/lib/python3.9/site-packages/pyspark/conf:/opt/homebrew/lib/python3.9/site-packages/pyspark/jars/* -Xmx2g -XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit --conf spark.driver.memory=2g --conf spark.sql.catalogImplementation=hive pyspark-shell
fokkodriesprong  11203   0.0  0.0 408626896   1312 s010  S+   11:14AM   0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn --exclude-dir=.idea --exclude-dir=.tox -i spark

We can see that `--conf spark.driver.memory=2g` is set.
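
As a sanity check from inside the session rather than via `ps`, the running `SparkSession` exposes its configuration. A quick illustration (not part of this PR); with the profile above it should print `2g`:

```python
from pyspark.sql import SparkSession

# getOrCreate() returns the already-running session, so this reads
# the configuration the session method applied.
spark = SparkSession.builder.getOrCreate()
print(spark.conf.get("spark.driver.memory"))  # expected: 2g
```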

JCZuurmond commented 1 year ago

@Fleid: This PR is ready to be merged.

JCZuurmond commented 1 year ago

@colin-rogers-dbt: Could you fix the CI?

colin-rogers-dbt commented 1 year ago

This looks ready to merge, but I think we should add a functional test case (we will need to think about where and how) and update our unit tests, as in #577.
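
For illustration, a unit test along these lines could assert that every server-side parameter reaches the builder without needing a live cluster. This is a sketch reusing the hypothetical `build_spark_session` helper from above, not the adapter's actual API:

```python
from unittest import mock

from pyspark.sql import SparkSession


def build_spark_session(server_side_parameters):
    # Hypothetical helper from the earlier sketch; in the adapter the
    # logic lives in the session connection wrapper.
    builder = SparkSession.builder.enableHiveSupport()
    for key, value in server_side_parameters.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def test_server_side_parameters_reach_the_builder():
    with mock.patch("pyspark.sql.SparkSession.builder") as builder:
        builder.enableHiveSupport.return_value = builder
        builder.config.return_value = builder  # config() chains on the builder
        build_spark_session({"spark.driver.memory": "2g"})
    builder.config.assert_any_call("spark.driver.memory", "2g")
```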

colin-rogers-dbt commented 1 year ago

Also, is this dependent on #577?