StatCan / aaw-contrib-jupyter-notebooks

Jupyter Notebooks to be used with Advanced Analytics Workspace platform

Spark/Databricks Example #5

Closed by blairdrummond 4 years ago

blairdrummond commented 4 years ago

It would be great to see Databricks API integration and, ideally, PySpark examples.

blairdrummond commented 4 years ago

@ca-scribner

https://github.com/krishnan-r/sparkmonitor

ca-scribner commented 4 years ago

I successfully connected to a path 2 Databricks cluster using databricks-connect, given info about the Databricks instance and a live cluster. I can run PySpark commands from a local PyCharm, etc., using the cluster for compute. I'm not sure how to translate this to using PySpark from a notebook server in Kubernetes.
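For anyone following along, here is a minimal sketch of the kind of local-to-cluster PySpark call described above. It assumes the legacy databricks-connect client is installed and already configured (via `databricks-connect configure`), so that `SparkSession.builder.getOrCreate()` returns a session backed by the remote cluster. The function names `inside_unit_circle` and `estimate_pi` are hypothetical, and the Monte Carlo Pi estimate is only a small stand-in workload, not the code from any notebook in this repo. Newer Spark Connect based clients may not expose `sparkContext`, so treat the RDD calls as an assumption too.

```python
import random


def inside_unit_circle(_):
    """Draw one random point in the unit square; return 1 if it lands inside the quarter circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(spark, samples=100_000, partitions=8):
    """Estimate Pi by distributing `samples` random draws over the session's cluster.

    `spark` is any object exposing `sparkContext` like a pyspark SparkSession
    (with legacy databricks-connect, the work runs on the remote cluster).
    """
    sc = spark.sparkContext
    hits = sc.parallelize(range(samples), partitions).map(inside_unit_circle).sum()
    return 4.0 * hits / samples


# Usage (assumes pyspark / databricks-connect is installed and configured):
#
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   print(estimate_pi(spark))
```

The session is passed in rather than created inside the function, so the same code should work unchanged whether the session comes from databricks-connect or from a plain local PySpark install.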

I don't think the databricks-connect method is the way to go here. Is a connection through vanilla PySpark appropriate? Steve's Medium post goes that route, but I couldn't fully reproduce it.

@sylus / @zachomedia any tips or past code fragments using PySpark? I only saw this in the repos.

sylus commented 4 years ago

I still need to make some final tweaks and will let you know, but this is roughly how I ported the actions one over and gave an example using the kfp DSL.

https://github.com/StatCan/jupyter-notebooks/blob/aa95f12590d5f288aad8be43bee930d19bc002b2/ai-pipeline/03-DataBricksComputePi.ipynb