dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Mention spark installation in pyspark examples #10228

Open dagsir[bot] opened 2 years ago

dagsir[bot] commented 2 years ago

Dagster Documentation Gap

This issue was generated from the slack conversation at: https://dagster.slack.com/archives/C01U954MEER/p1666710133910359?thread_ts=1666710133.910359&cid=C01U954MEER

Conversation excerpt

U03MXV86UNS: Hi I downloaded the github example with pyspark, but it doesn't worked. Here is the error message:
U030M2AL48M: <@UULA0R2LV> is there some hadoop configuration that needs to be set for this example project?
U03MXV86UNS: Any News?
U03MXV86UNS: <@U016C4E5CP8> do you know something about this?
UULA0R2LV: Hi <@U03MXV86UNS> apologies for the delay, I was out of office. Did you have spark installed in your environment?
U03MXV86UNS: I've just installed pyspark
UULA0R2LV: In addition to pip install pyspark, you’ll also need to install Spark: https://spark.apache.org/downloads.html
U03MXV86UNS: Oh god, amateur's mistake, I'm sorry
UULA0R2LV: no worries at all. we’ll make it clear in the instructions.
UULA0R2LV: <@U018K0G2Y85> docs Mention spark installation in pyspark examples
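
The error message itself is missing from the excerpt, so the actual cause is unknown. As a rough diagnostic for the question asked in the thread ("Did you have spark installed in your environment?"), the sketch below reports whether the pyspark package is importable, whether a java executable is on PATH, and whether JAVA_HOME or SPARK_HOME is set. The function name spark_environment_report is hypothetical and not part of Dagster or PySpark; including Java in the check is an assumption based on Spark running on the JVM.

```python
import importlib.util
import os
import shutil


def spark_environment_report(env=None):
    """Collect basic facts about the local PySpark setup without starting Spark.

    `env` is a mapping of environment variables; it defaults to os.environ.
    (Hypothetical helper for illustration only.)
    """
    env = os.environ if env is None else env
    return {
        # True when the pyspark Python package can be found by the import system.
        "pyspark_installed": importlib.util.find_spec("pyspark") is not None,
        # Full path of the java executable found on PATH, or None if absent.
        "java_on_path": shutil.which("java", path=env.get("PATH")),
        # Optional variables that Spark tooling commonly reads.
        "JAVA_HOME": env.get("JAVA_HOME"),
        "SPARK_HOME": env.get("SPARK_HOME"),
    }


if __name__ == "__main__":
    for key, value in spark_environment_report().items():
        print(f"{key}: {value}")
```

A None value for SPARK_HOME is not necessarily a problem: as the later comment in this thread points out, the pip-installed package ships its own Spark JARs.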


Message from the maintainers:

Are you looking for the same documentation content? Give it a :thumbsup:. We factor engagement into prioritization.

simonvanderveldt commented 1 year ago

I don't know how (Py)Spark is used in this example, but the PySpark package on PyPI bundles Spark, including the Spark JARs, so there should be nothing else you need to install. You can check after installing the Python package:

~/s/t/tstsprk ❯ python -m venv .venv
~/s/t/tstsprk ❯ source .venv/bin/activate
(.venv) ~/s/t/tstsprk ❯ pip install pyspark
Collecting pyspark
  Downloading pyspark-3.3.1.tar.gz (281.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 281.4/281.4 MB 4.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting py4j==0.10.9.5
  Downloading py4j-0.10.9.5-py2.py3-none-any.whl (199 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 199.7/199.7 kB 11.7 MB/s eta 0:00:00
Using legacy 'setup.py install' for pyspark, since package 'wheel' is not installed.
Installing collected packages: py4j, pyspark
  Running setup.py install for pyspark ... done
Successfully installed py4j-0.10.9.5 pyspark-3.3.1
(.venv) ~/s/t/tstsprk ❯ ls -ahl .venv/lib/python3.10/site-packages/pyspark/jars/spark*
-rw-r--r-- 1 simon simon  12M Oct 15 11:55 .venv/lib/python3.10/site-packages/pyspark/jars/spark-catalyst_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon  11M Oct 15 11:50 .venv/lib/python3.10/site-packages/pyspark/jars/spark-core_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 424K Oct 15 11:51 .venv/lib/python3.10/site-packages/pyspark/jars/spark-graphx_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 698K Oct 15 12:08 .venv/lib/python3.10/site-packages/pyspark/jars/spark-hive_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 554K Oct 15 12:15 .venv/lib/python3.10/site-packages/pyspark/jars/spark-hive-thriftserver_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 513K Oct 15 12:14 .venv/lib/python3.10/site-packages/pyspark/jars/spark-kubernetes_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon  83K Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-kvstore_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon  76K Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-launcher_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 292K Oct 15 12:13 .venv/lib/python3.10/site-packages/pyspark/jars/spark-mesos_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 5.9M Oct 15 12:05 .venv/lib/python3.10/site-packages/pyspark/jars/spark-mllib_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 114K Oct 15 11:51 .venv/lib/python3.10/site-packages/pyspark/jars/spark-mllib-local_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 2.4M Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-network-common_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 157K Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-network-shuffle_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon  51K Oct 15 12:08 .venv/lib/python3.10/site-packages/pyspark/jars/spark-repl_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon  31K Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-sketch_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 8.5M Oct 15 12:01 .venv/lib/python3.10/site-packages/pyspark/jars/spark-sql_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 1.1M Oct 15 11:52 .venv/lib/python3.10/site-packages/pyspark/jars/spark-streaming_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon  15K Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-tags_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon  11K Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-tags_2.12-3.3.1-tests.jar
-rw-r--r-- 1 simon simon  52K Oct 15 11:46 .venv/lib/python3.10/site-packages/pyspark/jars/spark-unsafe_2.12-3.3.1.jar
-rw-r--r-- 1 simon simon 350K Oct 15 12:13 .venv/lib/python3.10/site-packages/pyspark/jars/spark-yarn_2.12-3.3.1.jar
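
The same check can be done from Python without hard-coding the virtualenv path. This is a minimal sketch, assuming the package keeps its JARs in a jars/ directory next to the pyspark module, as the listing above shows for 3.3.1. The helper names bundled_spark_jars and installed_pyspark_dir are hypothetical and not part of PySpark.

```python
import importlib.util
from pathlib import Path


def bundled_spark_jars(package_dir):
    """Return the sorted names of spark*.jar files under <package_dir>/jars."""
    jars_dir = Path(package_dir) / "jars"
    if not jars_dir.is_dir():
        return []
    return sorted(p.name for p in jars_dir.glob("spark*.jar"))


def installed_pyspark_dir():
    """Return the directory of the installed pyspark package, or None if absent."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    package_dir = installed_pyspark_dir()
    if package_dir is None:
        print("pyspark is not installed in this environment")
    else:
        jars = bundled_spark_jars(package_dir)
        print(f"{len(jars)} Spark JARs found under {package_dir / 'jars'}")
        for name in jars:
            print(name)
```

Finding the JARs only shows that the Spark libraries came with the package; it does not confirm that a Java runtime is available to run them.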