bbrancar closed 1 year ago
The Spark jars directory is $SPARK_HOME/jars/.
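A quick way to confirm this before calling spark-submit is to check that the directory exists and actually holds jar files. The following is only a minimal sketch: the function name check_spark_jars is hypothetical (it is not part of Spark or of this repository), and it assumes a standard layout where the jars sit directly under $SPARK_HOME/jars/.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): reports whether the given
# Spark home contains a jars/ directory with at least one .jar file.
check_spark_jars() {
    spark_home="$1"

    if [ -z "$spark_home" ]; then
        echo "SPARK_HOME is not set"
        return 1
    fi

    if [ ! -d "$spark_home/jars" ]; then
        echo "No jars directory at $spark_home/jars"
        return 1
    fi

    # The glob stays unexpanded when nothing matches, so test each entry.
    for jar in "$spark_home"/jars/*.jar; do
        if [ -f "$jar" ]; then
            echo "Found Spark jars in $spark_home/jars"
            return 0
        fi
    done

    echo "The directory $spark_home/jars contains no .jar files"
    return 1
}

# Check the current environment; "|| true" keeps the script from aborting.
check_spark_jars "${SPARK_HOME:-}" || true
```

If the check reports a missing or empty directory, the Spark installation that $SPARK_HOME points to is probably incomplete, or the variable points somewhere else than intended.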
This was helpful; I had failed to download Spark correctly. Thank you.
Thanks for the feedback. I'll add a link to the Spark installation instructions in the README. Hope it helps future users.
Hi,
I have successfully downloaded a CSV via Amazon Athena and would like to bulk-download the listed WARC files. After cloning the GitHub repository and setting $SPARK_HOME to the pyspark installation in my virtual environment, I ran the following command:
> $SPARK_HOME/bin/spark-submit --class org.commoncrawl.spark.examples.CCIndexWarcExport $APPJAR \ --csv xyx ...
This returned the error:
Failed to find Spark jars directory (xyz).
Do you have any suggestions on how I can resolve this issue? Thank you.