USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
412 stars, 143 forks

Sparkler not distributing work over nodes #234

Open buggtb opened 3 years ago

buggtb commented 3 years ago

I dunno if anything obvious springs to mind here, @thammegowda or @karanjeets, from back in the day.

When I run Sparkler as a spark-submit job on a Databricks cluster, all the tasks are executed on the master node, even with the partitioned RDDs.

You can see here that the workload is executed on the master node but not pushed to the other 3 nodes, which seems entirely reproducible. [image: index]

dgoldenberg-ias commented 8 months ago

Hi, has there been any resolution to this issue?