mattvryan-github commented 3 years ago

What changes were proposed in this pull request?

Changes so that Sparkler can optionally be configured to run in a Databricks Spark environment.

Is this related to an already existing issue on sparkler?

Yes, #204.

Will it close an existing issue?

Yes, #204.

How was this patch tested?

The resulting fat jar was zipped up together with the conf and plugin directories and copied to the Databricks File System (DBFS). A script then pulled the archive onto the master node of a cluster, unzipped it, and executed it. Sample crawls and scrapes were performed, persisting their results to a standalone Solr server on EC2. The results were then retrieved from Solr via its REST API.


mattvryan-github commented 3 years ago

Documentation to follow

buggtb commented 3 years ago

Epic, thanks @mattvryan-github !

USCDataScience/sparkler #205: Changes so sparkler can be launched inside of a Databricks cluster
