USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Updates kubernetes deployment config and containers for v1.14 and v1.15 #178

Closed: RyanStonebraker closed this pull request 4 years ago

RyanStonebraker commented 4 years ago

What changes were proposed in this pull request?

The Kubernetes deployment config file was updated to work with Kubernetes 1.14 and 1.15, which allows deployment to AWS EKS. All API versions declared in the config were switched to stable releases, because the features previously used from the beta APIs have since been incorporated into the stable ones.
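For readers unfamiliar with this kind of migration, here is a minimal sketch of what "switching to stable releases" typically means for workload objects. It assumes the config used beta groups such as `extensions/v1beta1` or `apps/v1beta1`/`apps/v1beta2`; the PR does not list which kinds or versions were actually changed. The names `STABLE_API` and `to_stable` and the example manifest are hypothetical. One real pitfall it handles: `apps/v1` requires an explicit `spec.selector`, whereas some beta APIs (e.g. `extensions/v1beta1`) defaulted it from the pod template labels.

```python
# Hypothetical sketch: move beta workload apiVersions to the stable apps/v1 group.
# The (kind, apiVersion) pairs are common examples, NOT a record of what the
# Sparkler deployment file actually contained.
import copy
from typing import Any, Dict

STABLE_API = {
    ("Deployment", "extensions/v1beta1"): "apps/v1",
    ("Deployment", "apps/v1beta1"): "apps/v1",
    ("Deployment", "apps/v1beta2"): "apps/v1",
    ("StatefulSet", "apps/v1beta1"): "apps/v1",
    ("StatefulSet", "apps/v1beta2"): "apps/v1",
    ("DaemonSet", "extensions/v1beta1"): "apps/v1",
    ("ReplicaSet", "extensions/v1beta1"): "apps/v1",
}


def to_stable(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one manifest with its apiVersion moved to apps/v1.

    apps/v1 requires spec.selector; if it is missing, derive it from the
    pod template labels (what the older beta APIs did implicitly).
    """
    out = copy.deepcopy(manifest)
    new = STABLE_API.get((out.get("kind"), out.get("apiVersion")))
    if new is None:
        return out  # already stable, or a kind this table does not cover
    out["apiVersion"] = new
    spec = out.setdefault("spec", {})
    if "selector" not in spec:
        labels = spec.get("template", {}).get("metadata", {}).get("labels", {})
        spec["selector"] = {"matchLabels": dict(labels)}
    return out


# Usage with a hypothetical Solr Deployment:
solr = {
    "apiVersion": "extensions/v1beta1",
    "kind": "Deployment",
    "metadata": {"name": "solr"},
    "spec": {"template": {"metadata": {"labels": {"app": "solr"}}}},
}
fixed = to_stable(solr)
print(fixed["apiVersion"])        # apps/v1
print(fixed["spec"]["selector"])  # {'matchLabels': {'app': 'solr'}}
```

Working on a deep copy leaves the input manifest untouched, so the old and new versions can be diffed before applying anything to a cluster.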

ZooKeeper was removed from the deployment file because including it caused problems that could not be diagnosed (Solr would not start and gave no meaningful error). However, all Sparkler pods appear to work correctly with a single Solr instance.

The standalone-sparkler Docker image was rebuilt from the current state of the repository, and the sparkler-init Docker image was slightly modified and rebuilt to fix permissions issues that prevented the crawldb core from being created in Solr.

A minor misspelling was also corrected in Crawler.scala.

Is this related to an already existing issue on sparkler?
Related to #166 and #167.

Will it close an existing issue?
Closes #167, #166.

How was this patch tested?

The deployment was manually tested on minikube running Kubernetes 1.14 and 1.15, and on a 3-node AWS EKS cluster.


thammegowda commented 4 years ago

@buggtb Please review this. (I do not know Kubernetes 😐)