USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

fix docker - permission issues and smaller images #126

Closed: thammegowda closed this 6 years ago

thammegowda commented 6 years ago

What changes were proposed in this pull request?

  1. Fixed the permission issue in sparkler reported in #124.
  2. Optimized the Docker build. The Maven build now runs on the host, and its output is copied into the Docker image. Reasons:
    • To share the .m2/repository cache between Docker builds
    • To minimize the image size (by excluding all unnecessary files)
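
The approach in item 2 could look roughly like the Dockerfile sketch below. It is not the Dockerfile from this PR. It assumes the Maven build (for example, mvn package) has already run on the host and left its output in a directory named build/. The base image tag, the Java package, the sparkler user name, and the /data/sparkler path are all illustrative assumptions. Giving a non-root user ownership of the copied files is one common way to avoid file-permission problems in a container; whether it matches the fix for #124 is not stated here.

```dockerfile
# Hypothetical sketch; paths, names, and versions are assumptions,
# not the actual Dockerfile from this PR.
FROM ubuntu:16.04

# Install only a Java runtime. Maven stays on the host, so the image
# carries no build tools and no copy of the .m2 repository.
RUN apt-get update \
    && apt-get install -y --no-install-recommends openjdk-8-jre-headless \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user (hypothetical name) to own and run the application.
RUN useradd --create-home sparkler

# Copy only the artifacts built on the host (assumed to be in ./build),
# owned by the non-root user.
COPY --chown=sparkler:sparkler build/ /data/sparkler/

USER sparkler
WORKDIR /data/sparkler
```

Because the image only receives the finished build output, the host's .m2/repository cache is reused across builds and source files and intermediate build files never enter an image layer.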

Is this related to an already existing issue on sparkler?

#124

Will it close an existing issue?
Closes #124

How was this patch tested?

Removed all existing Docker images using docker rmi, then built the Docker images from scratch.

thammegowda commented 6 years ago

@buggtb Need your review for this PR.

buggtb commented 6 years ago

I've not tried building, but the Dockerfile looks good. The only change I would make, and it's unrelated to your PR, is to pin the Ubuntu image to a specific version, :xenial or something, so that it doesn't go weird with different Java versions when new releases happen.
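
As a rough illustration of that suggestion (a sketch, not a line taken from the PR's Dockerfile), pinning the base image means replacing an untagged FROM with a tagged one:

```dockerfile
# Unpinned: resolves to whatever the "latest" tag points to at build time.
# FROM ubuntu

# Pinned to a specific release (xenial is Ubuntu 16.04), so rebuilds keep
# using the same base and the same packaged Java versions.
FROM ubuntu:xenial
```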

thammegowda commented 6 years ago

thanks @buggtb