apache / accumulo-proxy

Apache Accumulo Proxy
https://accumulo.apache.org
Apache License 2.0

accumulo-proxy #22 - Reduce Docker image size #23

Open volmasoft opened 4 years ago

volmasoft commented 4 years ago

Updated the Dockerfile to change some of the behavior and reduce the image size to 1.46GB from 1.86GB.

I also tried to keep the download_verify_and_extract() function readable, given that it now downloads the TARs, extracts them, and puts useful symlinks in place. I switched the hash strings to SHA512 to match the checksum files published on apache.org alongside the binaries.
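For readers without the Dockerfile open, the shape of such a helper can be sketched roughly as follows. This is not the code from the PR: the split into `verify_sha512` and `verify_and_extract`, the argument order, and the `topdir`/`link` parameters are all assumptions made for illustration, and it presumes `wget`, `tar`, and `sha512sum` exist in the base image.

```shell
# Hypothetical helper: succeed only if FILE's SHA-512 digest equals EXPECTED (hex).
verify_sha512() {
  local file="$1" expected="$2" actual
  actual="$(sha512sum "$file" | cut -d' ' -f1)" || return 1
  [ "$actual" = "$expected" ]
}

# Hypothetical helper: verify TARBALL, extract it under DEST, then point
# LINK at DEST/TOPDIR so later steps can use a version-independent path.
verify_and_extract() {
  local tarball="$1" expected="$2" dest="$3" topdir="$4" link="$5"
  if ! verify_sha512 "$tarball" "$expected"; then
    echo "SHA-512 mismatch for $tarball" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  ln -sfn "$dest/$topdir" "$link"
}

# Sketch of the combined step: download URL to a temp file, verify and
# extract it, and always delete the archive afterwards.
download_verify_and_extract() {
  local url="$1" expected="$2" dest="$3" topdir="$4" link="$5" tarball rc
  tarball="$(mktemp)" || return 1
  wget -q -O "$tarball" "$url" &&
    verify_and_extract "$tarball" "$expected" "$dest" "$topdir" "$link"
  rc=$?
  rm -f "$tarball"
  return "$rc"
}
```

Deleting the archive inside the same function only saves space in the image if the whole function runs within a single RUN instruction, since each RUN produces its own layer.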

I've put this up early as there's some discussion on the issue https://github.com/apache/accumulo-proxy/issues/22

volmasoft commented 4 years ago

After discussion on the main issue I've pulled the alpine change into this pull.

This is probably where I'm going to stop on this pull. There's a further discussion to be had around the Hadoop docs, but for now this pull has reduced the image size from 1.86GB to 1.06GB.

keith-turner commented 4 years ago

@volmasoft on the issue removing the hadoop docs folder was mentioned, but I did not see it.

Based on the default Accumulo classpath, maybe everything except ${HADOOP_HOME}/share/hadoop/client/* and the hadoop conf dir could be removed.
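As a rough sketch of what that trimming could look like in a Dockerfile RUN step: the function name `trim_hadoop` is made up, it assumes the Hadoop conf dir is `${HADOOP_HOME}/etc/hadoop`, and keeping the top-level `LICENSE*` and `NOTICE*` files is a cautious choice of this sketch rather than something settled in this thread. Note that it also removes `bin/`, so it only makes sense if nothing in the image calls the `hadoop` command.

```shell
# Hypothetical helper: delete everything under HADOOP_HOME except
# share/hadoop/client, etc/hadoop (assumed conf dir), and LICENSE*/NOTICE* files.
trim_hadoop() {
  local home="$1"
  # Safety check so an empty or wrong path does not delete unrelated files.
  if [ ! -d "$home/share/hadoop/client" ]; then
    echo "refusing to trim: $home/share/hadoop/client not found" >&2
    return 1
  fi
  # Drop the siblings of share/hadoop/client (hdfs, yarn, mapreduce, ...).
  find "$home/share/hadoop" -mindepth 1 -maxdepth 1 ! -name client -exec rm -rf {} +
  # Drop the siblings of share/hadoop (for example share/doc).
  find "$home/share" -mindepth 1 -maxdepth 1 ! -name hadoop -exec rm -rf {} +
  # Keep only etc/hadoop under etc, if etc exists at all.
  if [ -d "$home/etc" ]; then
    find "$home/etc" -mindepth 1 -maxdepth 1 ! -name hadoop -exec rm -rf {} +
  fi
  # Drop every other top-level entry except the license and notice files.
  find "$home" -mindepth 1 -maxdepth 1 \
    ! -name share ! -name etc ! -name 'LICENSE*' ! -name 'NOTICE*' \
    -exec rm -rf {} +
}
```

To actually shrink the image, this would have to run in the same RUN instruction that extracts the Hadoop tarball; files deleted in a later layer still occupy space in the earlier one.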

volmasoft commented 4 years ago

> @volmasoft on the issue removing the hadoop docs folder was mentioned, but I did not see it.
>
> Based on the default Accumulo classpath, maybe everything except ${HADOOP_HOME}/share/hadoop/client/* and the hadoop conf dir could be removed.

I wasn't sure whether we're allowed to remove it, e.g. does the license restrict it? As I'm no expert in this area, I was hoping someone else would be able to provide guidance.

Looking at the official license (https://apache.org/licenses/LICENSE-2.0), it appears that we should be fine. See section 4: "4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:"

So perhaps we can modify this but if so do we need to put any documentation/license information anywhere?

I didn't want to do something in good faith but potentially cause issues for us down the line so hence I haven't included that change currently.

keith-turner commented 4 years ago

> So perhaps we can modify this but if so do we need to put any documentation/license information anywhere?

I think we would be fine making a modification w/o doing anything additional. We could also possibly wget the two needed jars from Maven Central instead of getting the Hadoop tarball and then trimming it.
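A minimal sketch of the Maven Central route, assuming the standard repository layout under https://repo1.maven.org/maven2/. The helper names `maven_central_url` and `fetch_jar` are invented for illustration, and the artifact names in the usage comment are a guess at which two jars are meant; they are not confirmed anywhere in this thread.

```shell
# Hypothetical helper: print the Maven Central URL for GROUP:ARTIFACT:VERSION.
maven_central_url() {
  local group="$1" artifact="$2" version="$3" group_path
  # Maven stores the group id as a directory path: org.apache.x -> org/apache/x
  group_path="$(printf '%s' "$group" | tr '.' '/')"
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$group_path" "$artifact" "$version" "$artifact" "$version"
}

# Hypothetical helper: download one jar into directory DEST.
fetch_jar() {
  local dest="$1" group="$2" artifact="$3" version="$4"
  mkdir -p "$dest" || return 1
  wget -q -P "$dest" "$(maven_central_url "$group" "$artifact" "$version")"
}

# Possible usage (artifact names and HADOOP_VERSION are assumptions):
#   fetch_jar /opt/hadoop/client org.apache.hadoop hadoop-client-api "$HADOOP_VERSION"
#   fetch_jar /opt/hadoop/client org.apache.hadoop hadoop-client-runtime "$HADOOP_VERSION"
```

Jars fetched this way would still need a checksum check of their own, the same way the tarballs are verified in this PR.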

> I didn't want to do something in good faith but potentially cause issues for us down the line so hence I haven't included that change currently.

I think if anything was done to reduce the Hadoop size, it would make sense to do it in its own PR.

milleruntime commented 2 years ago

@volmasoft @keith-turner This PR got lost. Either of you OK with merging as-is?

DomGarguilo commented 1 year ago

It seems like there are some good improvements here. If there is no activity on this PR I may open a new one and try to incorporate some of these changes at some point.