dask / hdfs3

A wrapper for libhdfs3 to interact with HDFS from Python
http://hdfs3.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Huge image size #149

Status: Open. gsakkis opened this issue 6 years ago.

gsakkis commented 6 years ago

I'm including hdfs3 in a Docker image and the resulting size is huge:

$ docker run -it --rm hdfs3 du -sh /opt/conda
798M    /opt/conda

Removing /opt/conda/pkgs/* and all the __pycache__ directories reduces it to 500M, but that is still quite large. I realize this mostly comes down to the underlying libhdfs3 library, but do you know of other directories or files that can be safely deleted (e.g. include/boost/*, libicudata*, etc.)?

Sample Dockerfile:

FROM continuumio/miniconda3:4.3.27p0
RUN conda install --channel=conda-forge hdfs3
RUN rm -rf /opt/conda/pkgs/* && find /opt/conda -name __pycache__ -exec rm -fr {} +
martindurant commented 6 years ago (Sun, Jan 21, 2018)

That sounds reasonable. Would you like to make a PR? The same could also be done for the mdurant/hadoop image (https://github.com/martindurant/docker_images/blob/master/hadoop/Dockerfile), which I often use for testing too.

Note that the conda clean command may achieve the same thing, though I am not sure.

mrocklin commented 6 years ago

FWIW, I now end conda install lines in Dockerfiles with conda clean -tipsy.
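A possible sketch combining the sample Dockerfile with the conda clean suggestion. It rests on a few assumptions. It uses conda clean --all --yes in place of the -tipsy shorthand: in the conda releases of that era, -t, -i, -p and -s selected tarballs, the index cache, unused packages and the source cache, and -y skipped the prompt, but newer conda releases may no longer accept -s. It also keeps the original rm of /opt/conda/pkgs, in case conda clean leaves extracted packages behind (not verified here). Finally, it chains everything into a single RUN instruction. Files deleted in a later RUN are still stored in the earlier image layer, so du inside the container can report a smaller size than the image actually occupies.

```dockerfile
# Sketch only: same base image and channel as the sample Dockerfile above.
FROM continuumio/miniconda3:4.3.27p0

# Install and clean up in ONE layer; anything removed in a separate,
# later RUN would still take up space in this layer of the image.
# --yes avoids the interactive "Proceed ([y]/n)?" prompt during the build.
RUN conda install --yes --channel=conda-forge hdfs3 \
 && conda clean --all --yes \
 && rm -rf /opt/conda/pkgs/* \
 && find /opt/conda -name __pycache__ -type d -prune -exec rm -rf {} +
```

After building (for example with docker build -t hdfs3 .), the du check from the top of the issue shows the size inside the container, while docker history hdfs3 shows how much each layer adds to the image itself.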
