genepi / imputationserver2

MIT License

[PR] Reduce uncompressed image size 30% #5

Closed abought closed 10 months ago

abought commented 1 year ago

Purpose

Reduce uncompressed image size (as reported by docker images) by ~30%.

Our AWS Batch use case required adjustments to the image, and I started to run out of disk space while tinkering locally. These are some very conservative changes to bring the size down. (I will try to verify the 30% figure once I deal with the disk space issue, unless someone else gets there first.)

There are some further refactorings I can propose once we've gotten the base infra finalized on our side, but this is a quick tweak to get started.

Proposed changes

This repo disables forks and pull requests, so I've inlined the modified Dockerfile as text in an issue ticket. There is always a way. 🐙

Explanation

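All temporary files (caches and gz archives) are now downloaded and deleted in the same Docker instruction. The apt and conda caches can be large, and a file deleted in a later instruction still takes up space in the layer that created it, so cleaning up within the same RUN keeps temp files from being saved in any layer.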
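To illustrate why the cleanup has to happen in the same instruction, here is a minimal sketch with two build targets. It is not part of the proposed Dockerfile: the stage names layered and single are made up for the example, and the ccat release URL is simply reused from the Dockerfile below.

```dockerfile
# Illustration only: the stage names "layered" and "single" are hypothetical.

# Target 1: the archive is downloaded in one layer and deleted in a later one.
# The rm only hides the file; the earlier download layer still stores it,
# and the apt package lists stay in the first layer as well.
FROM ubuntu:18.04 AS layered
RUN apt-get update && apt-get install -y wget
RUN wget https://github.com/jingweno/ccat/releases/download/v1.1.0/linux-amd64-1.1.0.tar.gz
RUN tar xfz linux-amd64-1.1.0.tar.gz
RUN rm linux-amd64-1.1.0.tar.gz

# Target 2: install, download, unpack, and delete in a single RUN, so neither
# the apt package lists nor the archive are ever committed to a layer.
FROM ubuntu:18.04 AS single
RUN apt-get update && \
    apt-get install -y wget && \
    rm -rf /var/lib/apt/lists/* && \
    wget https://github.com/jingweno/ccat/releases/download/v1.1.0/linux-amd64-1.1.0.tar.gz && \
    tar xfz linux-amd64-1.1.0.tar.gz && \
    rm linux-amd64-1.1.0.tar.gz
```

Each target can be built separately with docker build --target layered -t demo:layered . (and likewise for single), then compared with docker images or docker history. I would expect the single target to come out smaller, but I have not measured by how much.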

Code

FROM ubuntu:18.04
# MAINTAINER is deprecated; LABEL is the recommended replacement
LABEL maintainer="Lukas Forer <lukas.forer@i-med.ac.at> / Sebastian Schönherr <sebastian.schoenherr@i-med.ac.at>"

# Install compilers
RUN apt-get update && \
    apt-get install -y wget build-essential zlib1g-dev liblzma-dev libbz2-dev libxau-dev && \
    apt-get -y clean && \
    rm -rf /var/lib/apt/lists/*

# Install miniconda (the installer is removed in the same instruction)
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
    rm ~/miniconda.sh
ENV PATH=/opt/conda/bin:${PATH}

COPY environment.yml .
RUN conda update -y conda && \
    conda env update -n root -f environment.yml && \
    conda clean --all -y

# Install eagle
ENV EAGLE_VERSION=2.4.1
WORKDIR "/opt"
# RUN wget https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/old/Eagle_v${EAGLE_VERSION}.tar.gz && \
RUN wget https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/Eagle_v${EAGLE_VERSION}.tar.gz && \
    tar xvfz Eagle_v${EAGLE_VERSION}.tar.gz && \
    rm Eagle_v${EAGLE_VERSION}.tar.gz && \
    mv Eagle_v${EAGLE_VERSION}/eagle /usr/bin/.

# Install beagle
ENV BEAGLE_VERSION=18May20.d20
WORKDIR "/opt"
RUN wget https://faculty.washington.edu/browning/beagle/beagle.${BEAGLE_VERSION}.jar && \
    mv beagle.${BEAGLE_VERSION}.jar /usr/bin/.

# Install bcftools
ENV BCFTOOLS_VERSION=1.13
WORKDIR "/opt"
RUN wget https://github.com/samtools/bcftools/releases/download/${BCFTOOLS_VERSION}/bcftools-${BCFTOOLS_VERSION}.tar.bz2 && \
    tar xvfj bcftools-${BCFTOOLS_VERSION}.tar.bz2 && \
    rm bcftools-${BCFTOOLS_VERSION}.tar.bz2 && \
    cd bcftools-${BCFTOOLS_VERSION} && \
    ./configure && \
    make && \
    make install

# Install minimac4
WORKDIR "/opt"
RUN mkdir minimac4
COPY files/bin/minimac4 minimac4/.
ENV PATH="/opt/minimac4:${PATH}"
RUN chmod +x /opt/minimac4/minimac4

# Install PGS-CALC
ENV PGS_CALC_VERSION=v0.9.14
RUN mkdir "/opt/pgs-calc"
WORKDIR "/opt/pgs-calc"
RUN wget https://github.com/lukfor/pgs-calc/releases/download/${PGS_CALC_VERSION}/installer.sh && \
    bash installer.sh && \
    mv pgs-calc.jar /usr/bin/. && \
    rm installer.sh

# Install imputationserver-utils
ENV IMPUTATIONSERVER_UTILS_VERSION=v1.2.1
RUN mkdir /opt/imputationserver-utils
WORKDIR "/opt/imputationserver-utils"
RUN wget https://github.com/genepi/imputationserver-utils/releases/download/${IMPUTATIONSERVER_UTILS_VERSION}/imputationserver-utils.tar.gz && \
    tar xvfz imputationserver-utils.tar.gz && \
    rm imputationserver-utils.tar.gz && \
    chmod +x /opt/imputationserver-utils/bin/tabix

#COPY files/bin/imputationserver-utils.tar.gz /opt/imputationserver-utils/.

# Install ccat
ENV CCAT_VERSION=1.1.0
RUN wget https://github.com/jingweno/ccat/releases/download/v${CCAT_VERSION}/linux-amd64-${CCAT_VERSION}.tar.gz && \
    tar xfz linux-amd64-${CCAT_VERSION}.tar.gz && \
    rm linux-amd64-${CCAT_VERSION}.tar.gz && \
    cp linux-amd64-${CCAT_VERSION}/ccat /usr/local/bin/ && \
    chmod +x /usr/local/bin/ccat && \
    # Remove the unpacked directory once the binary has been copied
    rm -r linux-amd64-${CCAT_VERSION}

# Needed because imputationserver-utils starts external processes (e.g. tabix)
ENV JAVA_TOOL_OPTIONS="-Djdk.lang.Process.launchMechanism=vfork"

COPY files/bin/trace /usr/bin/.
COPY files/bin/vcf2geno /usr/bin/.

seppinho commented 10 months ago

Excellent, the changes have been integrated.