StaPH-B / docker-builds

:package: :whale: Dockerfiles and documentation on tools for public health bioinformatics
GNU General Public License v3.0

Add container viridian #987

Closed soejun closed 2 months ago

soejun commented 2 months ago

Adds container viridian to StaPH-B, resolves [Container Request]: viridian #971

Kincekara commented 2 months ago

Hi @soejun, thank you for contributing to the repository. I think Viridian would benefit greatly from a multi-stage build. Although the image works as it is, a smaller image loads much faster, introduces fewer security issues, and lowers cloud costs on large projects by cutting run time. For that reason, we prefer to build some of our images that way.

You can simply move the compiled binaries to /usr/local/bin/ in the builder stage and copy /usr/local/bin/* from the builder to the app stage. In addition, you can leave out of the app stage any extra libraries that are only needed for compiling. Here is one of our templates you can refer to: https://github.com/StaPH-B/docker-builds/blob/master/dockerfile-template/Dockerfile_builder

We would be happy to help if you have questions for any step.
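
As a rough illustration of the pattern described in this comment (not the actual Viridian Dockerfile), a two-stage build might look like the sketch below. The tool name `mytool`, its download URL, and the package lists are hypothetical placeholders; only the builder/app layout and the /usr/local/bin copy come from the comment itself.

```dockerfile
# Hypothetical example: 'mytool' and its URL are placeholders, not a real project.
ARG MYTOOL_VER="1.0.0"

## Builder: has compilers and -dev headers ##
FROM ubuntu:jammy AS builder
ARG MYTOOL_VER

RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    wget \
    make \
    gcc \
    zlib1g-dev && \
    rm -rf /var/lib/apt/lists/*

# compile, then move the binary to /usr/local/bin so it is easy to copy out
RUN wget -q https://example.com/mytool-${MYTOOL_VER}.tar.gz && \
    tar -xzf mytool-${MYTOOL_VER}.tar.gz && \
    cd mytool-${MYTOOL_VER} && \
    make && \
    cp mytool /usr/local/bin/

## App: runtime libraries only, no compilers ##
FROM ubuntu:jammy AS app

RUN apt-get update && apt-get install -y --no-install-recommends \
    zlib1g && \
    rm -rf /var/lib/apt/lists/*

# only the compiled binaries cross over; build tools stay in the builder stage
COPY --from=builder /usr/local/bin/* /usr/local/bin/

WORKDIR /data
CMD ["mytool", "--help"]
```

The size saving comes from the app stage never installing the compiler toolchain or the -dev packages; it only needs the shared libraries the binaries link against at run time.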

soejun commented 2 months ago

Oh yup, that's my bad with the binaries. Will make an update later tonight. Thanks!

soejun commented 2 months ago

Updated. Image size is down from 2.6 GB to 1.79 GB.

Kincekara commented 2 months ago

@soejun Thank you for the changes. This is a very complex Dockerfile because of the requirements. I made a few changes and some necessary rearrangements. The uncompressed image size is now ~800 MB. Please check the latest changes.

@erinyoung I think you use this program actively. Do you have any more suggestions?

erinyoung commented 2 months ago

That's A LOT of dependencies. This was a ton of work! Thank you for getting this together.

I am a little concerned about using such an old version of samtools. I have a hunch that a lot of these tools are older versions in order to run with mummer.

I'd like to attempt this with ubuntu:jammy. Just a sec.

erinyoung commented 2 months ago

I think @soejun did great in following the build instructions found at https://github.com/iqbal-lab-org/viridian/blob/master/.ci/install_dependencies.sh . The image is probably good as is.

I was a little concerned about using such old versions of samtools and minimap2, so I tested this out in ubuntu:jammy. I also formatted things a little differently so it'd be easier for me to read.

The following builds for me. It uses more current tools, and the final image size is 767 MB.

ARG VIRIDIAN_VER="1.2.2"
ARG SAMTOOLS_VER="1.20"
ARG BCFTOOLS_VER=${SAMTOOLS_VER}
ARG HTSLIB_VER=${SAMTOOLS_VER}
ARG ENA_VER="1.7.1"
ARG NGMERGE_VER="0.3"
ARG VT_VER="0.57721"
ARG RACON_VER="1.5.0"
ARG MUMMER_VER="4.0.0rc1"
ARG READITANDKEEP_VER="0.3.0"
ARG CYLON_COMMIT_HASH="57d559a76254b0b95785f7c02fa58ef806713e01"
ARG VARIFIER_COMMIT_HASH="8bc8726ed3cdb337dc47b62515e709759e451137"
ARG MINIMAP2_VER="2.28"

## Builder ##
FROM ubuntu:jammy as build
ARG SAMTOOLS_VER
ARG BCFTOOLS_VER
ARG HTSLIB_VER
ARG NGMERGE_VER
ARG VT_VER
ARG RACON_VER
ARG READITANDKEEP_VER
ARG MINIMAP2_VER

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install --no-install-recommends -y \
    wget \
    ca-certificates \
    perl \
    bzip2 \
    cmake \
    git \
    autoconf \
    automake \
    make \
    curl \
    gcc \
    g++ \
    gnuplot \
    zlib1g-dev \
    libbz2-dev \
    liblzma-dev \
    libcurl4-gnutls-dev \
    libncurses5-dev \
    libssl-dev \
    libperl-dev \
    libgsl0-dev \
    procps && \
    rm -rf /var/lib/apt/lists/* && apt-get autoclean

# compile bcftools
RUN wget -q https://github.com/samtools/bcftools/releases/download/${BCFTOOLS_VER}/bcftools-${BCFTOOLS_VER}.tar.bz2 && \
    tar -xjf bcftools-${BCFTOOLS_VER}.tar.bz2 && \
    rm -v bcftools-${BCFTOOLS_VER}.tar.bz2 && \
    cd bcftools-${BCFTOOLS_VER} && \
    make && \
    make install

# compile samtools
RUN wget -q https://github.com/samtools/samtools/releases/download/${SAMTOOLS_VER}/samtools-${SAMTOOLS_VER}.tar.bz2 && \
    tar -xjf samtools-${SAMTOOLS_VER}.tar.bz2 && \
    cd samtools-${SAMTOOLS_VER} && \
    ./configure && \
    make && \
    make install

# compile htslib
RUN wget -q https://github.com/samtools/htslib/releases/download/${HTSLIB_VER}/htslib-${HTSLIB_VER}.tar.bz2 && \
    tar -vxjf htslib-${HTSLIB_VER}.tar.bz2 && \
    rm -v htslib-${HTSLIB_VER}.tar.bz2 && \
    cd htslib-${HTSLIB_VER} && \
    make && \
    make install

# compile NGmerge
RUN wget -q https://github.com/harvardinformatics/NGmerge/archive/refs/tags/v${NGMERGE_VER}.tar.gz && \
    tar -vxf v${NGMERGE_VER}.tar.gz && \
    cd NGmerge-${NGMERGE_VER} && \
    make && \
    cp NGmerge /usr/local/bin/.

# compile vt
RUN wget -q https://github.com/atks/vt/archive/refs/tags/${VT_VER}.tar.gz && \
    tar -vxf ${VT_VER}.tar.gz && \
    cd vt-${VT_VER} && \
    make && \
    cp vt /usr/local/bin/.

# compile racon
RUN wget -q https://github.com/lbcb-sci/racon/archive/refs/tags/${RACON_VER}.tar.gz && \
    tar -xvf ${RACON_VER}.tar.gz && \
    cd racon-${RACON_VER} && \
    mkdir build && \
    cd build && \
    cmake -DCMAKE_BUILD_TYPE=Release .. && \
    make && \
    cp bin/racon /usr/local/bin/.

# compile read-it-and-keep
RUN wget -q https://github.com/GlobalPathogenAnalysisService/read-it-and-keep/archive/refs/tags/v${READITANDKEEP_VER}.tar.gz && \
    tar -vxf v${READITANDKEEP_VER}.tar.gz && \
    cd read-it-and-keep-${READITANDKEEP_VER}/src && \
    make && \
    cp readItAndKeep /usr/local/bin/.

# install minimap2 binary
RUN curl -L https://github.com/lh3/minimap2/releases/download/v${MINIMAP2_VER}/minimap2-${MINIMAP2_VER}_x64-linux.tar.bz2 | tar -jxvf - --no-same-owner && \
    cp minimap2-${MINIMAP2_VER}_x64-linux/minimap2 /usr/local/bin

# And because mummer is old it was easier to troubleshoot if I gave it its own stage
FROM ubuntu:jammy as builm
ARG MUMMER_VER

RUN apt-get update && apt-get install -y --no-install-recommends \
    wget \
    git \
    libncurses5-dev \
    libbz2-dev \
    liblzma-dev \
    libcurl4-gnutls-dev \
    zlib1g-dev \
    libssl-dev \
    gcc \
    make \
    perl \
    bzip2 \
    gnuplot \
    ca-certificates \
    gawk \
    curl \
    sed \
    build-essential \
    unzip \
    automake \
    autoconf \
    nasm \
    pkgconf \
    libtool  \
    ruby \
    yaggo \
    gcc-11

#RUN update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 100 --slave /usr/bin/g++ g++ /usr/bin/g++-11

# compile mummer (saves to /usr/local/bin)
RUN wget -q https://github.com/mummer4/mummer/archive/refs/tags/v${MUMMER_VER}.tar.gz && \
    tar -xvf v${MUMMER_VER}.tar.gz && \
    cd mummer-${MUMMER_VER} && \
    autoreconf -i && \
    ./configure CXXFLAGS="-std=c++11 -Wno-maybe-uninitialized" LDFLAGS=-static && \
    make && \
    make install && \
    ldconfig

## App ##
FROM ubuntu:jammy as app

ARG VIRIDIAN_VER
ARG ENA_VER
ARG CYLON_COMMIT_HASH
ARG VARIFIER_COMMIT_HASH

LABEL base.image="ubuntu:jammy"
LABEL dockerfile.version="1"
LABEL software="viridian"
LABEL software.version="${VIRIDIAN_VER}"
LABEL description="Ultra-careful amplicon-aware viral assembly for tiled amplicon schemes."
LABEL website="https://github.com/iqbal-lab-org/viridian"
LABEL license="https://github.com/iqbal-lab-org/viridian/blob/master/LICENSE"
LABEL maintainer="Wilson Chan"
LABEL maintainer.email="chan.wilson.wc@gmail.com"
LABEL maintainer2="Kutluhan Incekara"
LABEL maintainer2.email="kutluhan.incekara@ct.gov"
LABEL maintainer3="Erin Young"
LABEL maintainer3.email="eriny@utah.gov"

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    python3-dev \
    gzip \
    gcc \
    perl \
    zlib1g \
    libncurses5 \
    bzip2 \
    liblzma-dev \
    libcurl4-gnutls-dev \
    wget \
    && apt-get autoclean && rm -rf /var/lib/apt/lists/*

COPY --from=build /usr/local/bin/* /usr/local/bin/
COPY --from=build /usr/local/lib/* /usr/local/lib/
COPY --from=builm /usr/local/bin/* /usr/local/bin/
COPY --from=builm /usr/local/lib/* /usr/local/lib/

RUN pip install --no-cache-dir cython

# ENA tools
RUN wget -q https://github.com/enasequence/enaBrowserTools/archive/refs/tags/v${ENA_VER}.tar.gz && \
    tar -xvf v${ENA_VER}.tar.gz && \
    rm v${ENA_VER}.tar.gz

RUN wget -q https://github.com/iqbal-lab-org/cylon/archive/${CYLON_COMMIT_HASH}.zip &&\
    pip install --no-cache-dir ${CYLON_COMMIT_HASH}.zip && \
    rm ${CYLON_COMMIT_HASH}.zip

RUN wget -q https://github.com/iqbal-lab-org/varifier/archive/${VARIFIER_COMMIT_HASH}.zip &&\
    pip install --no-cache-dir ${VARIFIER_COMMIT_HASH}.zip && \
    rm ${VARIFIER_COMMIT_HASH}.zip 

# install viridian
RUN wget -q https://github.com/iqbal-lab-org/viridian/archive/refs/tags/v${VIRIDIAN_VER}.tar.gz && \
    pip install --no-cache-dir v${VIRIDIAN_VER}.tar.gz &&\
    rm v${VIRIDIAN_VER}.tar.gz && \
    mkdir /data 

WORKDIR /data

CMD ["viridian", "--help"]

ENV PATH="/enaBrowserTools-${ENA_VER}/python3:$PATH" LC_ALL=C

## Test ##
FROM app as test

WORKDIR /test

RUN viridian --help 

RUN viridian run_one_sample --run_accession SRR29437696 --outdir OUT && \
    wc -l OUT/consensus.fa.gz OUT/log.json.gz OUT/qc.tsv.gz && \
    head OUT/variants.vcf

RUN viridian run_one_sample --run_accession SRR29437696 --outdir OUT2 --keep_bam && \
    wc -l OUT2/consensus.fa.gz OUT2/log.json.gz OUT2/qc.tsv.gz OUT2/reference_mapped.bam && \
    head OUT2/variants.vcf

Kincekara commented 2 months ago

@erinyoung Thank you, I like the changes you made. I also have no objection to using newer tools as long as the program works. I will recheck it and possibly merge it on Monday.

soejun commented 2 months ago

Kind of insane how much was cut down from the original image (roughly 4 GB). Anyway, I'm currently trying to verify that the latest update builds successfully, but I'm having an issue with the Cylon layer of the build. I think I'm going to play around with it a bit more to isolate the exact issue.

Here is the error log, in case anyone decides that working on this on a Friday night is the more appealing option over everything else:

 > [app 6/9] RUN wget -q https://github.com/iqbal-lab-org/cylon/archive/57d559a76254b0b95785f7c02fa58ef806713e01.zip &&  unzip 57d559a76254b0b95785f7c02fa58ef806713e01.zip && rm 57d559a76254b0b95785f7c02fa58ef806713e01.zip &&  mv cylon-57d559a76254b0b95785f7c02fa58ef806713e01 cylon && cd cylon &&  python3 -m pip install .:
11.70       exec(code, locals())
11.70     File "<string>", line 437, in <module>
11.70     File "<string>", line 81, in run_make_print_config
11.70     File "/usr/lib/python3.8/subprocess.py", line 415, in check_output
11.70       return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
11.70     File "/usr/lib/python3.8/subprocess.py", line 516, in run
11.70       raise CalledProcessError(retcode, process.args,
11.70   subprocess.CalledProcessError: Command '['make', '-s', 'print-config']' returned non-zero exit status 2.
11.70   ----------------------------------------
11.74 ERROR: Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp1il275h2 get_requires_for_build_wheel /tmp/tmpkqvcf8t0 Check the logs for full command output.

I could be missing something, but I did try to make sure Docker didn't leave any leftover layers that might be causing issues, since I ran the build with these commands:

docker system prune -a -f
docker build --progress=plain . -t viridian-local:1.2.2 --no-cache

erinyoung commented 2 months ago

What about skipping the unpacking step?

RUN wget -q https://github.com/iqbal-lab-org/cylon/archive/57d559a76254b0b95785f7c02fa58ef806713e01.zip && \
    pip install --no-cache-dir 57d559a76254b0b95785f7c02fa58ef806713e01.zip && \
    ...

Kincekara commented 2 months ago

@soejun Are you building the image with jammy? I think I have seen this error before. The Python version sometimes causes problems in Jammy. Try using pip install instead of python3 -m pip install, which forces python3.8.

I assume you didn't forget to install python3-dev
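
One way to check for the mismatch suggested above is to print which interpreter each command is bound to during the build. This is only a diagnostic sketch, assuming an ubuntu:jammy base with python3 and python3-pip installed from apt, as in the Dockerfile posted earlier in this thread; it is not part of the PR.

```dockerfile
FROM ubuntu:jammy

RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-pip && \
    rm -rf /var/lib/apt/lists/*

# 'pip --version' reports the Python version that pip runs under;
# compare it with the interpreter that 'python3' resolves to.
RUN python3 --version && \
    pip --version && \
    python3 -m pip --version
```

Building with --progress=plain, as in the commands quoted earlier, keeps the output of the RUN step visible in the log.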

soejun commented 2 months ago

Turns out it's a personal machine issue.

The latest commit builds successfully on my Linux machine. I'm guessing it's a weird issue on my MacBook that isn't pertinent to this pull request but yes, verified that the new changes work.

erinyoung commented 2 months ago

@soejun, it looks like your commits haven't made it to this PR yet. What does your Dockerfile look like now?

soejun commented 2 months ago

Just pushed it up. I incorporated your changes; the final image size is 771.75 MB.

erinyoung commented 2 months ago

This looks great! As soon as the tests finish, I'd like to merge and deploy this image.

If I don't get to merging this evening, I'll hopefully get to it tomorrow morning.

Thank you for putting this together!

erinyoung commented 2 months ago

Thank you for putting this together!

I am going to

  1. merge this PR
  2. deploy viridian to both dockerhub and quay with the tags '1.2.2' and 'latest'
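
Once the tags are deployed, a downstream Dockerfile could pin the versioned tag rather than 'latest'. The repository names below (staphb/viridian on Docker Hub, quay.io/staphb/viridian on Quay) are assumptions based on the organization name and should be checked against the registries before use.

```dockerfile
# Assumed image name; verify on Docker Hub before relying on it.
FROM staphb/viridian:1.2.2
# Alternative (also an assumption): FROM quay.io/staphb/viridian:1.2.2

# Per the Dockerfile in this thread, the image sets WORKDIR /data,
# so mounted input data is expected there.
CMD ["viridian", "--help"]
```

Pinning '1.2.2' keeps results reproducible, since 'latest' moves whenever a newer version is deployed.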

You can check the status of the deployment here: https://github.com/StaPH-B/docker-builds/actions/runs/9781248081