dincarnato / RNAFramework

RNA structure probing and post-transcriptional modifications mapping high-throughput data analysis
http://www.rnaframework.com
GNU General Public License v3.0

[Suggestion] Add a public Docker image or Dockerfile #30

Closed kenibrewer closed 1 year ago

kenibrewer commented 1 year ago

Hi @dincarnato,

I became interested in RNAFramework while looking into the possibility of building an open-source, containerized Nextflow pipeline for shapeseq analysis. Because RNAFramework relies on a number of different tools, it would be very helpful for my potential use case (and likely for other users too) if it were available as a public Docker image with all the dependencies pre-installed.

I ended up spending a few hours today working on putting that together. Despite your thorough (and much appreciated) dependency documentation, this ended up being more challenging than I expected. This was my first time building a Dockerfile around a Perl-based program, and I had never experienced how opaque cpan can be about missing build libraries. 😅 This is what I ended up using for my exploratory pipeline:

ARG RNA_FRAMEWORK_VERSION=e4e05f0088ff0f094a78648158a31a3e6c2e7a82
ARG USERNAME=ubuntu
ARG USER_UID=1000
ARG USER_GID=$USER_UID
ARG BASE_IMAGE=ubuntu:22.04

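# Base Image
FROM ${BASE_IMAGE}

# ARGs declared before FROM are only visible to FROM itself; redeclare them
# (without values) so their defaults are available inside this build stage.
ARG RNA_FRAMEWORK_VERSION
ARG USERNAME
ARG USER_UID
ARG USER_GID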

# Metadata
LABEL base_image="ubuntu:22.04" \
      version="2.8.3"   \
      software="RNA Framework" \
      software.version=${RNA_FRAMEWORK_VERSION} \
      about.summary="RNA structure probing and post-transcriptional modifications mapping high-throughput data analysis" \
      about.home="https://github.com/dincarnato/RNAFramework" \
      about.documentation="https://rnaframework-docs.readthedocs.io/en/latest/" \
      about.license="SPDX:GPL-3.0-or-later"

# Maintainer
LABEL maintainer="Ken Brewer <ken@kenbrewer.com>"

ENV DEBIAN_FRONTEND=noninteractive
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME

# Install system packages
RUN apt-get update && apt-get install -y \
    wget \
    curl \
    procps \
    build-essential \
    perl \
    cpanminus \
    libxml2-dev \
    libdbd-mysql-perl \
    git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install Mambaforge
USER ${USERNAME}
WORKDIR /home/${USERNAME}
RUN wget -O Mambaforge.sh "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh" && \
    bash Mambaforge.sh -b -p "${HOME}/conda"
ENV PATH="/home/${USERNAME}/conda/bin:${PATH}"

# Setup Bioconda channel configuration and install dependencies
RUN conda config --add channels defaults && \
    conda config --add channels bioconda && \
    conda config --add channels conda-forge && \
    conda config --set channel_priority strict && \
    mamba install -y \
        python=3.9 \
        gxx_linux-64 \
        "bowtie2>=2.3.5" \
        "samtools>=1.2" \
        "bedtools>=2.0" \
        "cutadapt>=2.1" \
        "viennarna>=2.4.0" \
        "rnastructure>=5.6" && \
    mamba clean -a -y

# Install Perl non-core modules (inc::latest, XML::LibXML, Config::Simple)
RUN cpanm \
    inc::latest \
    XML::LibXML \
    Config::Simple

# Install RNA Framework
RUN mkdir rnaframework && \
    cd rnaframework && \
    git init . && \
    git remote add origin https://github.com/dincarnato/RNAFramework && \
    git fetch --depth 1 origin ${RNA_FRAMEWORK_VERSION} && \
    git checkout FETCH_HEAD
ENV PATH="/home/${USERNAME}/rnaframework:${PATH}"
WORKDIR /data

I didn't include bowtie1 in the Docker image, but could add it quite easily if you think it would actually see use as part of an RNAFramework image.

I currently have this Docker image hosted at docker.io/kenibrewer/rnaframework. As long as docker is installed on a user's machine, it can be used without any additional dependency installation with the command:

docker run --rm -v "$PWD":/data kenibrewer/rnaframework rf-map -h

If you believe this might be useful to other RNAFramework users, there are a number of ways it could be made available, according to your preference:

1) I continue maintaining this Docker image independently, with some variable delay between when a new RNAFramework version is published and when the associated Docker image is updated.
2) I submit a pull request with just the Dockerfile and some documentation on building/use, so that your users can build the Docker image themselves.
3) I submit a pull request with a GitHub Actions pipeline that automatically builds and deploys the Docker image along with each new tag/release on master.
4) I vanish into the void, never to be seen in RNAFramework's issues again. 😶‍🌫️

The third option works very well to eliminate ongoing maintenance work, but would require you to set up a public Docker Hub repository and store an associated personal access token in GitHub. If you are interested in that option, I'm happy to talk through that process or provide some additional documentation, whether or not I end up working more on the shapeseq Nextflow pipeline.
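A rough sketch of what the build-and-deploy step in option 3 might run on each tag. The function names and the "latest" fallback are illustrative assumptions, not part of RNAFramework or of the pipeline discussed here. The sketch assumes `docker login` has already been performed with the stored access token.

```shell
# Map a Git ref such as "refs/tags/v2.8.3" to an image tag ("v2.8.3").
# Anything that is not a tag falls back to "latest" (an assumed convention).
image_tag_for_ref() {
    case "$1" in
        refs/tags/*) printf '%s\n' "${1#refs/tags/}" ;;
        *)           printf 'latest\n' ;;
    esac
}

# Build the image pinned to a single commit, then push it.
# Arguments: image name, Git ref, commit hash.
# Must be run from the directory containing the Dockerfile.
build_and_push() {
    image="$1"
    ref="$2"
    commit="$3"
    tag="$(image_tag_for_ref "$ref")"
    docker build --build-arg RNA_FRAMEWORK_VERSION="$commit" -t "$image:$tag" . &&
        docker push "$image:$tag"
}
```

In a GitHub Actions job this could be invoked as `build_and_push kenibrewer/rnaframework "$GITHUB_REF" "$GITHUB_SHA"`, since both variables are set by the runner. Passing the commit as a build argument keeps the image tied to exactly one revision of the repository.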

Cheers!

dincarnato commented 1 year ago

Hi Ken,

thanks a lot. This would definitely be valuable. We have been planning to do this for a long time and it's overdue by now, but we had other priorities. I agree that option 3 is the best, but I won't have bandwidth for a while. Maybe we can kick off with option 1 and transition to option 3 later on, depending on your availability and interest?

Thanks a lot. All the best,

Danny

kenibrewer commented 1 year ago

Sounds good. I actually already spent some time yesterday putting together a pipeline for the automatic Docker builds on my fork of RNAFramework. That will minimize my personal maintenance burden for option 1 and will make it easy to transition to option 3 when you've got time to set things up on your end.

I'd like to put together some basic automated testing to make sure the Docker container works prior to publishing it to Docker Hub. Do you have any internal testing code for RNAFramework that you use? If not, I can just put together some basic tests that check whether each of the required tool dependencies is present in PATH and can report its version.
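A minimal sketch of the PATH check described above, assuming it is run inside the container. `check_tools` is a hypothetical helper, and the executable names passed to it are assumptions about what each package installs.

```shell
# Report, for every tool name given, whether it can be found on PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
    _missing=0
    for _tool in "$@"; do
        if command -v "$_tool" >/dev/null 2>&1; then
            printf 'ok       %s\n' "$_tool"
        else
            printf 'MISSING  %s\n' "$_tool"
            _missing=1
        fi
    done
    return "$_missing"
}
```

For example, `check_tools bowtie2 samtools bedtools cutadapt RNAfold rf-map` would print one line per tool and fail the test step if any of them is absent. Version reporting could be layered on top by calling each tool's own version flag, which differs between tools and is therefore left out of this sketch.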

Also, is there a separate repo for the RNAFramework documentation? I'd be happy to submit a pull request with some information about the available Docker image once I have the testing set up.

dincarnato commented 1 year ago

Yes, the repo is https://github.com/dincarnato/RNAFramework-docs

dincarnato commented 6 months ago

Hi @kenibrewer,

I noticed the docker build failed with the latest commit.

Best, Danny

kenibrewer commented 6 months ago

@dincarnato Happy to take a look. I've opened a new issue #56 to track completion.