Arcadia-Science / seqqc

A Nextflow pipeline to identify quality control issues with new sequencing data.
MIT License

documenting creation of temporary multiqc docker image for sourmash reporting. #15

Open taylorreiter opened 1 year ago

taylorreiter commented 1 year ago

I implemented new sourmash multiqc modules. I would like to integrate them into the seqqc pipeline now, but I'm guessing there will be a substantial lag between 1) the PRs being merged into multiqc, 2) a new conda release of multiqc, and 3) a new biocontainers docker image for that conda release.

So in the meantime, I'm creating a temporary docker container that will have multiqc installed from my branch, github.com/taylorreiter/MultiQC branch ter/add-sourmash-gather (which was branched from ter/add-sourmash-compare and so has both).

I built the docker container using these instructions: https://phoenixnap.com/kb/how-to-commit-changes-to-docker-image

sudo docker pull python:3.11.1 # 3.11 is tested in multiqc continuous integration.
sudo docker images # used to find the IMAGE ID
sudo docker run -it 2a887161de9a /bin/bash
# modified the container by installing my branch of multiqc
pip install git+https://github.com/taylorreiter/MultiQC@ter/add-sourmash-gather
# confirm install
which multiqc
multiqc --version
# exit from the container
exit
# display a list of launched containers
sudo docker ps -a
# commit changes to the image (the first hash is the CONTAINER_ID)
sudo docker commit fc841af9207c 20221209-multiqc-sourmash
# verify that the container is available locally
sudo docker images
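
For anyone who would rather rebuild the image than pull the committed one, the interactive steps above could also be captured in a Dockerfile. This is only a sketch, not something that was used to build the shared image: the Dockerfile contents and the extra `multiqc --version` build check are assumptions, and the result should be comparable to, not identical with, the committed image.

```shell
# Write a Dockerfile that mirrors the interactive session (hypothetical file,
# not part of the original workflow).
cat > Dockerfile <<'EOF'
# Same base image that was pulled for the interactive session
FROM python:3.11.1
# Install MultiQC from the branch that carries the sourmash modules
RUN pip install --no-cache-dir git+https://github.com/taylorreiter/MultiQC@ter/add-sourmash-gather
# Fail the build early if the install did not produce a working executable
RUN multiqc --version
EOF

# Show what was written
cat Dockerfile

# To build it, reusing the image name from the docker commit step:
#   sudo docker build -t 20221209-multiqc-sourmash .
```

Keeping a file like this in the repo would give others something to rebuild from and extend.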

Image shared on docker hub using the following instructions: https://docs.docker.com/get-started/04_sharing_app/

Available here: https://hub.docker.com/repository/docker/taylorreiter/20221209-multiqc-sourmash
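
The exact sharing commands were not recorded here. Following the linked Docker Hub instructions, they would look roughly like the sketch below. The tag `b8ea142` is the one that appears in the nextflow container directive in this thread; whether the image was tagged exactly this way is an assumption, and the `run` helper is a hypothetical wrapper that prints the commands unless `RUN=1` is set.

```shell
LOCAL_IMAGE="20221209-multiqc-sourmash"                 # name given to docker commit
REMOTE_IMAGE="taylorreiter/20221209-multiqc-sourmash"   # docker hub repository
TAG="b8ea142"                                           # assumed tag, taken from the container directive

# Print each command by default; set RUN=1 in the environment to execute it.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

run sudo docker login
run sudo docker tag "$LOCAL_IMAGE" "$REMOTE_IMAGE:$TAG"
run sudo docker push "$REMOTE_IMAGE:$TAG"
# Smoke test: the pushed image should report the MultiQC version
run sudo docker run --rm "$REMOTE_IMAGE:$TAG" multiqc --version
```

Running the last command against the pushed image is a cheap check before pointing a nextflow module at it.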

taylorreiter commented 1 year ago

called in nextflow module using (untested):

container "${ workflow.containerEngine == 'docker' ? 'taylorreiter/20221209-multiqc-sourmash:b8ea142' : '' }"

elizabethmcd commented 1 year ago

small comment as I think you'll do this anyway - put the dockerfile for this and the documentation described above in the workflow repo itself as well

taylorreiter commented 1 year ago

i don't have a docker file, there is no docker file output by this approach.

elizabethmcd commented 1 year ago

ah ok I think I haven't come upon this approach before where you commit changes to an existing docker image - so this is doing all of this without creating a Dockerfile like when you create a docker image from scratch? this is nifty

taylorreiter commented 1 year ago

courtesy of @austinhpatton! currently failing CI because of platform things, but i'll fix it soon :)
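
The thread does not say what the platform problem was. Assuming it is a mismatch between the architecture the image was built on and the one CI runs on (an assumption), a quick check could look like the sketch below. The helper name `to_docker_arch` is hypothetical; `docker image inspect --format` is the standard way to read an image's OS and architecture.

```shell
# Map `uname -m` output to the architecture names docker uses.
to_docker_arch() {
  case "$1" in
    x86_64) echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *) echo "$1" ;;   # pass anything else through unchanged
  esac
}

IMAGE="taylorreiter/20221209-multiqc-sourmash:b8ea142"

# Architecture of the machine running this script
host_arch="$(to_docker_arch "$(uname -m)")"
echo "host architecture: ${host_arch}"

# Platform the image was built for; only works if docker is installed
# and the image has already been pulled locally.
if command -v docker >/dev/null 2>&1; then
  docker image inspect --format '{{.Os}}/{{.Architecture}}' "$IMAGE" \
    || echo "image not available locally; pull it first"
fi
```

If the two architectures differ, rebuilding the image on a machine that matches the CI runner is one way to resolve it.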

taylorreiter commented 1 year ago

Documenting docker installation on linux ubuntu to fix the platform issue (followed these instructions https://docs.docker.com/engine/install/ubuntu/):

prep for docker install:

sudo apt-get update
sudo apt-get install \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
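
To make the last command less opaque: the two command substitutions fill in the machine's architecture and the Ubuntu release codename, and the result is a single apt source line. A small sketch of what gets written, with `amd64` and `jammy` as example values only (not necessarily what this machine reported) and `docker_repo_line` as a hypothetical helper:

```shell
# Build the apt source line that the echo | tee command writes to
# /etc/apt/sources.list.d/docker.list (apart from extra whitespace).
docker_repo_line() {
  arch="$1"       # what `dpkg --print-architecture` prints, e.g. amd64
  codename="$2"   # what `lsb_release -cs` prints, e.g. jammy
  printf 'deb [arch=%s signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu %s stable\n' \
    "$arch" "$codename"
}

# Example values; on a real machine use the two commands named above.
docker_repo_line amd64 jammy
```

Comparing this with the contents of /etc/apt/sources.list.d/docker.list is a simple way to confirm the repository was registered for the right release.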

Install the docker engine

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo docker run hello-world

Allow login by giving all users read/write access to the docker socket:

sudo chmod 666 /var/run/docker.sock

mertcelebi commented 1 year ago

Just dropping a note here, in the future, we should include Dockerfiles for Docker images we're sharing, so others can easily replicate + extend as needed.

This shouldn't be a big issue, because presumably there will be a more formal docker image for multiqc-sourmash. Is that the right assumption, @taylorreiter?

taylorreiter commented 1 year ago

yes! once the sourmash PRs are merged into multiqc, we'll use the conda/biocontainer/quay.io image for multiqc.