Bioconductor / bioconductor_docker

Docker Containers for Bioconductor
https://bioconductor.org/help/docker/
Artistic License 2.0

Move dependencies out of Dockerfile #32

Open jwokaty opened 3 years ago

jwokaty commented 3 years ago

This PR creates Ubuntu-files to track docker-specific dependencies and skip dependencies that we don't want installed from BBS.

Regarding libmariadb-dev-compat, I've put it in apt_required.txt but commented it out because it depends on libmariadb-dev and conflicts with libmysqlclient-dev, which gets installed on the build system. We could instead skip libmysqlclient-dev by putting it in apt_skip.txt and uncommenting libmariadb-dev-compat in apt_required.txt so that it gets installed.

bin/install.sh installs BBS dependencies, comparing the BBS Ubuntu-files to this repo's Ubuntu-files.
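To make the comparison concrete, here is a minimal sketch of how the BBS lists could be filtered against a skip list. It is not the actual bin/install.sh from this PR: the function name `select_packages` and the example paths in the comments are assumptions for illustration only.

```sh
# Hypothetical sketch, not the real bin/install.sh.
# Usage: select_packages SKIP_FILE BBS_LIST...
# Prints each package named in the BBS lists exactly once, omitting any
# package named in SKIP_FILE. "# ..." comments and blank lines are ignored.
select_packages() {
    awk '
        { sub(/#.*/, ""); gsub(/^[ \t]+|[ \t]+$/, "") }  # drop comments, trim
        $0 == ""            { next }                     # nothing left on this line
        FILENAME == ARGV[1] { skip[$0] = 1; next }       # first file is the skip list
        !($0 in skip) && !seen[$0]++ { print }           # keep, de-duplicated
    ' "$@"
}

# Example (paths are assumptions):
#   select_packages Ubuntu-files/apt_skip.txt BBS/Ubuntu-files/20.04/apt_bioc.txt \
#       | xargs apt-get install -y --no-install-recommends
```

Because a commented-out entry is treated the same as a missing one, commenting out libmariadb-dev-compat in a list is enough to keep it from being installed.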

When I tested, I was able to install the following packages:

 [1] "a4"             "a4Base"         "bioCancer"      "BioMM"         
 [5] "BLMA"           "bnbc"           "canceR"         "ChemmineOB"    
 [9] "cicero"         "CoGAPS"         "ctgGEM"         "CytoTree"      
[13] "edge"           "GeneTonic"      "gpuMagic"       "igvR"          
[17] "methylscaper"   "monocle"        "phemd"          "podkat"        
[21] "projectR"       "RCyjs"          "spatialHeatmap" "tenXplore"     
[25] "tradeSeq"       "Travel"         "uSORT"          "webbioc"    

On 6/29, docker images reported the size as 4.57GB.

I'd appreciate any feedback to improve this PR. You can see my PR for BBS at https://github.com/Bioconductor/BBS/pull/84.

nturaga commented 3 years ago

Thanks for the PR @jwokaty.

The Docker image built from this PR is currently about 750MB larger than the devel image:

bioconductor/bioconductor_docker                              jw-update              337ddea798d9   45 hours ago   4.65GB
bioconductor/bioconductor_docker                              devel                  1458fe590fe7   46 hours ago   3.93GB

The key questions for this image are:

  1. Does this PR make the bioconductor/bioconductor_docker:devel image the "same" as the BBS linux machine? (was that the goal here? if so, we could make a new image bioconductor_docker:linux_builder --> We still need to test if it's the same as the BBS machine though.)

  2. What are the "extra" 750MB worth of system dependencies?

  3. One thing for me that makes this PR a little complicated to read is that the packages that are being installed aren't "explicit" anymore. They are lost in the apt-*.txt files and then, within the awk commands.

I'm happy to help on any of these, and welcome thoughts from @jwokaty, @hpages, @vjcitn and @mtmorgan .
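One possible way to approach question 2 is to dump the installed Debian packages and their sizes from both images and list what exists only in the larger one. The sketch below assumes each dump is a tab-separated "size, package" file; the function name `extra_packages` and the file names are hypothetical, and the `docker run` line in the comment assumes the image lets you override its default command.

```sh
# Hypothetical sketch. Each input file holds "size_in_KiB<TAB>package" lines,
# which could be produced with something like:
#   docker run --rm bioconductor/bioconductor_docker:devel \
#       dpkg-query -W -f='${Installed-Size}\t${Package}\n' > devel.tsv
#
# Usage: extra_packages OLD.tsv NEW.tsv
# Prints packages present only in NEW.tsv, largest first.
extra_packages() {
    awk -F '\t' '
        FILENAME == ARGV[1] { old[$2] = 1; next }   # remember packages in OLD
        !($2 in old)        { print $1 "\t" $2 }    # only in NEW
    ' "$1" "$2" | sort -k1,1nr
}
```

This only accounts for apt packages. Size added by pip installs or by duplicated layers would not show up here.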

hpages commented 3 years ago

I took a look at Dockerfile, went thru the list of deb packages that are explicitly listed in the file, and annotated them. This should help us decide what to do with each of them. The goal is that each deb package should go in one of the following lists:

  1. apt_required_build.txt
  2. apt_required_compile_R.txt
  3. apt_optional_compile_R.txt
  4. apt_extra_fonts.txt
  5. apt_cran.txt
  6. apt_bioc.txt
  7. apt_nice_to_have.txt
  8. apt_docker_only.txt

All these lists (except the last one) are in https://github.com/Bioconductor/BBS/tree/master/Ubuntu-files/20.04/. The last one (apt_docker_only.txt) would need to be created. It would list stuff that is maybe nice to have on the Docker image for developers but is not strictly required to install/run Bioconductor. Your input will be valuable @nturaga to decide whether or not you want to keep these things on the Docker image.

Here's the annotated list extracted from Dockerfile:

    ## Basic deps
    gdb \                          add to apt_nice_to_have.txt
    libxml2-dev \                  already in apt_cran.txt
    python3-pip \                  already in apt_required_build.txt
    libz-dev \                     who needs that? maybe create a new list
                                       (e.g. apt_docker_only.txt) and add to it
    liblzma-dev \                  already in apt_required_compile_R.txt
    libbz2-dev \                   already in apt_required_compile_R.txt
    libpng-dev \                   already in apt_optional_compile_R.txt
    libgit2-dev \                  already in apt_cran.txt
    ## sys deps from bioc_full
    pkg-config \                   add to apt_nice_to_have.txt
    fortran77-compiler \           we use gfortran (in apt_required_compile_R.txt) on the build
                                       machines
    byacc \                        who needs that? maybe add to apt_docker_only.txt
    automake \                     already in apt_bioc.txt
    curl \                         we use libcurl4-openssl-dev on the build
                                       machines (needed for CRAN packages RCurl
                                       and curl)
    ## This section installs libraries
    libpcre2-dev \                 already in apt_required_compile_R.txt
    libnetcdf-dev \                already in apt_bioc.txt
    libhdf5-serial-dev \           who needs that? maybe add to apt_docker_only.txt
    libfftw3-dev \                 already in apt_cran.txt
    libopenbabel-dev \             already in apt_bioc.txt
    libopenmpi-dev \               we use mpi-default-dev (apt_cran.txt) on the build machines
    libxt-dev \                    already in apt_required_compile_R.txt
    libudunits2-dev \              already in apt_cran.txt
    libgeos-dev \                  already in apt_cran.txt
    libproj-dev \                  already in apt_cran.txt
    libcairo2-dev \                already in apt_optional_compile_R.txt
    libtiff5-dev \                 we use libtiff-dev (in apt_optional_compile_R.txt) on the
                                       build machines
    libreadline-dev \              already in apt_required_compile_R.txt
    libgsl0-dev \                  we use libgsl-dev (in apt_bioc.txt) on the build machines
    libgslcblas0 \                 who needs that? maybe add to apt_docker_only.txt
    libgtk2.0-dev \                already in apt_cran.txt
    libgl1-mesa-dev \              gets automatically installed by libglu1-mesa-dev so maybe no
                                       need for an explicit install
    libglu1-mesa-dev \             already in apt_cran.txt
    libgmp3-dev \                  we use libgmp-dev (in apt_cran.txt) on the build machines
    libhdf5-dev \                  who needs that? maybe add to apt_docker_only.txt
    libncurses-dev \               gets automatically installed by libreadline-dev but maybe
                                       add it to apt_required_compile_R.txt anyway just in case
    libbz2-dev \                   already in apt_required_compile_R.txt
    libxpm-dev \                   who needs that? maybe add to apt_docker_only.txt
    liblapack-dev \                we don't use the system LAPACK library on the build machines
    libv8-dev \                    already in apt_cran.txt
    libgtkmm-2.4-dev \             already in apt_bioc.txt
    libmpfr-dev \                  already in apt_cran.txt
    libmodule-build-perl \         who needs that? maybe add to apt_docker_only.txt
    libapparmor-dev \              who needs that? maybe add to apt_docker_only.txt
    libprotoc-dev \                who needs that? maybe add to apt_docker_only.txt
    librdf0-dev \                  who needs that? maybe add to apt_docker_only.txt
    libmagick++-dev \              already in apt_cran.txt
    libsasl2-dev \                 already in apt_cran.txt
    libpoppler-cpp-dev \           already in apt_cran.txt
    libprotobuf-dev \              already in apt_cran.txt
    libpq-dev \                    already in apt_cran.txt
    libperl-dev \                  already in apt_cran.txt
    ## software - perl extentions and modules
    libarchive-extract-perl \      who needs that? maybe add to apt_docker_only.txt
    libfile-copy-recursive-perl \  who needs that? maybe add to apt_docker_only.txt
    libcgi-pm-perl \               who needs that? maybe add to apt_docker_only.txt
    libdbi-perl \                  who needs that? maybe add to apt_docker_only.txt
    libdbd-mysql-perl \            who needs that? maybe add to apt_docker_only.txt
    libxml-simple-perl \           who needs that? maybe add to apt_docker_only.txt
    libmysqlclient-dev \           already in apt_cran.txt
    default-libmysqlclient-dev \   not needed (redundant with libmysqlclient-dev)
    libgdal-dev \                  already in apt_cran.txt
    ## new libs
        libglpk-dev \                  already in apt_cran.txt
        libeigen3-dev \                already in apt_bioc.txt
    ## Databases and other software
    sqlite \                       not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
    openmpi-bin \                  only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
    mpi-default-bin \              only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
    openmpi-common \               only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
    openmpi-doc \                  only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
    tcl8.6-dev \                   we use tcl-dev (apt_optional_compile_R.txt) on the build machines
    tk-dev \                       already in apt_optional_compile_R.txt
    default-jdk \                  already in apt_optional_compile_R.txt
    imagemagick \                  not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
    tabix \                        not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
    ggobi \                        who needs that? maybe add to apt_docker_only.txt
    graphviz \                     already in apt_bioc.txt
    protobuf-compiler \            already in apt_cran.txt
    jags \                         already in apt_cran.txt
    ## Additional resources
    xfonts-100dpi \                already in apt_extra_fonts.txt
    xfonts-75dpi \                 already in apt_extra_fonts.txt
    biber \                        AFAIK this is only needed to build some vignettes so we have
                                       it listed in apt_vignettes_reference_manuals.txt and that's
                                       a list that we do not want to install on the Docker image
        libsbml5-dev \                 already in apt_bioc.txt
        libzmq3-dev \                  who needs that? maybe add to apt_docker_only.txt

    ## FIXME
    ## These two libraries don't install in the above section--WHY?
    RUN apt-get update \
        && apt-get -y --no-install-recommends install \
    libmariadb-dev-compat \        not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt if you really want this on the Docker
                                       image
    libjpeg-dev \                  already in apt_optional_compile_R.txt
    libjpeg-turbo8-dev \           installing libjpeg-dev should be enough
    libjpeg8-dev \                 installing libjpeg-dev should be enough

Note that I've left the following section from Dockerfile out of the discussion for now:

## Python installations
RUN apt-get update \
    && apt-get install -y software-properties-common \
    && add-apt-repository universe \
    && apt-get update \
    && apt-get -y --no-install-recommends install python2 python-dev \
    && curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py \
    && python2 get-pip.py \
    && pip2 install wheel \
    ## Install sklearn and pandas on python
    && pip2 install sklearn \
    pandas \
    pyyaml \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
    && rm -rf get-pip.py

because I'm not sure what to do with it or why it is needed. We do need some Python modules on the build machines but they should all be installed for Python 3, not Python 2 (we dropped support for Python 2 last year).

The goal is that in the future we'll only need to add new deb packages to the apt_cran.txt and/or apt_bioc.txt lists as new (or existing) Bioconductor packages introduce new system requirements. This will impact what gets installed on both the build machines and the Docker image.

Hope this helps, H.

jwokaty commented 3 years ago

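So in docker, we should be installing apt dependencies from the following files:

  • apt_bioc.txt
  • apt_cran.txt
  • apt_optional_compile_R
  • apt_required_compile_R
  • apt_extra_fonts
  • apt_required_compile_R
  • apt_required_build_R

We're not installing apt_nice_to_have and apt_vignettes_reference_manuals--is that correct?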

I also want to clarify that when one of these files has packages not listed in the current Dockerfile in the master branch, that we still install everything in the file. For example, the dockerfile on the master branch lists only 2 font packages; however, apt_extra_fonts has 8 total packages. So we will still be installing more packages than the current docker on the master branch, but at least what we're installing is explicit.

Additionally, when the docker and build systems have a similar package, we should choose the build system package that's in one of the files listed, correct?

nturaga commented 3 years ago

Hi Jennifer,

These are all very good questions. I’ll try to answer the ones I can.

> So in docker, we should be installing apt dependencies from the following files:
>
> • apt_bioc.txt
> • apt_cran.txt
> • apt_optional_compile_R
> • apt_required_compile_R
> • apt_extra_fonts
> • apt_required_compile_R
> • apt_required_build_R
>
> We're not installing apt_nice_to_have and apt_vignettes_reference_manuals--is that correct?
>
> I also want to clarify that when one of these files has packages not listed in the current Dockerfile in the master branch, that we still install everything in the file. For example, the dockerfile on the master branch lists only 2 font packages; however, apt_extra_fonts has 8 total packages. So we will still be installing more packages than the current docker on the master branch, but at least what we're installing is explicit.

In my experience so far, the apt_extra_fonts packages are needed only to ‘build’ vignettes. Is there a way to narrow down what exactly these 6 extra fonts are needed for?

One thing to remember is that we are inheriting some dependencies from our parent docker image from rocker.

So the inheritance goes like this,

   ```
   ubuntu/latest --> rocker/r-ver --> rocker/rstudio --> bioconductor/bioconductor_docker
   ```

There are dependencies which are already pre-installed and inherited from

rocker/r-ver - (https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_R.sh)

rocker/rstudio - (https://github.com/rocker-org/rocker-versioned2/blob/master/scripts/install_pandoc.sh)

And as Martin pointed out previously, because Docker stores images as ‘layers’ on a union file system (AUFS in older releases, overlay2 by default in current ones), each additional installation of the same software gives the impression of overwriting, but we are simply adding layers and increasing size.

> Additionally, when the docker and build systems have a similar package, we should choose the build system package that's in one of the files listed, correct?

This should be correct, although we may still make changes once we complete testing. To elaborate on testing, we’ll be building / installing all the 2000+ Bioconductor packages and their dependencies and we’ll see how that goes.

Specifically, something like this, where ‘pkg’ is a vector of Bioconductor packages.

  BiocManager::install(pkg,
                       INSTALL_opts = "--build",
                       update = FALSE,
                       quiet = TRUE,
                       force = TRUE,
                       keep_outputs = TRUE)

I’m not sure if this answers your questions, but I’m happy to get on a call and discuss the solution some more.

jwokaty commented 3 years ago

@nturaga Thanks for trying to answer some of my questions as well as pointing me to the rocker scripts.

I'm not sure if there's a better way to investigate these dependencies, but I decided to use code.bioconductor.org to investigate the "who needs that" packages. I will look there too for these files, but they're probably dependencies from non-Bioconductor packages.

If these are for building vignettes and we don't build vignettes with docker, why are we installing them? The fonts are just one group of files where I know there are more in the BBS than in docker. I suspect there will be others.

jwokaty commented 3 years ago

I marked all the packages that are in both docker and the BBS:

apt_bioc.txt:graphviz                    # for Rgraphviz             # Bioconductor Docker
apt_bioc.txt:libgtkmm-2.4-dev            # for HilbertVisGUI         # Bioconductor Docker
apt_bioc.txt:libgsl-dev                  # for GSL                   # Bioconductor Docker
apt_bioc.txt:libsbml5-dev                # for rsbml                 # Bioconductor Docker
apt_bioc.txt:automake                    # for RProtoBufLib          # Bioconductor Docker
apt_bioc.txt:libnetcdf-dev               # for mzR, RNetCDF          # Bioconductor Docker
apt_bioc.txt:libopenbabel-dev            # for ChemmineOB            # Bioconductor Docker
apt_bioc.txt:libeigen3-dev               # for ChemmineOB            # Bioconductor Docker
apt_cran.txt:libglu1-mesa-dev        # for rgl                       # Bioconductor Docker
apt_cran.txt:libgmp-dev              # for gmp                       # Bioconductor Docker
apt_cran.txt:libsasl2-dev            # for mongolite                 # Bioconductor Docker
apt_cran.txt:libxml2-dev             # for XML                       # Bioconductor Docker
apt_cran.txt:libcurl4-openssl-dev    # for RCurl, curl               # Bioconductor Docker
apt_cran.txt:mpi-default-dev         # for Rmpi                      # Bioconductor Docker
apt_cran.txt:libudunits2-dev         # for units                     # Bioconductor Docker
apt_cran.txt:libv8-dev               # for V8                        # Bioconductor Docker
apt_cran.txt:libmpfr-dev             # for Rmpfr                     # Bioconductor Docker
apt_cran.txt:libfftw3-dev            # for fftw, fftwtools           # Bioconductor Docker
apt_cran.txt:libmysqlclient-dev      # for RMySQL                    # Bioconductor Docker
apt_cran.txt:libpq-dev               # for RPostgreSQL, RPostgres    # Bioconductor Docker
apt_cran.txt:libmagick++-dev         # for magick                    # Bioconductor Docker
apt_cran.txt:libgeos-dev             # for rgeos                     # Bioconductor Docker
apt_cran.txt:libproj-dev             # for proj4                     # Bioconductor Docker
apt_cran.txt:libgdal-dev             # for sf                        # Bioconductor Docker
apt_cran.txt:libpoppler-cpp-dev      # for pdftools                  # Bioconductor Docker
apt_cran.txt:libgtk2.0-dev           # for RGtk2                     # Bioconductor Docker
apt_cran.txt:libgit2-dev             # for gert                      # Bioconductor Docker
apt_cran.txt:jags                    # for rjags                     # Bioconductor Docker
apt_cran.txt:libprotobuf-dev         # for protolite                 # Bioconductor Docker 
apt_cran.txt:protobuf-compiler       # for protolite                 # Bioconductor Docker
apt_cran.txt:libglpk-dev             # for glpkAPI and to compile igraph with GLPK support   # Bioconductor Docker
apt_extra_fonts.txt:xfonts-100dpi                                    # Bioconductor Docker
apt_extra_fonts.txt:xfonts-75dpi                                     # Bioconductor Docker
apt_nice_to_have.txt:gdb                                             # Bioconductor Docker        (suggested add from above)
apt_nice_to_have.txt:pkg-config                                      # Bioconductor Docker       (suggested add from above)
apt_optional_compile_R.txt:libpng-dev                                # Bioconductor Docker
apt_optional_compile_R.txt:libjpeg-dev                               # Bioconductor Docker
apt_optional_compile_R.txt:libtiff-dev                               # Bioconductor Docker
apt_optional_compile_R.txt:libcairo2-dev                             # Bioconductor Docker
apt_optional_compile_R.txt:tcl-dev                                   # Bioconductor Docker
apt_optional_compile_R.txt:tk-dev                                    # Bioconductor Docker
apt_optional_compile_R.txt:default-jdk                               # Bioconductor Docker
apt_required_build.txt:python3-pip                                   # Bioconductor Docker
apt_required_compile_R.txt:gfortran                                  # Bioconductor Docker
apt_required_compile_R.txt:libreadline-dev                           # Bioconductor Docker
apt_required_compile_R.txt:libxt-dev                                 # Bioconductor Docker
apt_required_compile_R.txt:libbz2-dev                                # Bioconductor Docker
apt_required_compile_R.txt:liblzma-dev                               # Bioconductor Docker
apt_required_compile_R.txt:libpcre2-dev                              # Bioconductor Docker
apt_required_compile_R.txt:libcurl4-openssl-dev                      # Bioconductor Docker
apt_required_compile_R.txt:libncurses-dev                            # Bioconductor Docker         (suggested add from above)

Here's what's only in the BBS:

apt_bioc.txt:firefox                     # for packages using utils::browseURL()
apt_bioc.txt:libgraphviz-dev             # for Rgraphviz
apt_bioc.txt:clustalo                    # for LowMACA
apt_bioc.txt:ocl-icd-opencl-dev          # for gpuMagic
apt_bioc.txt:libavfilter-dev             # for av/spatialHeatmap
apt_bioc.txt:libfribidi-dev              # for EnhancedVolcano
apt_bioc.txt:infernal                    # for inferrnal
apt_bioc.txt:fuse                        # for Travel
apt_bioc.txt:libfuse-dev                 # for Travel
apt_bioc.txt:kallisto                    # for rkal
apt_bioc.txt:mono-runtime                # for rawrr
apt_bioc.txt:libmono-system-data4.0-cil  # for rawrr
apt_cran.txt:librsvg2-dev            # for rsvg
apt_cran.txt:libssl-dev              # for openssl, mongolite
apt_extra_fonts.txt:# APT packages for extra fonts
apt_extra_fonts.txt:gsfonts-x11
apt_extra_fonts.txt:xfonts-base
apt_extra_fonts.txt:xfonts-scalable
apt_extra_fonts.txt:t1-xfree86-nonfree
apt_extra_fonts.txt:ttf-xfree86-nonfree
apt_extra_fonts.txt:ttf-xfree86-nonfree-syriac
apt_nice_to_have.txt:tree
apt_nice_to_have.txt:manpages-dev    # man pages for C standard library
apt_nice_to_have.txt:mlocate         # Provides the locate command
apt_optional_compile_R.txt:gobjc
apt_optional_compile_R.txt:libicu-dev
apt_required_build.txt:python3-minimal
apt_required_build.txt:git
apt_required_compile_R.txt:build-essential
apt_required_compile_R.txt:libx11-dev
apt_required_compile_R.txt:zlib1g-dev
apt_vignettes_reference_manuals.txt:texlive
apt_vignettes_reference_manuals.txt:texlive-font-utils          # for epstopdf
apt_vignettes_reference_manuals.txt:texlive-pstricks            # provides pstricks.sty
apt_vignettes_reference_manuals.txt:texlive-latex-extra         # provides fullpage.sty
apt_vignettes_reference_manuals.txt:texlive-fonts-extra         # provides inconsolata.sty
apt_vignettes_reference_manuals.txt:texlive-bibtex-extra        # provides unsrturl.bst
apt_vignettes_reference_manuals.txt:texlive-science             # provides algorithm.sty
apt_vignettes_reference_manuals.txt:texlive-luatex              # provides luatex85.sty
apt_vignettes_reference_manuals.txt:texlive-lang-european       # provides language definition files e.g. swedish.ldf
apt_vignettes_reference_manuals.txt:texi2html
apt_vignettes_reference_manuals.txt:texinfo
apt_vignettes_reference_manuals.txt:pandoc                      # needed for CRAN package knitr
apt_vignettes_reference_manuals.txt:pandoc-citeproc             # needed for CRAN package knitr
apt_vignettes_reference_manuals.txt:biber
apt_vignettes_reference_manuals.txt:#ttf-mscorefonts-installer
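Listings like the two above could be regenerated mechanically rather than by hand. The following is a rough sketch under stated assumptions: `annotate_bbs_lists` is a hypothetical name, and it expects a plain file with one Dockerfile package name per line as its first argument, followed by the BBS list files.

```sh
# Hypothetical sketch.
# Usage: annotate_bbs_lists DOCKERFILE_PKGS BBS_LIST...
# Prints every package line of the BBS lists as "<tag>\t<list file>:<line>",
# where <tag> is "both" if the package is also named in DOCKERFILE_PKGS
# and "bbs-only" otherwise.
annotate_bbs_lists() {
    awk '
        FILENAME == ARGV[1] { in_docker[$1] = 1; next }  # Dockerfile packages
        /^[ \t]*(#|$)/      { next }                     # skip comments and blanks
        {
            n = split(FILENAME, parts, "/")              # keep only the file name
            tag = ($1 in in_docker) ? "both" : "bbs-only"
            print tag "\t" parts[n] ":" $0
        }
    ' "$@"
}
```

Renamed packages (for example libgsl0-dev versus libgsl-dev) would still need to be matched by hand, since this compares names literally.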

So while it's fine that we don't include the vignette packages, the other files that we do want to install from still contain additional packages. To complicate matters further, some dependencies are already installed by rocker. We don't need to install them again, because doing so doesn't replace the original package, it just adds another layer (note: we're usually installing a -dev version):

gfortran
libbz2-*
libcurl4
libicu*
libpcre2*
libjpeg-turbo*
libreadline
libtiff*
liblzma*
zlib1g

It seems that if we want to install apt_bioc.txt, apt_cran.txt, apt_optional_compile_R, apt_required_compile_R, apt_extra_fonts (do we still need this for the docker?), apt_required_compile_R, and apt_required_build_R, we should expect that the docker is going to be larger because of the extra packages. I think the current method where I exclude packages will give better control; it's just that what's installed needs to become explicit. I also think we should keep the practice of annotating any package dependencies.

I was not able to find any reference for the following packages listed in the Dockerfile when searching code.bioconductor.org, with the exception of the first package. These are the candidates for the apt_docker_only.txt suggested by @hpages . But I think we should remove them if not needed. We could do a test comparing what can be installed with the current docker image and an image without the packages below.

libz-dev                            # ceTF, proBatch
byacc
libhdf5-serial-dev
libgslcblas0
libhdf5-dev
libxpm-dev
libmodule-build-perl
libprotoc-dev
librdf0-dev
libarchive-extract-perl
libfile-copy-recursive-perl
libcgi-pm-perl
libdbi-perl
libdbd-mysql-perl
libxml-simple-perl
sqlite                          # Not needed
openmpi-bin
mpi-default-bin
openmpi-common
openmpi-doc
tabix
imagemagick
ggobi
libzmq3-dev

jwokaty commented 3 years ago

# Current devel as of July 23?
 [1] "affyPara"       "canceR"         "CelliD"         "cellity"       
 [5] "CompGO"         "ctgGEM"         "CytoTree"       "fgga"          
 [9] "GateFinder"     "gCrisprTools"   "gpuMagic"       "immunotation"  
[13] "lisaClust"      "methyAnalysis"  "phemd"          "rawrr"         
[17] "SCATE"          "scClassifR"     "schex"          "scTensor"      
[21] "scTGIF"         "SeqSQC"         "spatialHeatmap" "spicyR"        
[25] "SwimR"          "Travel"         "vissE"          "waddR"

# Building without some packages                     
 [1] "CAMERA"         "canceR"         "cellity"        "cicero"        
 [5] "cliqueMS"       "cosmiq"         "ctgGEM"         "CytoTree"      
 [9] "flagme"         "GateFinder"     "gpuMagic"       "igvR"          
[13] "immunotation"   "IPO"            "LOBSTAHS"       "MAIT"          
[17] "meshes"         "meshr"          "Metab"          "metaMS"
[21] "methylscaper"   "monocle"        "ncGTW"          "phemd"         
[25] "proFIA"         "rawrr"          "RCyjs"          "Risa"          
[29] "scTensor"       "SeqSQC"         "spatialHeatmap" "tenXplore"     
[33] "Travel"         "uSORT"          "xcms"          

Not sure why these are different as this is not what I expected!
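To see exactly where the two lists diverge, the printed R vectors can be compared directly. This is a minimal sketch that assumes each list was saved to a text file as R printed it; the function names `r_vector_names` and `compare_runs` are hypothetical.

```sh
# Hypothetical sketch.
# Extract the quoted names from a printed R character vector,
# one per line, sorted and de-duplicated.
r_vector_names() {
    grep -o '"[^"]*"' "$1" | tr -d '"' | LC_ALL=C sort -u
}

# Usage: compare_runs FIRST.txt SECOND.txt
# Prints the names that appear in only one of the two runs.
compare_runs() {
    first=$(mktemp)
    second=$(mktemp)
    r_vector_names "$1" > "$first"
    r_vector_names "$2" > "$second"
    LC_ALL=C comm -23 "$first" "$second" | sed 's/^/only-in-first  /'
    LC_ALL=C comm -13 "$first" "$second" | sed 's/^/only-in-second /'
    rm -f "$first" "$second"
}
```

Only the quoted names are read, so index prefixes such as `[17]` and any stray characters between entries are ignored.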

The following packages were selected because either we weren't sure which Bioc packages required them or they appeared to be already satisfied by something in rocker. After building with these commented out, I manually reinstalled them one by one to see if they allowed additional Bioc packages to be installed. Only libzmq3-dev made a difference: it allowed RCy3 to be installed.

Appeared to be already installed

byacc
fortran77-compiler
imagemagick
libgmp3-dev
libtiff5-dev
sqlite
ggobi
libarchive-extract-perl
libcgi-pm-perl
libdbd-mysql-perl
libfile-copy-recursive-perl
libgsl0-dev
libhdf5-serial-dev
liblapack-dev
libmariadb-dev-compat
libmodule-build-perl
librdf0-dev
libxml-simple-perl
libxpm-dev
mpi-default-bin
openmpi-doc
tabix

Not needed / No additional Bioc packages were installed after the following apt packages were installed

default-libmysqlclient-dev
libgl1-mesa-dev
libdbi-perl
libprotoc-dev
libz-dev

jwokaty commented 3 years ago

I've tried to address all previous comments. However, I'm not attempting to recreate what is in the master branch or the BBS, but rather to build a container that installs packages from the BBS lists and can override entries that we don't want to install (for example, when they've already been installed via Rocker).

The current size is 4.25GB. All but the following Bioconductor packages are installed:

 [1] "ArrayExpressHTS" "brainflowprobes" "BridgeDbR"       "canceR"
 [5] "CHRONOS"         "cn.mops"         "CNVfilteR"       "CNViz"
 [9] "CopyNumberPlots" "CytoTree"        "DaMiRseq"        "debCAM"
[13] "DeepPINCS"       "derfinder"       "derfinderPlot"   "esATAC"
[17] "gaggle"          "GARS"            "IsoGeneGUI"      "miRSM"
[21] "MSGFgui"         "MSGFplus"        "panelcn.mops"    "paxtoolsr"
[25] "phemd"           "psichomics"      "Rcpi"            "recount"
[29] "regionReport"    "ReQON"           "RGMQL"           "RMassBank"
[33] "rmelting"        "RmiR"            "RNAAgeCalc"      "sarks"
[37] "SELEX"           "SICtools"        "VAExprs"

Here's what we actually install in the Docker image. You can see these in the output when the container is built. You can also comment out the cleanup at the bottom of src/install.sh and view the install_apt_pkgs and install_pip_pkgs files to see these packages.

# APT
automake
clustalo
firefox
fuse
graphviz
infernal
jags
kallisto
libavfilter-dev
libcurl4-openssl-dev
libeigen3-dev
libfftw3-dev
libfribidi-dev
libfuse-dev
libgdal-dev
libgeos-dev
libgit2-dev
libglpk-dev
libglu1-mesa-dev
libgmp-dev
libgraphviz-dev
libgsl-dev
libgtk2.0-dev
libgtkmm-2.4-dev
libmagick++-dev
libmono-system-data4.0-cil
libmpfr-dev
libmysqlclient-dev
libnetcdf-dev
libopenbabel-dev
libpoppler-cpp-dev
libpq-dev
libproj-dev
libprotobuf-dev
librsvg2-dev
libsasl2-dev
libsbml5-dev
libssl-dev
libudunits2-dev
libv8-dev
libxml2-dev
mono-runtime
mpi-default-dev
ocl-icd-opencl-dev
protobuf-compiler
python3-pip
tcl-dev
tk-dev
curl
libzmq3-dev
python3-pip

# PIP
h5py
h5pyd
jupyter
matplotlib
mofapy
mofapy2
nbconvert
numpy
phate
scipy
tensorflow_probability
testresources
virtualenv

If this is still too big, I need to know where to cut. I could start removing dependencies required for only a few BioC packages.

If we want to be more explicit, I can write a script to generate the above packages and we can rerun the script every time we want to update the docker image. We can also commit the list of packages.
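As a rough idea of what such a script could do, the sketch below normalizes a package list into a sorted, de-duplicated manifest and checks it against a committed copy. The names `make_manifest`, `manifest_is_current` and the file `apt_installed.txt` are assumptions, not files in this PR.

```sh
# Hypothetical sketch.
# Read package names on stdin, print them sorted with duplicates removed.
make_manifest() {
    LC_ALL=C sort -u
}

# Usage: ... | manifest_is_current COMMITTED_MANIFEST
# Succeeds only if the names on stdin produce exactly the committed manifest.
manifest_is_current() {
    fresh=$(mktemp)
    make_manifest > "$fresh"
    if cmp -s "$fresh" "$1"; then status=0; else status=1; fi
    rm -f "$fresh"
    return "$status"
}

# Example (file name is an assumption):
#   cat install_apt_pkgs | manifest_is_current Ubuntu-files/apt_installed.txt \
#       || echo "package list changed; regenerate and commit the manifest"
```

De-duplicating also collapses entries that come from more than one source list, such as python3-pip, which appears twice in the APT list above.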

I still kept a list of packages to skip because the BBS files have quite a few packages that were already installed via Rocker.

nturaga commented 3 years ago

Thanks @jwokaty, I will review this today/tomorrow.