Closed sneumann closed 5 years ago
These are the (to be) installed packages:
'ASEB', 'Cardinal', 'CausalR', 'CellNOptR', 'ChemmineOB', 'cisPath', 'clippda',
'CNORdt', 'CNORfeeder', 'CNORode', 'cydar', 'deltaGseg', 'DEP', 'DEqMS', 'diffcyt',
'DominoEffect', 'Doscheda', 'drawProteins', 'eiR', 'fCI', 'fmcsR', 'GraphPAC',
'HPAanalyze', 'IMMAN', 'InterMineR', 'iPAC', 'IPPD', 'kimod', 'LPEadj', 'mlm4omics',
'MSstatsQC', 'MSstatsQCgui', 'omicRexposome', 'PAA', 'Path2PPI', 'Pbase', 'PCpheno',
'PECA', 'pepXMLTab', 'PGA', 'pgca', 'phosphonormalizer', 'plgem', 'PLPE',
'PowerExplorer', 'ppiStats', 'procoil', 'ProCoNA', 'pRolocGUI', 'ProteomicsAnnotationHubData',
'Pviz', 'qcmetrics', 'qPLEXanalyzer', 'QuartPAC', 'rain', 'RCASPAR', 'Rchemcpp', 'Rcpi',
'readat', 'ROTS', 'RpsiXML', 'sapFinder', 'ScISI', 'shinyTANDEM', 'SLGI', 'SpacePAC',
'spliceSites', 'topdownr', 'TPP', 'XINA', 'CardinalWorkflows', 'faahKO', 'gcspikelite',
'iontreeData', 'metaMSdata', 'msPurityData', 'msqc1', 'MSstatsBioData', 'mtbls2',
'plasFIA', 'ProData', 'PtH2O2lipids', 'qPLEXdata', 'RMassBankData', 'topdownrdata'
I note quite a few data packages: faahKO, RMassBankData, mtbls2, plasFIA ... I think some should be moved. Or maybe we even get rid of the MassSpectrometryData view in the containers altogether?
I note a few packages that would also make sense in metabolomics: Cardinal, ChemmineOB, fmcsR, Pviz, Rchemcpp. Maybe we can move them towards protmetcore? Or maybe some are only dependencies of the data packages, and gone if we remove those. Yours, Steffen
@lgatto could you have a look ?
I checked locally, and removing the MassSpectrometryData packages saves around 3 minutes (10%) of build time; the image size comes down from ~11 GB to ~9 GB. I recommend removing the data packages nevertheless.
Hi, I am (also) working on the timeout of devel_protmetcore2. To make the timing optimisations a bit less manual, I have the following set of scripts to determine what takes how much time.
First, cut&paste the list of packages the container wants to install from the beginning of the `docker build` output. The build can then be interrupted; it is not needed any further.
Then, inside the FROM image (started with `docker run --rm -it FROMIMAGE bash`), a tiny script installs a package given on the command line, and a loop across all these packages gathers all build times (including dependencies and all downloads).
installit.R:
#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
library(BiocManager)
BiocManager::install(args[1])
apt-get install -y time  # for GNU /usr/bin/time (not the shell builtin)
# Install all packages
for F in 'bioassayR' 'BioNetStat' ... 'gridExtra' ; do /usr/bin/time --output "$F.timing" ./installit.R "$F" ; done
# Collect timing:
for F in *.timing ; do echo -n "$F " ; grep -v output "$F" | cut -d " " -f 3 | cut -d e -f 1 ; done
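To rank the results without eyeballing "m:ss" strings, the elapsed times can be converted to seconds and sorted numerically. A possible follow-up sketch, assuming the default two-line output format of GNU /usr/bin/time and the *.timing filenames produced above:

```shell
# Sort the collected timings, slowest package first.
# Assumes lines like "0.1user 0.0system 1:23.45elapsed 99%CPU ...".
for F in *.timing ; do
  t=$(grep -o '[0-9:.]*elapsed' "$F" | sed 's/elapsed$//')
  # convert "[h:]m:ss.ss" to seconds
  secs=$(echo "$t" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }')
  echo "${F%.timing} $secs"
done | sort -k2 -rn
```

The first lines of the output are then the candidates to shuffle into another Dockerfile.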
Then the most time-consuming packages can be shuffled to other Dockerfiles.
I am also preparing a way to use the download statistics to eventually automagically move packages with a low download number (from the BioC statistics) into a devel_metabolomics_extra2 container.
Yours, Steffen
So, the stats code would be:
pkgs_to_install <- c('BiocVersion', 'biocViews', 'ProtGenerics', 'mzR',
'MSnbase', 'msdata', 'BiocParallel', 'knitr', 'rmarkdown', 'httr', 'XML',
'zlibbioc')
yr <- format(Sys.time(), "%Y")
## http://bioconductor.org/packages/stats/bioc/xcms/xcms_2018_stats.tab
## http://bioconductor.org/packages/stats/data-experiment/msdata/msdata_2018_stats.tab
staturl <- "http://bioconductor.org/packages/stats/"
downloads <- t(sapply(pkgs_to_install, function(pkg) {
    urls <- paste(staturl, c("bioc", "data-experiment"), "/",
                  pkg, "/", pkg, "_", yr, "_stats.tab", sep="")
    pkgdownloads <- sapply(urls, function(url) {
        stats.tab <- try(read.delim(url), silent=TRUE)
        ## ifelse() would evaluate the subsetting branch even on a
        ## try-error, so use a plain if/else here
        if (inherits(stats.tab, "try-error"))
            NA
        else
            stats.tab[grep("all", stats.tab[,"Month"]), "Nb_of_distinct_IPs"]
    }, USE.NAMES=FALSE)
    pkgdownloads
}))
## Retain the topX fraction of the most downloaded packages
topX <- 0.75
popular <- sort(apply(downloads, MARGIN=1, FUN=function(x) max(x, na.rm=TRUE)),
                decreasing = TRUE)
popular <- popular[seq_len(floor(length(popular) * topX))]
names(popular)
But I realised that you don't want that added to install.R at build time, because you don't want a package to come and go whenever the stats change the ordering. Instead, one can add that list statically to devel and release, based on last year's download stats. Yours, Steffen
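One way to freeze such a list could look like the following sketch. The file names are hypothetical; it assumes the popular package names were saved one per line, e.g. with writeLines(names(popular), "popular.txt") from the code above:

```shell
# Turn a plain-text package list into a static install line, so the
# container content only changes when the file is regenerated.
printf 'BiocManager::install(c(%s))\n' \
  "$(sed 's/.*/"&"/' popular.txt | paste -sd, -)" > install_popular.R
```

The generated install_popular.R can then be committed to the devel and release Dockerfile repositories and refreshed once per release cycle.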
This still appears to be an issue, see the most recent log:
The release_proteomics has not built in over a year - the last successful build was for R 3.4.4 and Bioc 3.6 - and if this is not remedied before the next release we will remove it from the README and the list of supported dockers. The devel has also not built for over a year.
There was a successful build yesterday: https://cloud.docker.com/u/bioconductor/repository/docker/bioconductor/devel_proteomics2/builds Yours, Steffen
The Docker Hub builds get cancelled after about two hours. Locally, devel_proteomics2 builds on a 3-year-old workstation in about 36:36.04 minutes (elapsed). Yours, Steffen