easybuilders / easybuild-easyconfigs

A collection of easyconfig files that describe which software to build using which build options with EasyBuild.
https://easybuild.io
GNU General Public License v2.0

Bioconductor 3.2 exts_list Packages #2465

Open nathanhaigh opened 8 years ago

nathanhaigh commented 8 years ago

I'm looking at creating a new easyconfig for R-bundle-Bioconductor-3.2 and was wondering how best to populate/update the packages and versions specified in exts_list. Should it install all Bioconductor packages?

boegel commented 8 years ago

@nathanhaigh: we typically bump the versions of all extensions that were listed in the most recent previous version of the easyconfig.

The list of extensions typically grows as the need arises (since reinstalling only the missing extensions is easy, cf. http://easybuild.readthedocs.org/en/latest/Partial_installations.html#partial-installation-skip).

How large is the list of all Bioconductor extensions? If it's doable, we can include all.

Also, take a look at #1962: @verdurin has already done some work on Bioconductor 3.2, but it never got merged (and the PR seems to be somewhat broken now).

nathanhaigh commented 8 years ago

It could be over 1000 in its entirety (https://www.bioconductor.org/packages/3.2/BiocViews.html#___Software). The core packages would be significantly fewer, but I'm not sure how many at this stage.

The following should return all the package names for Bioconductor:

source("https://bioconductor.org/biocLite.R")
all_group()

boegel commented 8 years ago

Well, are you up for maintaining an easyconfig file that lists ~1000 extensions? ;)

nathanhaigh commented 8 years ago

OK, so BioC have a file listing all their software packages in Debian Control File (DCF) format.

I notice that the R-bundle-Bioconductor easyconfig files state that the order of packages is important. Is this because of package dependencies?

I really don't want to have to maintain the order of a list containing 1104 software packages by hand! So I'm wondering how we could use the dependencies specified in this file to choose the installation order automatically, and also to identify CRAN dependencies.

As an aside, this DCF doesn't include the 895 BioC AnnotationData packages, such as GO.db and KEGG.db, which are currently in the R-bundle-Bioconductor easyconfig file.
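One way to get an installation order out of the DCF without maintaining it by hand is a topological sort over the Depends, Imports and LinkingTo fields. The sketch below is a hypothetical shell helper (`dcf_install_order` is not part of EasyBuild) that turns each dependency into a "dependency package" pair and hands the pairs to the standard POSIX `tsort` utility. It assumes the usual PACKAGES layout (one record per package, records separated by blank lines, continuation lines indented) and version constraints written in parentheses. Dependencies that are not themselves listed in the file, such as base R or CRAN packages, are left out of the order and reported on stderr, which is one rough way to spot the CRAN dependencies.

```shell
# Hypothetical helper: read a PACKAGES file (DCF format) on stdin and print
# its packages so that every package comes after the packages it depends on
# (Depends, Imports and LinkingTo), using POSIX tsort for the ordering.
# Dependencies not listed in the file itself (base R, CRAN, ...) are not
# ordered; each one is reported once on stderr instead.
dcf_install_order() {
    awk '
        # Turn the dependency text of the finished record into "dep pkg" pairs.
        function flush(    n, i, parts, dep) {
            if (pkg != "") {
                known[pkg] = 1
                n = split(deps, parts, ",")
                for (i = 1; i <= n; i++) {
                    dep = parts[i]
                    sub(/\(.*/, "", dep)            # strip "(>= x.y)" constraints
                    gsub(/[[:space:]]/, "", dep)
                    if (dep != "" && dep != "R")
                        pairs[++np] = dep " " pkg
                }
                pairs[++np] = pkg " " pkg           # self-pair: list pkg even without deps
            }
            pkg = ""; deps = ""; infield = 0
        }
        /^[[:space:]]*$/ { flush(); next }          # blank line ends a record
        /^[[:space:]]/ { if (infield) deps = deps " " $0; next }   # continuation line
        {
            infield = 0
            if ($1 == "Package:") pkg = $2
            else if ($1 ~ /^(Depends|Imports|LinkingTo):$/) {
                infield = 1
                value = $0
                sub(/^[^:]*:/, "", value)
                deps = deps "," value
            }
        }
        END {
            flush()
            for (i = 1; i <= np; i++) {
                split(pairs[i], p, " ")
                if (p[1] in known)
                    print pairs[i]
                else if (!(p[1] in reported)) {
                    reported[p[1]] = 1
                    print "external dependency: " p[1] | "cat 1>&2"
                }
            }
        }
    ' | tsort
}
```

For example, `dcf_install_order < PACKAGES > install_order.txt` on a downloaded copy of the file prints one package name per line, dependencies first. `tsort` only guarantees that dependencies come first; the relative order of unrelated packages is arbitrary, and dependency cycles are reported as an error rather than resolved.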

nathanhaigh commented 8 years ago

DCFs for the AnnotationData and ExperimentData packages are available here:
https://bioconductor.org/packages/3.2/data/annotation/src/contrib/PACKAGES
https://bioconductor.org/packages/3.2/data/experiment/src/contrib/PACKAGES
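Given one of these files, producing exts_list entries is mostly text processing. A minimal sketch, assuming a hypothetical helper name (`dcf_to_exts`) and the `('name', 'version', bioconductor_options),` entry style used in the existing R-bundle-Bioconductor easyconfigs; it only reads the Package: and Version: fields and ignores everything else in each record:

```shell
# Hypothetical helper: convert a PACKAGES file (DCF format) on stdin into
# exts_list entries on stdout, one per record that has Package: and Version:.
dcf_to_exts() {
    awk -v q="'" '
        # Print the entry for the current record (if complete) and reset it.
        function emit() {
            if (pkg != "" && ver != "")
                print "    (" q pkg q ", " q ver q ", bioconductor_options),"
            pkg = ""; ver = ""
        }
        /^Package:/ { pkg = $2 }
        /^Version:/ { ver = $2 }
        # A blank line terminates a record.
        /^[[:space:]]*$/ { emit() }
        END { emit() }
    '
}
```

For example, `curl -s https://bioconductor.org/packages/3.2/data/annotation/src/contrib/PACKAGES | dcf_to_exts > annotation_exts.txt`. The output follows the order of the PACKAGES file, which is not a dependency order.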

boegel commented 8 years ago

I think installing all Bioconductor pkgs is... nuts. :)

I needed to update to 3.2, so I did a quick-and-dirty bump of all the packages that are already included, with:

# Print an updated exts_list entry for each Bioconductor package already listed in the WIP easyconfig
for name in $(grep -i 'bioconductor_options),' WIP/R-bundle-Bioconductor-3.2-foss-2016a-R-3.2.3.eb | sed "s/^[^']*'//g" | sed "s/'.*//g"); do
    # The package page has the version in the table cell right after "Version"
    version=$(curl -s "https://bioconductor.org/packages/3.2/bioc/html/${name}.html" | grep -A1 Version | tail -1 | sed 's/.*<td>//g' | sed 's/<\/td>.*//g')
    echo "    ('$name', '$version', bioconductor_options),"
done
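The same bump can be sketched without requesting one HTML page per package, by looking the versions up in a downloaded PACKAGES file instead. This is a hypothetical helper (`bump_exts`), assuming the old easyconfig lists one `('name', 'version', bioconductor_options),` entry per line, as the loop above does. The existing order is kept, and packages missing from the PACKAGES file are flagged with a comment rather than dropped silently.

```shell
# Hypothetical helper: bump the versions of the extensions already listed in
# an easyconfig, taking the new versions from a PACKAGES file (DCF format).
# Usage: bump_exts OLD_EASYCONFIG PACKAGES_FILE
bump_exts() {
    old_eb=$1
    dcf=$2
    awk -v q="'" '
        # First file (PACKAGES): remember the version of every package.
        FNR == NR {
            if ($1 == "Package:") pkg = $2
            else if ($1 == "Version:" && pkg != "") { version[pkg] = $2; pkg = "" }
            next
        }
        # Second file (easyconfig): rewrite each entry in its existing order.
        /bioconductor_options\),/ {
            name = $0
            sub("^[^" q "]*" q, "", name)   # drop everything up to the first quote
            sub(q ".*", "", name)           # drop everything from the next quote on
            if (name in version)
                print "    (" q name q ", " q version[name] q ", bioconductor_options),"
            else
                print "    # not found in PACKAGES: " name
        }
    ' "$dcf" "$old_eb"
}
```

For example, `bump_exts WIP/R-bundle-Bioconductor-3.2-foss-2016a-R-3.2.3.eb PACKAGES` with a PACKAGES file for the target release. The "not found" lines are a starting point for spotting packages that moved to another repository (software vs. annotation) or were removed.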

I'm likely going to have to deal with missing dependencies that were added in the updated versions, but at least this is a good start.

cf. https://github.com/hpcugent/easybuild-easyconfigs/pull/2697

nathanhaigh commented 8 years ago

Can I ask a simple (stupid?) question: why can't biocLite() be used within R to install the BioC packages?

boegel commented 8 years ago

@nathanhaigh As far as I know, biocLite doesn't allow you to control versions (of either the packages themselves or their dependencies), which goes against the spirit of EB, where we (try to) version-fix everything (at least today).

verdurin commented 8 years ago

Just a note that one of our groups insists that they need all the BioC packages, primarily because they occasionally receive requests for some of the obscure ones and they'd prefer not to have to install those on the fly.

fgeorgatos commented 8 years ago

That group should take ownership of their problem, to appreciate what they are asking for ;)

(Sent by email in reply to Adam Huffman's comment of Sunday, 17 April 2016.)

echo "sysadmin know better bash than english" | sed s/min/mins/ | sed 's/better bash/bash better/' # signal detected in a CERN forum

verdurin commented 8 years ago

I would agree, but in fact they maintain it themselves now in their own private area, and we're trying to move people towards central infrastructure (three institutes are merging into one).

jepolitsch commented 4 months ago

I know this thread is 8 years old, but in case anyone else needs to build a new Bioconductor version with updated package versions, as I did, here is the R script I used:

# Step 1: Install and load the BiocManager package if not already installed
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
library(BiocManager)

# Step 2: Set the Bioconductor version to 3.19
BiocManager::install(version = "3.19")

# Step 3: Retrieve the package information for Bioconductor 3.19
bioc_version <- "3.19"
bioc_repo <- BiocManager::repositories(version = bioc_version)
# repositories() also returns the CRAN mirror; keep only the Bioconductor repositories (BioCsoft, BioCann, ...)
bioc_repo <- bioc_repo[grepl("^BioC", names(bioc_repo))]
available_packages <- available.packages(repos = bioc_repo)

# Step 4: Extract package names and versions
package_versions <- data.frame(
    Package = rownames(available_packages),
    Version = available_packages[, "Version"]
)

# Step 5: Define a function to format each package entry as an exts_list item
format_package_entry <- function(package, version) {
    sprintf("('%s', '%s', {\n        'checksums': [''],\n    }),", package, version)
}

# Step 6: Apply the formatting function to each package entry
formatted_entries <- apply(package_versions, 1, function(row) {
    format_package_entry(row["Package"], row["Version"])
})

# Step 7: Write the formatted entries to a file
output_file <- "bioconductor_3_19_package_versions.txt"
writeLines(formatted_entries, con = output_file)

# Print the file path for confirmation
cat("Package versions have been saved to:", output_file, "\n")

This thread is still one of the top search results, so if there is a more EB-native way to do this (ideally one that preserves the order of exts_list), let me know.
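On the ordering question, one option is to keep the order of an older easyconfig: emit the generated entries in the order their names appear in the old exts_list, then append the packages that are new in this release. `reorder_like_old` below is a hypothetical shell helper, assuming the old easyconfig has one `('name', ...` line per extension and the new file uses the multi-line format written by the R script above. Any other old line that starts with `('` (a multi-line dependencies list, for example) is also read as a name, but is skipped unless a generated package has that name. The appended new packages still need their dependency order checked, for instance with a topological sort (`tsort`).

```shell
# Hypothetical helper: print the entries of NEW_FILE (multi-line entries as
# written by the R script) in the order their names appear in OLD_EASYCONFIG,
# then append the entries for packages the old easyconfig did not have.
# Usage: reorder_like_old OLD_EASYCONFIG NEW_FILE
reorder_like_old() {
    old_eb=$1
    new_file=$2
    awk -v q="'" '
        # First file (new entries): group lines by package name, remember order.
        FNR == NR {
            if (index($0, "(" q) == 1) {    # an opening parenthesis plus quote starts an entry
                name = substr($0, 3)
                name = substr(name, 1, index(name, q) - 1)
                order[++n] = name
            }
            entry[name] = entry[name] $0 "\n"
            next
        }
        # Second file (old easyconfig): emit known entries in the old order.
        {
            line = $0
            sub(/^[[:space:]]*/, "", line)
            if (index(line, "(" q) != 1) next
            name = substr(line, 3)
            name = substr(name, 1, index(name, q) - 1)
            if ((name in entry) && !(name in done)) {
                printf "%s", entry[name]
                done[name] = 1
            }
        }
        # Finally, packages that are new in this release, in generated order.
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in done)) printf "%s", entry[order[i]]
        }
    ' "$new_file" "$old_eb"
}
```

For example, `reorder_like_old old.eb bioconductor_3_19_package_versions.txt > exts_list.txt`, where `old.eb` stands for the previous R-bundle-Bioconductor easyconfig.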