Closed aitap closed 9 months ago
Seventy-nine is a lot. That increases what we do for BioConductor by almost 20%, and this is manual work.
Is there a chance you can break it down by package group?
(And yes, nobody gives us anything precompiled so BioConductor is always from source.)
Another thing you could do is to look at the BioConductor score:
suppressMessages({
    library(data.table)
})
## get most used BioC packages from https://bioconductor.org/packages/stats/bioc/bioc_pkg_scores.tab
S <- fread("https://bioconductor.org/packages/stats/bioc/bioc_pkg_scores.tab", showProgress=FALSE)
setnames(S, c("package", "score"))
S[, lcpkg := tolower(package)]
S <- S[order(-score), .SD[1,], by="lcpkg"]
Maybe we can add the most important of these packages (and its dependency tail) first, then the next, and so on.
I definitely wouldn't want to increase your manual load by 20%. These are not the most popular packages, but the highest-scoring ones sit above the 90th percentile by score:
r2u <- ... # list of 'all', 'amd64' packages in the repository for Jammy
pk <- c(
"derfinderHelper", "EBarrays", "RTCGA", "sesameData", "GenomicScores",
"derfinder", "splots", "NanoStringNCTools", "org.Rn.eg.db", "biocthis",
"geNetClassifier", "gDRtestData", "RTCGA.miRNASeq", "CytoML",
"BioNet", "rols", "pRolocdata", "sesame", "RTCGA.mutations",
"JASPAR2020", "LOLA", "msdata", "CAMERA", "RMassBankData", "Homo.sapiens",
"PFAM.db", "ALLMLL", "gDRutils", "RTCGA.CNV", "JASPAR2018", "drawProteins",
"hgu133plus2.db", "RBioFormats", "grasp2db", "phastCons100way.UCSC.hg38",
"RTCGA.rnaseq", "simpIntLists", "rpx", "microRNA", "gDRstyle",
"MotifDb", "wiggleplotr", "hgu95av2.db", "JASPAR2014", "KOdata",
"faahKO", "flowWorkspaceData", "DAPARdata", "RTCGA.RPPA", "recount",
"RTCGA.methylation", "MafH5.gnomAD.v3.1.2.GRCh38", "pRoloc",
"humanStemCell", "cellHTS2", "org.Bt.eg.db", "SPIA", "RTCGA.mRNA",
"RTCGA.clinical", "human.db0", "rae230aprobe", "org.Sc.sgd.db",
"GeomxTools", "lydata", "ChemmineOB", "MafDb.1Kgenomes.phase1.hs37d5",
"rsbml", "biodb", "ReportingTools", "gwascat", "rae230a.db",
"humanCHRLOC", "hgu133a.db", "HubPub", "pasilla", "DAPAR", "SNPlocs.Hsapiens.dbSNP144.GRCh37",
"JASPAR2016", "CCl4"
)
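# Score the candidate packages within each BioConductor repository
# (software, annotation, experiment data) and count their dependencies.
S <- lapply(list(
    fread("https://bioconductor.org/packages/stats/bioc/bioc_pkg_scores.tab", showProgress=FALSE),
    fread("https://bioconductor.org/packages/stats/data-annotation/annotation_pkg_scores.tab", showProgress=FALSE),
    fread("https://bioconductor.org/packages/stats/data-experiment/experiment_pkg_scores.tab", showProgress=FALSE)
), function(S) {
    setnames(S, c("package", "score"))
    # keep the highest-scoring entry per case-insensitive package name
    S[, lcpkg := tolower(package)]
    S <- S[order(-score), .SD[1,], by="lcpkg"]
    pkS <- S[package %in% pk]
    # fraction of packages in this repository scoring at or below each candidate
    pkS$CDF <- ecdf(S$score)(pkS$score)
    # recursive strong dependencies (Depends, Imports, LinkingTo) of each candidate
    deps <- tools::package_dependencies(
        pkS$package, which = 'strong', recursive = TRUE
    )
    pkS$strongdeps <- lengths(deps)
    # dependencies that are neither packaged in r2u nor part of base R
    pkS$not.in.r2u <- lengths(lapply(deps, function (d)
        setdiff(
            tolower(d),
            c(gsub('^r-(cran|bioc)-', '', r2u), tolower(tools:::.get_standard_package_names()$base))
        )
    ))
    pkS
})
lapply(S, head)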
[[1]]
            lcpkg        package score       CDF strongdeps not.in.r2u
1: reportingtools ReportingTools   694 0.9125602        188          1
2:        motifdb        MotifDb   576 0.9058911         50          0
3:         sesame         sesame   560 0.9047795        143          1
4:           spia           SPIA   557 0.9044090         14          0
5:          rtcga          RTCGA   528 0.9021860        138          0
6:        gwascat        gwascat   520 0.9010745        137          0
[[2]]
            lcpkg        package score       CDF strongdeps not.in.r2u
1: hgu133plus2.db hgu133plus2.db  1537 0.9954666         47          0
2:   org.rn.eg.db   org.Rn.eg.db  1392 0.9950888         46          0
3:   homo.sapiens   Homo.sapiens   980 0.9913109        105          0
4:     jaspar2020     JASPAR2020   923 0.9909331          1          0
5:        pfam.db        PFAM.db   756 0.9901776         46          0
6:     hgu133a.db     hgu133a.db   737 0.9890442         47          0
[[3]]
            lcpkg        package score       CDF strongdeps not.in.r2u
1:        pasilla        pasilla   879 0.9857143        117          0
2:     sesamedata     sesameData   647 0.9816327        104          0
3:         msdata         msdata   482 0.9653061          0          0
4: rtcga.clinical RTCGA.clinical   302 0.9489796        139          1
5:         faahko         faahKO   249 0.9387755         98          0
6:     prolocdata     pRolocdata   210 0.9285714         76          0
So if I had to pick one package, it could be the 3-megabyte experiment package pasilla, or MotifDb, which doesn't have extra dependencies.
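As a rough illustration of that selection, here is a minimal base-R sketch that keeps only candidates whose dependencies are already covered and ranks them by score. The data frame is a toy copy of four rows from the first table of the output above; the names cand and ready are made up for this example.

```r
# Toy rows copied from the first table of the output above
cand <- data.frame(
  package    = c("ReportingTools", "MotifDb", "sesame", "SPIA"),
  score      = c(694, 576, 560, 557),
  strongdeps = c(188, 50, 143, 14),
  not.in.r2u = c(1, 0, 1, 0)
)

# Keep candidates with no dependency missing from r2u, highest score first
ready <- cand[cand$not.in.r2u == 0, ]
ready <- ready[order(-ready$score), ]
ready$package
```

With the real per-repository tables, the same filter could be applied to each element of S in turn.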
Doing it throttled may work well when I have an idle (evening) moment to take a look. Both noted.
I just added pasilla plus (working down the "score" list) DSS, DMRcate, MungeSumstats.
I may get to MotifDb next time. BioConductor count now at 396, so 400 looms....
I added a handful more, following the BioConductor score from the top down until it got to MotifDb. As it involved two more dependencies, the count is now at 401 BioC packages.
Just added five more from BioConductor, and will try to chip away at this slowly.
That said, I will also close this now as it wasn't so much an 'issue' as a bit of misunderstanding about scope, process and how the sauce is made here. Please feel free to reopen if you think there is something here I missed.
Thank you very much for all your packaging work! These binaries will save everyone a lot of time. I think I now owe you a favour :) The revdepcheck virtual machine upgrades without a hitch.
(Revisiting this thread, on the other hand, is somewhat embarrassing.)
Glad to hear it is of help, and yes, these things tend to just magically work once you have the magic sauce sorted out.
I am still going down the 'karma list' so now we are at 'top 260' minus the one that currently does not build on Linux for BioC 3.18 (as I learned on the BioC Slack) and BiocInstaller which I skipped on purpose (fearing it may cross wires).
Hi!
While installing a large number of packages from both CRAN and Bioconductor on a container running the docker.io/rocker/r2u image, I got some of the Bioconductor dependencies from source. I can start a fresh copy of the container and run:
It will say "Install system packages as root..." twice, install a number of *.deb packages on the second run and then install the remaining packages from source (some of these are large and may benefit from an increased download timeout). I plucked the source package names from the console by grepping for ^trying URL. I really appreciate installing only these 79 packages from source instead of all the 1316 dependencies, and I would be grateful if you package these ones too. Please let me know if I can help!
Speaking of system dependencies,
CytoML wants libxml2 (and will fail to compile without libxml/tree.h)
ChemmineOB wants libopenbabel and Eigen (and will fail to compile without openbabel/obutil.h and Eigen/Core)
rsbml wants libsbml (and will fail to configure without a corresponding .pc file)
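For anyone wanting to repeat the "grepping for ^trying URL" step, a minimal sketch in R might look like the following. The console lines are hypothetical stand-ins (the version numbers and byte counts are made up), and the variable names are only for illustration; the sketch assumes the log contains the usual "trying URL '...'" lines that R prints when downloading a source tarball.

```r
# Hypothetical console lines; versions and sizes are made up for illustration
log_lines <- c(
  "Install system packages as root...",
  "trying URL 'https://bioconductor.org/packages/3.18/bioc/src/contrib/CytoML_2.14.0.tar.gz'",
  "Content type 'application/x-gzip' length 1234567 bytes (1.2 MB)",
  "trying URL 'https://bioconductor.org/packages/3.18/data/experiment/src/contrib/pasilla_1.30.0.tar.gz'"
)

# Keep only the download lines, drop the prefix and the closing quote,
# then strip "_<version>.tar.gz" to leave the bare package name
url_lines <- grep("^trying URL", log_lines, value = TRUE)
tarballs  <- basename(gsub("^trying URL '|'$", "", url_lines))
src_pkgs  <- unique(sub("_.*$", "", tarballs))
src_pkgs
```

With a saved console log, log_lines could instead come from readLines() on that file (the file name would be whatever the log was saved as).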