Closed lgeistlinger closed 6 years ago
Hi @lgeistlinger
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: GSEABenchmarkeR
Type: Package
Title: Reproducible GSEA Benchmarking
Version: 0.99.0
Author:
Ludwig Geistlinger [aut, cre],
Gergely Csaba [aut],
Mara Santarelli [ctb],
Lucas Schiffer [ctb],
Marcel Ramos [ctb],
Levi Waldron [ctb],
Ralf Zimmer [aut]
Maintainer: Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>
Description:
The GSEABenchmarkeR package implements an extendable framework for
reproducible evaluation of set- and network-based methods for enrichment
analysis of gene expression data. This includes support for the efficient
execution of these methods on comprehensive real data compendia (microarray
and RNA-seq) using parallel computation on standard workstations and
institutional computer grids. Methods can then be assessed with respect to
runtime, statistical significance, and relevance of the results for the
phenotypes investigated.
URL: https://github.com/waldronlab/GSEABenchmarkeR
BugReports: https://github.com/waldronlab/GSEABenchmarkeR/issues
License: Artistic-2.0
Encoding: UTF-8
LazyData: true
Depends:
R(>= 3.5.0),
Biobase,
SummarizedExperiment
Imports:
AnnotationDbi,
AnnotationHub,
BiocParallel,
edgeR,
EnrichmentBrowser,
ExperimentHub,
GEOquery,
grDevices,
graphics,
KEGGandMetacoreDzPathwaysGEO,
KEGGdzPathwaysGEO,
methods,
rappdirs,
S4Vectors,
stats,
utils
Suggests:
BiocStyle,
BiocCheck,
knitr,
rmarkdown,
testthat
biocViews:
Microarray,
RNASeq,
GeneExpression,
DifferentialExpression,
Pathways,
GraphAndNetwork,
Network,
GeneSetEnrichment,
NetworkEnrichment,
Visualization,
ReportWriting
VignetteBuilder: knitr
RoxygenNote: 6.0.1
Consider adding SSH keys to your GitHub account. SSH keys will are used to control access to accepted Bioconductor packages. See these instructions to add SSH keys to your GitHub account.
Your package has been approved for building. Your package is now submitted to our queue.
IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
f0f9899 reduce maPreprocApply Example
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
5b6bb46 adapt user_data_dir for windows
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
55d0438 adapting BiocParallelParam example for windows
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
I was surprised (not so pleasantly) that the API is based on a vector of IDs that key into a hidden environment within the package. The deApply()
function, confusingly named because it does not apply a user-provided function like all other apply functions, appears to return the results by side effect. Is there a reason to not simply pass values around? Could a MultiAssayExperiment represent a compendium?
@lawremi Thanks for the feedback!
I think of a compendium as a collection of datasets of the same assay type (e.g. RNA-seq), where samples are typically distinct between datasets.
So I guess a compendium is suitably represented by a list of datasets each being a SummarizedExperiment
, or one huge SummarizedExperiment
as e.g. done for the GSE62944
(TCGA RNA-seq) compendium.
I decided for the first option when creating the package in the first place, but I'm currently wondering whether the second option might be preferred?
Not sure whether using a hidden environment has been a good decision (apparently not), but I thereby wanted to ensure reproducibility of the benchmark routines as this
i. prevents users from intentionally or accidentally modifying the underlying compendium
ii. enables me to provide a defined and exclusive set of manipulation operations on the compendium
iii. does not pollute the user's workspace (.GlobalEnv
), into which I would have otherwise loaded the compendium
Why is it a big deal if they do modify the compendium? Does it break the package? The R/Bioc philosophy seems to be to give the user freedom, even if it means they might shoot themselves in the foot every once in a while. If you load the compendium as a single object, the user only needs a single reference to it from the global environment.
Agreed, my Apply
-functions (maPreprocApply
, deApply
, eaApply
, and evalApply
) might be confused with the canonical apply
functions.
However, the name came closest to what these functions do: they apply (via BiocParallel::bplapply
) a function of the indicated name to the compendium of datasets, e.g. a differential expression analysis in the case of deApply
.
Open for other naming suggestions.
Does the package even need to compute differential expression? Or could it take results as input? The focus is obviously on comparing enrichment results.
Generally starting from DE results was not an option as many enrichment methods use sample permutation (such as GSEA
, SAFE
, SAMGS
, ...). This means they recompute dif-
ferential expression in each permutation of the sample labels for gene set significance estimation and thus require the complete expression matrix.
Received a valid push; starting a build. Commits are:
1ff3344 resolve hidden environment
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "skipped, TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Received a valid push; starting a build. Commits are:
58458b2 adapt for windows settings
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "TIMEOUT, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
@lawremi The compendia are now represented as a list of SummarizedExperiment
s and are not locked anymore in a hidden environment, but are rather passed around by the functions of the package. Thanks a lot for the input!
@dvantwisk any remarks from your side?
Also: what's this postprocessing
error about? the building report does not provide any information here what the problem is ...
I'm also going to reduce time needed for the examples
further to not run into the timeout on the Mac building machine ... although I cannot reproduce this long checking time on my MacBook Air here ...
The post-processing error is occurring on our end, the issue should be corrected now and should be gone if you build with another version bump.
All I can recommend with regards to the timeout is to continue to try to reduce the run-time of check. There's nothing specific that I can give you on this, and unfortunately the different hardware that you and merida2 (our mac builder) is using to run the check will result in differing amounts of time needed to run it. The amount of time needed to run it must be less than ten minutes to pass without a timeout.
Received a valid push; starting a build. Commits are:
300dfc1 reduce examples
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
@dvantwisk:
FYI: the error thrown on MAC OS X by serialize
seems to be a general problem in R
.
I have contacted @mtmorgan on that and I am in good hope that this can be fixed soon in order to move on with the package.
I am currently reviewing the package. I'll try to have something back for you soon.
Your package is written very well. I don't have any knit-picks about the code or structure of the package, I only have a few suggestions. These are all things to consider, so message me back on whether you plan on doing any of these change or not.
GENERAL:
R/geo2kegg.R
.loadGEO2KEGG()
you check if some data is
already present on disk and downloading it if not. BiocFileCache
is a package
developed by the Bioconductor Core group that was made to perform this task.
Please consider using it. https://bioconductor.org/packages/release/bioc/html/BiocFileCache.htmlR/gse62944.R
BiocFileCache
package for this function.R/io.R
BiocFileCache
throughout this file.tests
testthat
package.Hi @dvantwisk
Thanks for the feedback!
BiocFileCache
is definitively on the TODO.
I'll also add some unit tests, but I had this removed because I was running into timeouts as it would exceed the R CMD check time limit.
Let me see what I can do about it.
Just messaging you back to check in on your progress. Are you still working on the changes to BiocFileCache
or do you need any help? Again, this change is only a suggestion and the package would be fine should you choose not to do it.
Hi @dvantwisk
Thanks for checking!
Unfortunately a couple of other things came in the way, but I'm almost ready with supporting BiocFileCache
in the package and I am expecting to be able to push the new version within this week.
Received a valid push; starting a build. Commits are:
8285ac0 supporting BiocFileCache
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
Hi,
I just pushed a new version of the package addressing several issues.
@dvantwisk I found caching particularly useful in the case of extensive preprocessing of the benchmarking compendia. Support to flexibly save and restore an already processed compendium based on BiocFileCache
is now integrated in the loadEData
function and documented in Section 6.1 of the vignette.
@lawremi To avoid confusion, all *Apply
-functions of the package have been renamed. In particular:
deApply -> runDE
eaApply -> runEA
evalApply -> evalRelevance
maPreprocApply -> maPreproc
They now extend naturally from a single SummarizedExperiment
to a list
of SummarizedExperiment
s.
@mtmorgan Not sure whether there are any new insights concerning the serialize
issue under Mac (http://bioconductor.org/spb_reports/GSEABenchmarkeR_buildreport_20180222155936.html#merida2_check_anchor), but I'm sure that there are currently quite some other urgent things due to the upcoming release.
I am thus currently approaching the issue like this (if not suggested otherwise):
#
# iterate over expression datasets
#
.iter <- function(exp.list, f, ..., parallel=NULL)
{
is.windows <- is.null(parallel) && .Platform$OS.type == "windows"
if(is.windows) parallel <- BiocParallel::SerialParam()
if(is.null(parallel)) parallel <- BiocParallel::bpparam()
fail <- TRUE
trials <- 0
while(fail && trials < 5)
{
res <- try(
BiocParallel::bplapply(exp.list, f, ..., BPPARAM=parallel),
silent=TRUE)
fail <- is(res, "try-error")
trials <- trials + 1
}
if(fail) stop(res)
return(res)
}
I found the serialize
error to be thrown roughly in one out of three trials, so I typically make it through the check like this on my Mac and apparently also on the build system. Also, I never encounter this error under Linux.
The changes related to BiocFileCache
look great. I'm not sure if anyone you've @'d has anything else to add. If you are finished making changes, just message me and I'll move to accept the package.
Hi @dvantwisk I think we can go ahead + thanks a lot for taking the time to review the package!
Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be posed to this issue in the next several days.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/lgeistlinger.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"GSEABenchmarkeR\")
. The package 'landing page' will be created at
https://bioconductor.org/packages/GSEABenchmarkeR
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[ x ] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[ x ] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[ x ] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[ x ] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[ x ] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.