Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

GENIE3 #321

Closed s-aibar closed 7 years ago

s-aibar commented 7 years ago

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 7 years ago

Hi @s-aibar

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: GENIE3
Type: Package
Title: GEne Network Inference with Ensemble of trees
Version: 0.99.0
Date: 2017-01-16
Author: Van Anh Huynh-Thu, Sara Aibar, Pierre Geurts
Maintainer: Van Anh Huynh-Thu <vahuynh@ulg.ac.be>
Description: This package implements the GENIE3 algorithm for inferring gene
    regulatory networks from expression data.
License: GPL (>= 2)
LazyData: TRUE
Imports: stats, reshape2, testthat
NeedsCompilation: yes
RoxygenNote: 5.0.1
Suggests: knitr, rmarkdown, foreach, doRNG, doParallel,
VignetteBuilder: knitr
biocViews: NetworkInference, SystemsBiology, DecisionTree, Regression,
        Network, GraphAndNetwork, GeneExpression
bioc-issue-bot commented 7 years ago

Your package has been approved for building. Your package is now submitted to our queue.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170314162324.html

mtmorgan commented 7 years ago

@vobencha will review this package, but I note it has no Bioconductor dependencies. A key reason to contribute to Bioconductor is to facilitate interaction with other packages in the Bioconductor ecosystem through shared data structures. It would be natural to use SummarizedExperiment or Biobase::ExpressionSet for representing expression input data. One might also anticipate GSEABase for representing gene sets and / or the graph package for graph representation (I did not look at your package in detail so do not know how relevant these specific representations are.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

f90d81f Update DESCRIPTION

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170315181353.html

vobencha commented 7 years ago

Before I do a review, please incorporate comments from https://github.com/Bioconductor/Contributions/issues/321#issuecomment-286548000 or explain why you think they don't apply to GENIE3.

Valerie

vobencha commented 7 years ago

Any updates? Should we close this issue? Valerie

s-aibar commented 7 years ago

Hello,

Sorry for not replying earlier, we are creating this package as part of a workflow. Some of the other packages submitted at the same time also require some changes to be included in Bioconductor, so I was considering whether some further changes need to be done before you review this one.

In principle, I think the previous comment does not apply to this package. GENIE3 package does not have any Bioconductor dependencies because its only input is an expression matrix (i.e. exprs() from an ExpressionSet, it doesnt need use the rest), and a list of gene identifiers (input as a character vector, matching the row names of the expression matrix). If you consider that it should accept a specific class as input, I can certainly modify the package, but a priory I think it would only make it more complex (e.g. there are multiple "containers" for expression matrices, SummarizedExperiment, ExpressionSet, SCESet, etc... are them all compatible?).

Regarding the networks: The package is meant to reconstruct gene-regulatory networks, but the network is returned as an incidence matrix or an edge list (data.frame containing the columns 'TF' and 'target'). We kept these formats for compatibility with previous versions implemented in Python and Matlab, and because it is easy to export/convert into any other format (e.g. export as text file, or convert into R graph objects with igraph::graph_from_data_frame()). We can consider exporting it to other formats if you feel it is appropriate, but at the moment they are not used within the package.

Thank you for taking the package into consideration, Sara Aibar

vobencha commented 7 years ago

The Bioconductor classes mentioned (SummarizeExperiment, ExpressionSet, etc.) all have a slot that contains a matrix of data; the classes are not identical but the matrix object is. The idea is that you'd write a generic with methods for the different classes - each method would extract the matrix from the Bioconductor class using the appropriate accessor function. Once the matrix is extracted the rest of the code would be the same for all methods.

You mentioned you're working on other aspects of a pipeline - maybe the functionality of GENIE3 should be folded into one of those - I don't think it requires a stand alone package. If you agree we can close this issue.

Valerie

s-aibar commented 7 years ago

Ok, we can easily accept those objects as input to GENIE3, and extract the matrix already within the function (although personally I find it safer if it is the user who decides which slot to use...).

Regarding the independence of GENIE3: we are using GENIE3 as part of an analysis pipeline, including AUCell (which was also submitted as a new package), and motif and GO enrichment. However, the other packages don't really have much to do with co-expression networks. For example, AUCell's aim is to evaluate the activity of gene signatures on single-cell RNA-seq... If needed, GENIE3 could be merged with some of these packages, but conceptually they are independent tools, they do different type of analyses and they can be applied on different types of data. In this way, it would be better to keep them as independent packages (for maintenance, reduce dependence across packages, easier to understand for the user...). Furthermore, GENIE3 is already known as an independent tool (winner of the DREAM4 and 5 challenges, also available in python and matlab at http://www.montefiore.ulg.ac.be/~huynh-thu/GENIE3.html), and it will be easier to find if the new R implementation can be kept as a package with its own name. Its current citation stats also support the idea that it is useful as an independent tool by itself. Whether it is worth to include it in Bioconductor or not, of course it is up to you to decide, but since GENIE3 aim is to infer gene regulatory networks, I certainly think it fits into the description of "tools for the analysis and comprehension of high-throughput genomic data", and therefore it fits better in Bioconductor than other -more generic- repositories...

Regards, Sara Aibar

vobencha commented 7 years ago

Hi Sara,

It's fine to keep GENIE3 as it's own package. I suggested breaking it up because I thought you were leaning that way - I must have misunderstood.

There isn't much point in having a package in Bioconductor that doesn't use the infrastructure (could just submit to CRAN) so you do need to support the core classes mentioned in these comments: https://github.com/Bioconductor/Contributions/issues/321#issuecomment-286548000 https://github.com/Bioconductor/Contributions/issues/321#issuecomment-297994406

You mention that you 'find it safer to let the user decided which slot to use'. The point of S4 generics and method dispatch is to provide a level of convenience to the user. For example, the user has an object that holds discrete data and wants to invoke a method for counting - instead of having to figure out where the data are, get the name of the extractor, extract the data, feed it to the method etc. they simply invoke the method on the class. There aren't multiple slots in an object (at least not the objects we suggested) that have the same type of data; the idea that the user would 'choose the slot' doesn't make sense because only one meets the criteria.

In https://github.com/Bioconductor/Contributions/issues/321#issuecomment-294945030 you said

"... GENIE3 package does not have any Bioconductor dependencies because its only input is an expression matrix (i.e. exprs() from an ExpressionSet, it doesnt need use the rest), and a list of gene identifiers (input as a character vector, matching the row names of the expression matrix). 
..."

The ExpressionSet object (as well as the others) have slots that contain the gene identifiers so in fact you could use 2 slots from the object. Your method could extract the matrix of data and the identifiers and pass them on to the main body of your function which would be reused by multiple methods. Does that make sense? I can explain more (with code example) if necessary.

Valerie

vobencha commented 7 years ago

Are you planning to submit a new version? Valerie

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

e8c3fb6 VERSION 0.99.2 ---------------------- INTERFACE CH... 6e188e7 VERSION 0.99.2 ---------------------- INTERFACE CH...

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170602095811.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

d5a5a20 0.99.3

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "skipped, ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170602104624.html

s-aibar commented 7 years ago

Dear Valerie,

I have just submitted a new version, which accepts an ExpressionSet or SummarizedExperiment as input.

However, I am not sure if the way I have used to extract the expression matrix is the best, as it does not make use of class inheritance, etc... Let me know if there is any standard/better approach: `

Biobase::ExpressionSet

if("ExpressionSet" %in% class(exprMatrix)) 
{
    exprMatrix <- Biobase::exprs(exprMatrix)
}

# SummarizedExperiment::RangedSummarizedExperiment
# Use SummarizedExperiment rather than the older ExpressionSet, especially for sequence data.
if(grepl("SummarizedExperiment", class(exprMatrix))) 
{
    usedSlot <- names(SummarizedExperiment::assays(exprMatrix))
    otherAvailable <- names(SummarizedExperiment::assays(exprMatrix))[-1]

    exprMatrix <- SummarizedExperiment::assay(exprMatrix)

    msg <- paste0("SummarizedExperiment slot extracted as expression matrix: \t", usedSlot) 
    if(length(otherAvailable)>0) msg <- paste(msg, "\nOther slots available:", paste(otherAvailable, collapse=", "))
    warning(msg)
}

`

Also, in the new version, the Windows building is giving an error (related to the C function). Do you know what can be wrong?

make: C:/Users/BIOCBU~1/BBS-3~1.6-B/R/etc/x64/Makeconf: Permission denied gcc -c -o GENIE3.o GENIE3.c GENIE3.c:19:15: fatal error: R.h: No such file or directory

Thank you, Sara

vobencha commented 7 years ago

Thanks for the update. I'll have a look at this tomorrow. Valerie

vobencha commented 7 years ago

Hi, The windows error was a problem on our end. It should be cleaned up now.

Examples of writing S4 methods are in IRanges, GenomicRanges, GenomicFeatures, Rsamtools, Biobase and other core packages. Here is an example of how you would convert GENE3() to S4:

setGeneric("GENE3", signature="exprMatrix",
    function(exprMatrix, treeMethod="RF", K="sqrt", nTrees=1000, regulators=NULL, targets=NULL, nCores=1, verbose=FALSE, seed=NULL)
{
    standardGeneric("GENE3")
}

setMethod("GENE3", "SummarizedExperiment",
    function(exprMatrix, treeMethod="RF", K="sqrt", nTrees=1000, regulators=NULL, targets=NULL, nCores=1, verbose=FALSE, seed=NULL)
{
    .GENE3 (assays(exprMatrix), treeMethod, K, nTrees, regulators, targets, nCores, verbose, seed)
}

setMethod("GENE3", "ExpressionSet",
    function(exprMatrix, treeMethod="RF", K="sqrt", nTrees=1000, regulators=NULL, targets=NULL, nCores=1, verbose=FALSE, seed=NULL)
{
    .GENE3 (exprs(exprMatrix), treeMethod, K, nTrees, regulators, targets, nCores, verbose, seed)
}

This internal helper would do everything that GENE3() currently does starting ~ line 79 with the arg checking:

.GENE3 <- function(exprMatrix, treeMethod="RF", K="sqrt", nTrees=1000, regulators=NULL, targets=NULL, nCores=1, verbose=FALSE, seed=NULL)
{
    # check input arguments
    if (!is.matrix(exprMatrix) && !is.array(exprMatrix)) {
        stop("Parameter exprMatrix must be a two-dimensional matrix where 
                        each row corresponds to a gene and each column corresponds 
                        to a condition/sample.")
    }
    ...
}

Valerie

vobencha commented 7 years ago

Hi Sara, We need to finish up this review. Do you have questions about converting GENE3() to a generic with methods? Valerie

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

724deb2 Merge branch 'master' of https://github.com/aertsl... 5eca604 Merge branch 'master' of https://github.com/aertsl...

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170717151546.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

7c26ad9 v0.1.5

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170717153524.html

s-aibar commented 7 years ago

Hello Valerie,

GENIE3() is now updated as a Generic. Sorry, I didn't understand what you meant about the generics in first place, but with the examples it was very easy. Thank you!

vobencha commented 7 years ago

Great. Please bump the version so we get a new build. Valerie

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

0fbd36d Update DESCRIPTION Version bump

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170722104738.html

vobencha commented 7 years ago

Hi, Just a few last things to clean up in the SPB output:

* checking dependencies in R code ... NOTE
'library' or 'require' calls in package code:
  doParallel doRNG foreach
  Please use :: or requireNamespace() instead.
  See section 'Suggested packages' in the 'Writing R Extensions' manual.
Namespace in Imports field not imported from: testthat
  All declared Imports should be used.
package 'methods' is used but not declared

* checking R code for possible problems ... NOTE
.GENIE3: no visible global function definition for %dorng%
Undefined global functions or variables:
  %dorng%

* Checking DESCRIPTION/NAMESPACE consistency...
    * WARNING: Import testthat in NAMESPACE as well as DESCRIPTION.

Valerie

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

ea0cacc v0.99.7

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/GENIE3_buildreport_20170725031707.html

vobencha commented 7 years ago

Thanks. Looks good.

bioc-issue-bot commented 7 years ago

Your package has been accepted. It will be added to the Bioconductor svn repository and nightly builds. Additional information will be sent to the maintainer email address in the next several days.

Thank you for contributing to Bioconductor!