Bioconductor / Contributions

Contribute Packages to Bioconductor
134 stars 33 forks source link

slalom #520

Closed davismcc closed 7 years ago

davismcc commented 7 years ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

[NB: I am submitting this package under my account, though it is currently hosted through a GitHub account belonging to the Stegle Group and EMBL-EBI]

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.

bioc-issue-bot commented 7 years ago

Hi @davismcc

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: slalom
Type: Package
Title: Factorial Latent Variable Modeling of Single-Cell RNA-Seq Data
Version: 0.99.0
Date: 2017-10-06
Author: Florian Buettner
Maintainer: Davis McCarthy <davis@ebi.ac.uk>
Authors@R: c(person("Florian", "Buettner", role=c("aut", "cre"),
        email="buettner@ebi.ac.uk"), person("Naruemon",
        "Pratanwanich", role=c("aut"), email="np394@ebi.ac.uk"), 
        person("Davis", "McCarthy", role=c("aut"), email="davis@ebi.ac.uk"),  
        person("John", "Marioni", role = c("aut"), email="marioni@ebi.ac.uk"), 
        person("Oliver", "Stegle", role = c("aut"), email="stegle@ebi.ac.uk"))
Description: slalom is a scalable modelling framework for single-cell RNA-seq 
        data that uses gene set annotations to dissect single-cell transcriptome
        heterogeneity, thereby allowing to identify biological drivers of 
        cell-to-cell variability and model confounding factors.
Depends: R (>= 3.4)
Imports: Rcpp (>= 0.12.8), RcppArmadillo, BH, GSEABase, methods, rsvd, 
  SingleCellExperiment, stats
Suggests: knitr, rhdf5, testthat
LinkingTo: Rcpp, RcppArmadillo, BH
License: GPL (>= 2)
VignetteBuilder: knitr
LazyData: true
Encoding: UTF-8
biocViews: SingleCell, RNASeq, Normalization,
        Visualization, DimensionReduction, Transcriptomics,
        GeneExpression, Sequencing, Software, Reactome
RoxygenNote: 6.0.1
bioc-issue-bot commented 7 years ago

Your package has been approved for building. Your package is now submitted to our queue.

IMPORTANT: Please read the instructions for setting up a push hook on your repository, or further changes to your repository will NOT trigger a new build.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/slalom_buildreport_20171006164543.html

Liubuntu commented 7 years ago

Hi Davis,

Thank you for your contribution to Bioconductor. It is a general a very nice package, with only some minor issues.

DESCRIPTION

NAMESPACE

Vignette

Generally looks good. Succinct and informative. Some suggestions:

R

Thanks, Qian

mtmorgan commented 7 years ago

In addition to Qian's comments, we noticed that the current code creates two class definitions, in the package name space and in the global environment. I believe one should use the class name to generate the class; the use of module$SlalomModel creates the second class definition.

/slalom master$ git diff
diff --git a/R/SlalomModel-methods.R b/R/SlalomModel-methods.R
index e7543f3..21991e8 100644
--- a/R/SlalomModel-methods.R
+++ b/R/SlalomModel-methods.R
@@ -99,9 +99,8 @@ newSlalomModel <- function(

 .newSlalom <- function(Y, pi_init, n_hidden, design = NULL) {
     n_known <- ifelse(is.null(design), 0, ncol(design))
-    slalom_module <- Rcpp::Module("SlalomModel", PACKAGE = "slalom")
     out <- new(
-        slalom_module$SlalomModel,
+        "Rcpp_SlalomModel",
         Y_init = Y,
         pi_init = pi_init,
         X_init = matrix(0, nrow = nrow(Y), ncol(pi_init)),

I'm not an Rcpp expert, though...

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

f345c82 Responding to Bioc feedback 20a2080 Bumping version number

davismcc commented 7 years ago

Thanks, Qian and Martin for the thoughtful feedback. I have just committed some changes that deal with most of these issues:

I'm planning to update the vignette with more details as you suggest (I'll add some plotting functions for results between now and Monday) and tidy up the R code too.

One final thing where your feedback would be appreciated: I've been vacillating on whether to have generic methods (init, train) to initialise and train the Rcpp_SlalomModel objects, or just have regular functions (e.g. initSlalom, trainSlalom) that require the object argument to be a Rcpp_SlalomModel. Do you have opinions about which approach is preferable? Generics need less typing from the user, but specific functions are, erm, more specific and avoid any possible NAMESPACE clashes. If you have opinions that will help me decide on mine!

Best, Davis

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/slalom_buildreport_20171012103001.html

mtmorgan commented 7 years ago

Personally I like the initSlalom() / trainSlalom() approach, because there are unlikely to be other 'init' and 'train' methods with the same contract as yours, and because simple names like init() tend to lead to conflicts more often than name-mangled.

This means that the type checking performed by the method needs to be made explicit in the body of the plain-old-function.

I know others on the core team would argue differently, so you can pick your poison and know that your are satisfying some people (or dissatisfying others, depending on how you view the cup).

Liubuntu commented 7 years ago

Hi Davis,

Thank for your quick response of the review. Here is the summary of your updates:

Best, Qian

davismcc commented 7 years ago

Thanks, both, especially your comments on the function naming, Martin. I have come to accept that it's impossible to please everyone. I was leaning towards the consistency (e.g. with newSlalomModel) and relative simplicity of initSlalom()/trainSlalom(), so I'll go with that.

I'll update ASAP over the next couple of days.

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

6992581 Adding .preTrain() method and updating tests da6b8d5 Minor bug fixes and adding to tests dfd6bcd Adding plotting functions.

davismcc commented 7 years ago

Hi both,

I've added some plotting functions, as well as a pre-training method for the model and fixed a handful of bugs. I realise today's the feature freeze, but if I can (be permitted to) I'll add a method tomorrow to write add results from slalom to a SingleCellExperiment object.

I'll also add more to the vignette and check over the docs - I assume I still have a few days to sort out those things?

Best, Davis

davismcc commented 7 years ago

One adjacent query that I couldn't figure out today: gene set names from REACTOME and the like can be very long, so I'd like to be able to show users in the vignette how to truncate these (e.g. to 30 characters). However, trying the following gave a large number of warnings:

    gmtfile <- system.file("extdata", "reactome_subset.gmt", package = "slalom")
    genesets <- GSEABase::getGmt(gmtfile)
    for (i in seq_along(genesets)) {
        GSEABase::setName(genesets[[i]]) <- gsub("REACTOME_", "",
                                                 GSEABase::setName(genesets[[i]]))
        GSEABase::setName(genesets[[i]]) <- strtrim(GSEABase::setName(genesets[[i]]), 30)
    }

I couldn't figure out from the GSEABase docs how to do this happily. Perhaps you know?

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

On one or more platforms, the build results were: "ERROR, WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/slalom_buildreport_20171016202011.html

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

6f67d21 Fixing warnings from BiocCheck

davismcc commented 7 years ago

Apologies, there were a couple of sloppy things that caused warnings from BiocCheck. I've fixed these and don't get any errors or warnings from BiocCheck now.

The bigger issue with the errors on the Linux and Windows builds is due to my tests (which I've developed on Mac OS X). I specifically test the model output for agreement with different starting conditions. It seems that the Rcpp code compiles differently on Linux and Window, so yields subtly different convergence behaviour. As a result, some of my strict tests fail on these platforms. Do you have any suggestions for handling this sort of thing? Obviously, I could weaken/reduce tests, but that doesn't seem ideal!

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/slalom_buildreport_20171017055913.html

davismcc commented 7 years ago

OK, things are looking happier now :)

Liubuntu commented 7 years ago

HI Davis,

Thank you for making changes to your functions, classes and tests. These will make your package more robust. However, seems that you wasn't able to add some more explanation about the outputs or plotting functions to your vignette. As I know, if your package could not get accepted by today, it will go to the next release.

About your question of GSEABase, I could reproduce your warnings but found this to work without warnings:

gmtfile <- system.file("extdata", "reactome_subset.gmt", package = "slalom") genesets <- GSEABase::getGmt(gmtfile) nchar(unlist(lapply(genesets@.Data, GSEABase::setName)))

gs1 <- GSEABase::GeneSetCollection( lapply( genesets, function(x){ GSEABase::setName(x) <- gsub("REACTOME_", "", GSEABase::setName(x)) x } ) ) nchar(unlist(lapply(gs1@.Data, GSEABase::setName)))

@mtmorgan may have some better idea about this question.

Best, Qian

davismcc commented 7 years ago

Hi Qian,

Thanks for the update. I didnt realise that the vignette issues were crucial to making it into the release. Nevertheless, I am shortly (I mean with the next 20 mins) about to commit updates to the vignette that include much more discussion of model output and plots.

I really hope that you'll be able to accept the package after my next commit. I've put a lot of effort into getting it into this release so it would be a real shame to miss it by such a small margin.

I did manage to solve the GeneSetCollection with lapply() in the meantime.

Expect the updated vignette very, very shortly.

Best, Davis

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

e0f8cb5 Adding function to add Slalom results to SingleCel... 1d469ae Completed vignette.

davismcc commented 7 years ago

OK, it's all there! I sincerely hope you can accept the package into this release!

D.

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/slalom_buildreport_20171017145456.html

davismcc commented 7 years ago

Alright, that's all looking good! Do you need anything more from me for acceptance of the package? (Starting to get late here in UK).

Best, Davis

Liubuntu commented 7 years ago

Hi Davis,

I just confirmed with the bioconductor core team and your package should still be accepted and make the release (if @mtmorgan goes over it, at the soonest tomorrow, and thought it could be accepted). Sorry for the wrong information I gave in last message. The current release of Bioconductor 3.5 is frozen today. So you still have chance to go to the current devel 3.6 and be in the new release after oct. 31.

In the meantime, could you also give a look at this NOTE message? this could be easily fixed by adding a zzz.R, utils::globalVariables(n_gain, term, ...). It's not mandatory though.

* checking R code for possible problems ... [14s] NOTE
plotRelevance: no visible binding for global variable 'n_gain'
plotRelevance: no visible binding for global variable 'term'
plotRelevance: no visible binding for global variable 'n_loss'

Best, Qian

Liubuntu commented 7 years ago

The vignette is more comprehensive and prettier with figures and explanations about each step. Just one thing:

Once this done, it will be all good to me. @mtmorgan

Best, Qian

bioc-issue-bot commented 7 years ago

Received a valid push; starting a build. Commits are:

4f0113a Added scater to Suggests 0d414e4 Adding sessionInfo() to vignette and fixing NOTE a...

davismcc commented 7 years ago

Thanks, Qian. I've fixed the NOTE and added sessionInfo() to the vignette in the latest commit. Fingers crossed for Martin's review and acceptance.

Best, Davis

bioc-issue-bot commented 7 years ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on Linux, Mac, and Windows.

Congratulations! The package built without errors or warnings on all platforms.

Please see the following build report for more details:

http://bioconductor.org/spb_reports/slalom_buildreport_20171017175226.html

bioc-issue-bot commented 7 years ago

Your package has been accepted. It will be added to the Bioconductor Git repository and nightly builds. Additional information will be sent to the maintainer email address in the next several days.

Thank you for contributing to Bioconductor!

mtmorgan commented 7 years ago

The master branch of your GitHub repository has been added to Bioconductor's git repository.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/davismcc.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using biocLite(\"slalom\"). The package 'landing page' will be created at

https://bioconductor.org/packages/slalom

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.