Closed bv2 closed 4 years ago
Hi @bv2
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: MOFA2
Type: Package
Title: Multi-Omics Factor Analysis v2
Version: 0.99.0
Maintainer: Britta Velten <britta.velten@gmail.com>
Author: Ricard Argelaguet, Damien Arnol, Danila Bredikhin, Britta Velten
Date: 2018-09-03
License: GPL (>= 2)
Description: The MOFA2 package contains a collection of tools for running and analysing MOFA models.
Encoding: UTF-8
Depends: R (>= 4.0)
Imports: rhdf5, dplyr, tidyr, reshape2, pheatmap, ggplot2, methods, GGally, RColorBrewer, cowplot, ggrepel, reticulate, HDF5Array, grDevices, stats, magrittr, forcats, utils, corrplot, ggrastr, DelayedArray, Rtsne, uwot
Suggests: knitr, testthat, Seurat, ggpubr, doParallel, foreach, psych, MultiAssayExperiment
biocViews: DimensionReduction, Bayesian, Visualization
VignetteBuilder: knitr
LazyData: false
NeedsCompilation: yes
RoxygenNote: 7.1.1
SystemRequirements: Python (>=3), numpy, pandas, h5py, scipy, argparse, sklearn, mofapy2
A reviewer has been assigned to your package. Learn what to expect during the review process.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. It is required to push a version bump to git.bioconductor.org to trigger a new build.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on Linux, Mac, and Windows.
On one or more platforms, the build results were: "ERROR, skipped". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details. This link will be active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/MOFA2
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @LiNk-NY, thank you for taking care of our package! Currently, the package fails during the build as it relies on the python package mofapy2 as specified in the DESCRIPTION. mofapy2 is available on https://pypi.org/project/mofapy2/. The package builds without errors on systems, where mofapy2 is installed but here it is missing this dependency. Would it be possible to add this dependency to the builder? Thanks!
@bv2 - I'm not part of the bioc team so take this with a grain of salt. However you may have better luck using basilisk to manage your python environment using conda.
Hi Britta, @bv2 Thank you for your submission to Bioconductor. Please see the review below. If you have any questions, feel free to post them here.
Best, Marcel
MOFA
go under deprecation procedures?MOFA
version 2.0.0
instead?basilisk
for possibly smoother integration to python. SummarizedExperiment
, SingleCellExperiment
, etc.). The main representation
could benefit from having a more structured format including accessor functions.Authors@R
field for more detailed authorship attributions as
well as inclusion of ORCID identifiers.BugReports
field pointing to the GitHub issues location.sample_names
vs colnames
,
features_names
vs rownames
, and factor_names
vs levels
) but this is
optional.plot
generic if there are package
classes involved in plot_*
.%>%
if you intend your users to use the
pipe as well.@
accessor and use a more formalized interface with
such as samples_metadata(model)
or even metadata(model)
if you
inherit from the Annotated
class. (instead of model@samples_metadata
;
applies to model@cache
as well).||
to ensure that conditions on either side return a single logical
value.object@status
and nfactors
R/AllGenerics.R
and others).lapply(r2_mk, function(x) x*100)
could be unlist
> *100
> relist
operationapply
family loops
(R/calculate_variance_explained.R
)models
were a formalized S4 class with validity checks, you wouldn't
have to use "sanity checks" in compare_elbo
and select_model
class(x) == "character"
. As seen with the matrix
class,
it is better to use the is.character
function (R/correlate_covariates.R
).MultiAssayExperiment
namespace is available (using requireNamespace
)
in .create_mofa_from_mae
function otherwise return an error to the
user and ask them to install the package.names(x)<-value
) can be used to 'initialise' the
MOFA object (rather than using @
).R/enrichment.R
).doParrallel
insted of BiocParallel
? I do not
know what the differences are..hasSlot(x, 'on_disk')
should not be necessary,
unless there are old serialized instances of this class. If so, you should
have an updateObject
method as well.make_example_data
function (not exported) should be either in
data-raw
or inst/extdata
/ inst/scripts
and saved as a serialized
dataset in the package using usethis::use_data
.get_default_tranining_options
) Ideally, the documentation should answer
questions like 'What is the function for?', 'At what point
in the process is it used?' etc.;
and move the code to another line.onAttach
startup messages can become verbose if all packages are using
them, avoid their use where possible.Minor:
<-
arrow for assignment instead of the =
equals sign.length(x) > 1
is equal to length(x)
when evaluated within an if
conditionalisTRUE(x)
on arguments that are logical, simply use if (x)
(R/correlate_covariates.R
).message
functions into one (R/enrichment.R
).template_script*
from this folder and possibly place them
in inst/scripts
.images
should be in the vignettes folder for usemofapy2
and only
keep relevant R code in the repository.test_enrichment.R
for testing
enrichment.R
) but this is optionalGitHub
these
can change at any time (in test_creating_model.R
). Provide your own dataset
or obtain one from running their examples (if possible).Thanks Marcel and Alan for your comments. We will work on the points in the review and certainly take a look at basilisk. This sounds interesting!
I'll also point out that http://bioconductor.org/books/devel/OSCA/integrating-with-protein-abundance.html#integration-with-gene-expression-data has a section waiting for MOFA2, if it can accept SingleCellExperiment objects.
Hi Marcel @LiNk-NY ,
thanks again for your time to carefully review our package. We went through all the points that you (as well as Aaron and Alan) raised above and updated our package to incorporate many of these comments in the newest commits (current version 0.99.10). Details on the points are listed below.
Should MOFA go under deprecation procedures? Yes as soon as MOFA2 is released MOFA should be deprecated
Why not release MOFA version 2.0.0 instead? Yes, this could have been an option from the beginning. At the current stage, though, a new package seems more sensible to us.
As @Alanocallaghan said, consider using basilisk for possibly smoother integration to python. We looked into basilisk but it reduces the speed of the model training in our package. Therefore we do not yet include it but we will work on a workaround that provides at least the option to use basilisk if installation via pip and interface with reticulate do not succeed.
We added support for SingleCellExperiments as requested by Aaron
Consider inheriting from a standard Bioconductor representation (e.g., SummarizedExperiment, SingleCellExperiment, etc.). The main representation could benefit from having a more structured format including accessor functions. We provide wrappers to construct a MOFA object starting from these representations, however would like to keep our main MOFA class as is.
Preferably use Authors@R field for more detailed authorship attributions as well as inclusion of ORCID identifiers. Done
Include a BugReports field pointing to the GitHub issues location. Done
My preference is to avoid using names that are application specific and use or reuse a more general term (e.g., using sample_names vs colnames, features_names vs rownames, and factor_names vs levels) but this is optional. We stick to the current naming conventions as we do not want to confuse current users of MOFA or MOFA2.
Where possible, reduce the function name length to more manageable names. We stick to the current naming conventions as we do not want to confuse current users of MOFA2.
Look into the possible reuse of the plot generic if there are package classes involved in plot_. All plot functions are based on the MOFA class. *
Consider re-exporting the %>% if you intend your users to use the pipe as well. Done
Avoid using the @ accessor and use a more formalized interface with such as samples_metadata(model) or even metadata(model) if you inherit from the Annotated class. (instead of model@samples_metadata; applies to model@cache as well). Done
Update any commented code (Line#204) Done
Keep the text within the 80 character width limit. Done
Remove any HTML documents from the vignettes folder Done
Use || to ensure that conditions on either side return a single logical value. Done
(optional) Show method can be condensed based on the values of object@status and nfactors. We prefer a more detailed output here .
Remove any old / commented code (R/AllGenerics.R and others). Done
Where possible, use vectorized operations (e.g., lapply(r2_mk, function(x) x100) could be unlist > 100 > relist operation Done
Where possible, avoid nesting apply family loops (R/calculate_variance_explained.R) Not sure how to avoid these if not replaced by other nested structures.
If models were a formalized S4 classf with validity checks, you wouldn't have to use "sanity checks" in compare_elbo and select_model As these two functions are for very advanced usage only and most users only work with a single object, we feel a simple list serves this purpose.
Avoid using class(x) == "character". As seen with the matrix class, it is better to use the is.character function (R/correlate_covariates.R). Done
Check that MultiAssayExperiment namespace is available (using requireNamespace) in .create_mofa_from_mae function otherwise return an error to the user and ask them to install the package. Done
Setter functions (e.g., names(x)<-value) can be used to 'initialise' the MOFA object (rather than using @). Done where possible, some setter functions only work on trained objects.
Provide users a way to turn the messages off (R/enrichment.R). Done
Is there a reason for using doParrallel insted of BiocParallel? I do not know what the differences are. Done , this was deprecated and hence removed as dependency.
Checking if the class .hasSlot(x, 'on_disk') should not be necessary,unless there are old serialized instances of this class. If so, you should have an updateObject method as well. Done
Perhaps the make_example_data function (not exported) should be either in data-raw or inst/extdata / inst/scripts and saved as a serialized dataset in the package using usethis::use_data. This function is exported and allows users to change different dimensions of the model, therefore we would keep it.
Because the package includes a lot of functions, consider separating the plotting and the infrastructure into separate packages.
We prefer having a single package.
Provide additional details for the user in some functions (e.g., get_default_tranining_options) Ideally, the documentation should answer questions like 'What is the function for?', 'At what point in the process is it used?' etc. Done
Avoid using semicolons ; and move the code to another line Done
Where possible use character instead of numeric indexing (e.g., x[1] vs x["group"]) Done where possible
.onAttach startup messages can become verbose if all packages are using them, avoid their use where possible. Done
Minor:
Use <- arrow for assignment instead of the = equals sign. Done
length(x) > 1 is equal to length(x) when evaluated within an if conditional lenght(x) > 1 seems more intuitive/readbale to us
Avoid using isTRUE(x) on arguments that are logical, simply use if (x) (R/correlate_covariates.R). Done
Condense message functions into one (R/enrichment.R). Done
Remove scripts template_script from this folder and possibly place them in inst/scripts. Done *
images should be in the vignettes folder for use Done , we removed them as they are not in use any more.
I recommended that you create a separate repository for mofapy2 and only keep relevant R code in the repository. Yes, this makes sense and we are considering this for the next release. However, as we currently have ongoing developments that affect both the R and python code we would like to keep them joint until then (and thus still have the same structure as the MOFA package).
Remove the setup.py script. See above point, will be moved to a python code repository together with mofapy2 folder
It's good that you have quite a few unit tests. The recommended way of adding tests is according to file name (e.g., test_enrichment.R for testing enrichment.R) but this is optional Done
Avoid downloading resources from external websites such as GitHub these can change at any time (in test_creating_model.R). Provide your own dataset or obtain one from running their examples (if possible). Done
More tests can't hurt :) We added some more
@LiNk-NY Quick update: We also implemented usage of basilisk for integration with python (now version 0.99.12)
Hi Britta, @bv2
Thanks for making those changes and responding to the review.
Consider adding mofapy2
, setup.py
and the Dockerfile
to .Rbuildignore
or moving them to the inst/scripts
folder.
I will accept your package.
Best regards,
Marcel
Note. Bump the package version and resolve any errors / warnings in the build report.
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
The master branch of your GitHub repository has been added to Bioconductor's git repository.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/bv2.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("MOFA2")
. The package 'landing page' will be created at
https://bioconductor.org/packages/MOFA2
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
I am familiar with the essential aspects of Bioconductor software management, including:
For help with submitting your package, please subscribe and post questions to the bioc-devel mailing list.