Closed prabhakarlab closed 5 months ago
Hi @prabhakarlab
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: Banksy
Title: Spatial transcriptomic clustering
Version: 0.99.0
Authors@R: c(
person(given = "Vipul",
family = "Singhal",
role = c("aut")),
person(given = "Joseph",
family = "Lee",
role = c("aut", "cre"),
email = "joseph.lee@u.nus.edu",
comment = c(ORCID="0000-0002-4983-4714"))
)
Description: Banksy is an R package that incorporates spatial information to
cluster cells in a feature space (e.g. gene expression). To incorporate
spatial information, BANKSY computes the mean neighborhood expression and
azimuthal Gabor filters that capture gene expression gradients. These
features are combined with the cell's own expression to embed cells in a
neighbor-augmented product space which can then be clustered, allowing for
accurate and spatially-aware cell typing and tissue domain segmentation.
Depends:
R (>= 3.5.0)
Imports:
aricode,
data.table,
dbscan,
SpatialExperiment,
SingleCellExperiment,
SummarizedExperiment,
S4Vectors,
stats,
matrixStats,
igraph,
irlba,
leidenAlg (>= 1.1.0),
utils,
uwot,
RcppHungarian
License: file LICENSE
Encoding: UTF-8
URL: https://github.com/prabhakarlab/Banksy
RoxygenNote: 7.2.3
Suggests:
knitr,
rmarkdown,
pals,
scuttle,
scater,
cowplot,
ggplot2,
ggspavis,
testthat (>= 3.0.0)
VignetteBuilder: knitr
Config/testthat/edition: 3
biocViews:
Clustering,
Spatial,
SingleCell,
GeneExpression,
DimensionReduction
Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Build System.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: Banksy_0.99.0.tar.gz Linux (Ubuntu 22.04.2 LTS): Banksy_0.99.0.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Banksy
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.
Hi @HelenaLC,
Thank you for helping to review our package!
I am trying to push an update to address the error raised by BiocCheck. I've followed the steps here. Here are the remotes:
(base) ➜ Banksy git:(bioc) git remote -v
origin git@github.com:prabhakarlab/Banksy.git (fetch)
origin git@github.com:prabhakarlab/Banksy.git (push)
upstream git@git.bioconductor.org:packages/Banksy.git (fetch)
upstream git@git.bioconductor.org:packages/Banksy.git (push)
When trying to push to upstream, I get this error:
(base) ➜ Banksy git:(bioc) git push upstream
Enter passphrase for key '/Users/jlee/.ssh/id_rsa':
FATAL: W any packages/Banksy jleechung DENIED by fallthru
(or you mis-spelled the reponame)
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
My guess is that jleechung
does not have access rights, and that only prabhakarlab
has access. The repository was previously under jleechung
and then transferred to prabhakarlab
. Do you know how to resolve this?
Best, Joseph
@lshep can you help?
This should be fixed now.
Thanks @lshep . I have access now, but there's a hook that's rejecting the push.
(base) ➜ Banksy git:(bioc) git push upstream
Enter passphrase for key '/Users/jlee/.ssh/id_rsa':
Enumerating objects: 93, done.
Counting objects: 100% (93/93), done.
Delta compression using up to 8 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (51/51), 2.13 MiB | 4.27 MiB/s, done.
Total 51 (delta 34), reused 51 (delta 34), pack-reused 0
remote: FATAL: W refs/heads/bioc packages/Banksy jleechung DENIED by fallthru
remote: error: hook declined to update refs/heads/bioc
To git.bioconductor.org:packages/Banksy.git
! [remote rejected] bioc -> bioc (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/Banksy.git'
If I'm reading correctly, you are on a branch called bioc? We do not have a branch called bioc. You should push to the devel
branch. git push upstream devel
or in this case if you are on a different branch you would have to map bioc to devel git push upstream bioc:devel
See http://contributions.bioconductor.org/git-version-control.html item 5 which shows the concept with the generally used default of main rather than bioc.
Received a valid push on git.bioconductor.org; starting a build for commit id: 61150247ed849ae144c5938ca2c42b80e04c9309
Thanks for the clarification @lshep !
Thank you @lshep!!
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Build System.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Bioconductor Build System: macOS 12.6.5 Monterey: Banksy_0.99.1.tar.gz Linux (Ubuntu 22.04.2 LTS): Banksy_0.99.1.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Banksy
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: b1a2c8bb2af06346f303637b9bba18faa1a1fe32
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: Banksy_0.99.2.tar.gz Linux (Ubuntu 22.04.2 LTS): Banksy_0.99.2.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Banksy
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @HelenaLC, the build report shows one warning for set.seed
usage. We're wondering if an exception can be made for our package. We're aware of the dangers of calling set.seed
within a function (as discussed here). However, several of the functions in Banksy
loop over a set of parameters and run other functions which are non-deterministic - for instance, the runBanksyPCA function iterates over two parameters and runs PCA with each combination of them. To ensure that the results of each iteration can be replicated without running the entire loop, we provide an option for the user to set the seed to be used for each iteration - when the seed is set, a message is displayed indicating usage of a seed. By default, no seed is set.
If I understand correctly, I agree that a scenario like this would be a reasonable exception (granted results from each iteration are in fact returned vs. only for a set of parameters that have been selected based on said iterations), however, I am confused by the example here - to the best of my knowledge, PCA is not stochastic? Or did you perhaps mean the parameters iterated over are chosen at random?
Yes, the results from each iteration are returned. The parameters iterated over are fixed. Each combination of the parameters (m
and lambda
here) basically corresponds to a different data matrix (obtained using the getBanksyMatrix
function here) which we then process for downstream analysis (PCA, clustering).
The example is indeed confusing! You are right that PCA is indeed deterministic, but the irlba
implementation has some randomness. In particular the signs of the PC scores are arbitrary (see the example below).
For clustering, we use the leiden algorithm, which also has some randomness. The option to set the seed ensures that the cluster labels corresponding to each parameter combination can be reproduced without re-running everything else.
BiocManager::install()
[ ] consider adding a BugReports:
fields pointing to https://github.com/prabhakarlab/Banksy/issues
[ ] please update R version dependency to >= 4.3.0; Bioc 3.19, and thus the package, will not be installable with older software
[ ] eval=FALSE
flags, or equivalent tricks in markdown) undermine the benefit of vignettes and are generally not allowed (see here); BiocCheck()
currently warns that 36/54 = 65% of chunks or unevaluated
[ ] we encourage the use of BiocStyle
for formatting with html_document
as rendering target (see here)
[ ] please provide provenance information for any example data; specifically, a reference/online source for hippocampus
and a script under inst/script
for how the synthetic rings
data were generated (see here)
[ ] consider adding a package .man/help page (?Banksy
), e.g., listing key resources/references, summarizing the packages key purpose/functionality, demonstrating a quick workflow, hyperlinking to corresponding functions etc.
[ ] I would appreciate some detail on methodology, a reference etc. in the corresponding functions' documentation. For example, "return BANKSY matrix" and "compute component neighborhood matrices" does not give much of an idea what is being done here; I vaguely recall seeing the preprint (I think?), and it would be nice to get a brief @details
section saying i) how, for example, spatial coordinates are incorporated in the embedding, or anything else a given function might do (in terms of methodology), and ii) the effect different parameters might have on this. All that can be very brief with a link to an external resource, but it would be nice to have it in the function documentation where most will look.
[ ] when functions expect a specific data structure (e.g., a SpatialExperiment
is expected, computeBanksy()
has been run etc.), I'd encourage implementing some basic validity checks (at the very top of a function's body). Currently, there is no checks at all (e.g., whether or not the input is a SE/SCE/SPE), which may result in hard-to-interpret errors, especially for less programmatically inclined users. For Banksy
-specific things, you'd might also like to check all needed slots exists.
[ ] related to the above: it would be sensible to implement some minimal argument checks in order to prevent (potentially heavy) computations from being carrier out, only for the function to fail downstream for stupid reasons (like a typo, misinterpretation of an argument etc.); e.g., verbose
is a length-one logical value, an argument is a scalar numeric or integer etc.
[ ] clusterBanksy()
lists 4 choices for algo
, but only Leiden is implemented.
[ ] nothing wrong, just if that's useful: consecutive if-else-
statements on character strings such as computation.R lines 293+ can nicely written using switch()
, e.g.,
switch(spatial_mode,
rNN_gauss={...},
kNN_rank={...},
...)
[ ] related to the above: when a function argument expects a character string that matches a list of predefined choices, this is typically done using match.arg()
early on (defaults to the first) and listing choices in the function definition; this i) lists possible choices directly in the function doc/argument auto-complete, ii) automatically gives an informative error when an invalid values is given, and iii) prevents part of the function running when it will eventually fail, e.g., ...
computeNeighbors <- \(...,
spatial_mode=c("kNN_median", "rNN_gauss", "kNN_rank", ...))
{
spatial_mode <- match.arg(spatial_mode)
...
}
@prabhakarlab may we expect changes addressing the review comments above soon?
Yes, we will do so latest by the first week of January. Thank you!
Received a valid push on git.bioconductor.org; starting a build for commit id: 9f71260088ea1e80a7a23f82f7b49a3efd18a0c9
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: ERROR before build products produced.
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Banksy
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 711840e21ff1919c014bbfb0ffb5d39cc78cb982
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: Banksy_0.99.4.tar.gz Linux (Ubuntu 22.04.2 LTS): Banksy_0.99.4.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Banksy
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 0356b74e6d2058a8d36b4dabdc5f746e67d6ee9c
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "WARNINGS". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: macOS 12.7.1 Monterey: Banksy_0.99.5.tar.gz Linux (Ubuntu 22.04.2 LTS): Banksy_0.99.5.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/Banksy
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Thank you for the review, @HelenaLC. We have made the necessary changes; the remaining warning comes from set.seed
usage. Please see below for responses to points of significance.
BiocManager::install()
[x] consider adding a BugReports: fields pointing to https://github.com/prabhakarlab/Banksy/issues
[x] please update R version dependency to >= 4.3.0; Bioc 3.19, and thus the package, will not be installable with older software
Updated to >= 4.4.0.
Three of the four vignettes now evaluate all their chunks. The last vignette (domain-segment.Rmd
) utilises a large, external dataset - due to size / runtime (for R CMD check time limit for vignettes) constraints we use eval=FALSE
and stash only the clustering results.
[x] _we encourage the use of BiocStyle for formatting with htmldocument as rendering target (see here)
[x] please provide provenance information for any example data; specifically, a reference/online source for hippocampus and a script under inst/script for how the synthetic rings data were generated (see here)
We have added provenance information for the example datasets; these are documented at the respective help pages, ?rings
and ?hippocampus
. For rings
, a script is provided at inst/script
. For hippocampus
, we refer users to our preprint - we will update this link once the paper is out.
[x] consider adding a package .man/help page (?Banksy), e.g., listing key resources/references, summarizing the packages key purpose/functionality, demonstrating a quick workflow, hyperlinking to corresponding functions etc.
We have now added a main package help page.
[x] I would appreciate some detail on methodology, a reference etc. in the corresponding functions' documentation. For example, "return BANKSY matrix" and "compute component neighborhood matrices" does not give much of an idea what is being done here; I vaguely recall seeing the preprint (I think?), and it would be nice to get a brief @details section saying i) how, for example, spatial coordinates are incorporated in the embedding, or anything else a given function might do (in terms of methodology), and ii) the effect different parameters might have on this. All that can be very brief with a link to an external resource, but it would be nice to have it in the function documentation where most will look.
We have added @details
sections in the documentation for the 'main' functions (computeBanksy
, getBanksyMatrix
, runBanksyPCA|UMAP
and clusterBanksy
), and added more detail to the argument descriptions. See for instance here.
[x] when functions expect a specific data structure (e.g., a SpatialExperiment is expected, computeBanksy() has been run etc.), I'd encourage implementing some basic validity checks (at the very top of a function's body). Currently, there is no checks at all (e.g., whether or not the input is a SE/SCE/SPE), which may result in hard-to-interpret errors, especially for less programmatically inclined users. For Banksy-specific things, you'd might also like to check all needed slots exists.
The code should be relatively robust to SE/SCE/SPE classes; the function where this is most relevant is the computeBanksy
function, where the spatial coordinates need to be retrieved. We do this in the getLocs
helper, which extracts the spatial coordinates given the input data class. This helper is called near the top of the function before any heavy computation.
As for other functions, we also implement checks that throw error messages detailing what is expected of the input, and what commands should be run to rectify the error (for instance, here and here).
[x] related to the above: it would be sensible to implement some minimal argument checks in order to prevent (potentially heavy) computations from being carrier out, only for the function to fail downstream for stupid reasons (like a typo, misinterpretation of an argument etc.); e.g., verbose is a length-one logical value, an argument is a scalar numeric or integer etc.
We have now implemented argument checks for the main functions (with stopifnot
, for instance, here).
[x] clusterBanksy() lists 4 choices for algo, but only Leiden is implemented.
Thank you for catching this, we have now implemented the other algos.
[x] nothing wrong, just if that's useful: consecutive if-else- statements on character strings such as computation.R lines 293+ can nicely written using switch(), e.g.,
switch(spatial_mode,
rNN_gauss={...},
kNN_rank={...},
...)
We have now implemented switch
where applicable (with the spatial_mode
argument in computeBanksy
, and the algo
argument in clusterBanksy
).
[x] related to the above: when a function argument expects a character string that matches a list of predefined choices, this is typically done using match.arg() early on (defaults to the first) and listing choices in the function definition; this i) lists possible choices directly in the function doc/argument auto-complete, ii) automatically gives an informative error when an invalid values is given, and iii) prevents part of the function running when it will eventually fail, e.g., ...
computeNeighbors <- \(...,
spatial_mode=c("kNN_median", "rNN_gauss", "kNN_rank", ...))
{
spatial_mode <- match.arg(spatial_mode)
...
}
Similarly, we have implemented match.arg
where applicable.
Apologies for the (holiday-)delay... Fantastic, thank you for addressing these thoroughly! Two minor comments or, rather, room for improvement/learning in the future. But happy to accept as is and you can address these later, perhaps once the package builds on the Bioc devel server.
pre-allocating and filling (unless really necessary, e.g., because there are side-effects), e.g., for (i in ...) x[i] <-
, should preferably written using apply
, vapply
or lapply
; e.g., cluster.R lines 630+ (see here)
nothing 'wrong', but note that match.arg
does not require specifying choices
when they are already listed in the function definition; e.g., compareClusters
function in cluster.R (clusterBanksy
does this already)
Your package has been accepted. It will be added to the Bioconductor nightly builds.
Thank you for contributing to Bioconductor!
Reviewers for Bioconductor packages are volunteers from the Bioconductor community. If you are interested in becoming a Bioconductor package reviewer, please see Reviewers Expectations.
Thank you @HelenaLC, we will address those points down the road. Appreciate your time and effort in the reviews!
The default branch of your GitHub repository has been added to Bioconductor's git repository as branch devel.
To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/prabhakarlab.keys is not empty), then no further steps are required. Otherwise, do the following:
See further instructions at
https://bioconductor.org/developers/how-to/git/
for working with this repository. See especially
https://bioconductor.org/developers/how-to/git/new-package-workflow/ https://bioconductor.org/developers/how-to/git/sync-existing-repositories/
to keep your GitHub and Bioconductor repositories in sync.
Your package will be included in the next nigthly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2pm Eastern the next day) at
https://bioconductor.org/checkResults/
(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, you package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("Banksy")
. The package 'landing page' will be created at
https://bioconductor.org/packages/Banksy
If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.