Closed jtlandis closed 4 weeks ago
Hi @jtlandis
Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.
The DESCRIPTION file for this package is:
Package: biocmask
Title: Data masks for SummarizedExperiment enabling dplyr-like manipulation
Version: 0.99.11
Authors@R: c(
person("Justin", "Landis", email = "jtlandis314@gmail.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0001-5501-4934")),
person("Michael", "Love", email = "michaelisaiahlove@gmail.com", role = "aut",
comment = c(ORCID = "0000-0001-8401-0545")))
Description: The package provides `rlang` data masks for the SummarizedExperiment class. The enables the evaluation of unquoted expression in different contexts of the SummarizedExperiment object with optional access to other contexts. The goal for `biocmask` is for evaluation to feel like a data.frame object without ever needing to unwind to a rectangular data.frame.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Imports:
dplyr,
purrr,
rlang,
SummarizedExperiment,
tidyr,
tidyselect,
vctrs,
tibble,
pillar,
cli,
glue,
S7,
S4Vectors,
utils,
methods
Depends:
R (>= 4.4.0)
Suggests:
devtools,
knitr,
rmarkdown,
testthat,
airway,
IRanges,
here
LazyData: true
URL: https://github.com/jtlandis/biocmask, https://jtlandis.github.io/biocmask
BugReports: https://www.github.com/jtlandis/biocmask/issues
biocViews: Annotation, GenomeAnnotation, Transcriptomics
Config/testthat/edition: 3
Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.11.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/biocmask
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: 9e949898b81a3676b350320b6d65a5b9c29384df
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.12.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/biocmask
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.
Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.
IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.
Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.12.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/biocmask
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hi @jtlandis,
Thanks for this submission.
This is how a RangedSummarizedExperiment object gets displayed before loading biocmask:
library(airway)
data(airway
airway
# class: RangedSummarizedExperiment
# dim: 63677 8
# metadata(1): ''
# assays(1): counts
# rownames(63677): ENSG00000000003 ENSG00000000005 ... ENSG00000273492
# ENSG00000273493
# rowData names(10): gene_id gene_name ... seq_coord_system symbol
# colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
# colData names(9): SampleName cell ... Sample BioSample
And this is how it gets displayed after loading biocmask:
library(biocmask)
airway
# # A RangedSummarizedExperiment-tibble Abstraction: 63,677 × 8
# .features .samples | counts | gene_id gene_name entrezid gene_biotype
# <chr> <chr> | <int> | <chr> <chr> <int> <chr>
# 1 ENSG00000000003 SRR10395… | 679 | ENSG00… TSPAN6 NA protein_cod…
# 2 ENSG00000000005 SRR10395… | 0 | ENSG00… TNMD NA protein_cod…
# 3 ENSG00000000419 SRR10395… | 467 | ENSG00… DPM1 NA protein_cod…
# 4 ENSG00000000457 SRR10395… | 260 | ENSG00… SCYL3 NA protein_cod…
# 5 ENSG00000000460 SRR10395… | 60 | ENSG00… C1orf112 NA protein_cod…
# … … … … … … … …
# n-4 ENSG00000273489 SRR10395… | 0 | ENSG00… RP11-180… NA antisense
# n-3 ENSG00000273490 SRR10395… | 0 | ENSG00… TSEN34 NA protein_cod…
# n-2 ENSG00000273491 SRR10395… | 0 | ENSG00… RP11-138… NA lincRNA
# n-1 ENSG00000273492 SRR10395… | 0 | ENSG00… AP000230… NA lincRNA
# n ENSG00000273493 SRR10395… | 0 | ENSG00… RP11-80H… NA lincRNA
## ℹ n = 509,416
## ℹ 16 more variables: gene_seq_start <int>, gene_seq_end <int>,
## seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
## `` <>, SampleName <fct>, cell <fct>, dex <fct>, albut <fct>, Run <fct>,
## avgLength <int>, Experiment <fct>, Sample <fct>, BioSample <fct>
Loading a package is not supposed to have this kind of side effect. Imagine the confusion that will result in situations where biocmask is not explicitely loaded by the user (with library(biocmask)
), but gets instead loaded as a dependency (direct or indirect) of another package.
I see that this is the consequence of the package redefining the show()
method for RangedSummarizedExperiment (originally defined in the SummarizedExperiment package) with its own. Please do not do that.
More generally speaking, you're not allowed to redefine existing methods with your own. Defining more specialized methods (i.e. defining a foo()
method for class B when there's already a foo()
method for parent class A) is ok if done properly. However, redefining an existing method (i.e. defining a foo()
method for class A when there's already a foo()
method for class A defined somewhere else) is never ok.
Thanks, H.
We can certainly take this out.
We had followed from the behavior in tidySummarizedExperiment, but fine to remove this behavior in biocmask, as it is not core to the functionality. We could come up with a different function that users can pipe into, at the end of a series of commands, like showtidy()
.
Thanks Mike.
Didn't know that tidySummarizedExperiment was also doing this. It should not. So unfortunate! I'll ask them to reconsider.
Maybe I'm overreacting but it's the first time I see something like that in the many years I've been involved with Bioconductor. People are used to see a RangedSummarizedExperiment object displayed a certain way, so I'm glad we agree that silently changing that to present the object as a "RangedSummarizedExperiment-tibble Abstraction" is not ok. This is something that the user needs to explicitely request. Again, it cannot simply be a side effect of loading a package. Sorry for the rant...
So in this case it's probably ok to redefine the show()
method for SummarizedExperiment objects, but only if the new method does the following:
Concretely this means that global option restore_SummarizedExperiment_show
would need to be replaced with something like show_SummarizedExperiment_as_tibble_abstraction
that defaults to FALSE
.
Also I get this when I load tidySummarizedExperiment after biocmask:
Loading required package: ttservice
Registered S3 methods overwritten by 'tidySummarizedExperiment':
method from
filter.SummarizedExperiment biocmask
group_by.SummarizedExperiment biocmask
mutate.SummarizedExperiment biocmask
print.SummarizedExperiment biocmask
pull.SummarizedExperiment biocmask
select.SummarizedExperiment biocmask
summarise.SummarizedExperiment biocmask
summarize.SummarizedExperiment biocmask
That is not good either. What's the relationship between tidySummarizedExperiment and biocmask? What differentiates the latter from the former? Can't the latter piggy back on the former so it doesn't need to redefine a bunch of things that are already defined in the former? It sounds like communication/coordination/harmonization with the rest of the tidy*
community in Bioconductor could have been better.
Finally, about the name of the package: biocmask. Given that the package only handles SummarizedExperiment objects and derivatives ("The package provides rlang
data masks for the SummarizedExperiment class"), wouldn't it make sense to use a more sepcific name? The current name makes it sound like the package provides rlang
data masks for bioc stuff in general. Maybe you want to choose a name that leaves some room for other packages that will do the same but for other BioC objects?
Thanks, H.
I think of biocmask as an alternative approach to tidySE -- as they do a subset of common things with different approaches, different efficiency, different convenience, different verbosity. We do need to be more explicit about that in the vignette, and explain when one might want to use one vs the other. But I wouldn't necessarily think we need to have both loaded at once in an R session, because they are alternatives to some degree. I can check with Stefano.
Regarding "bioc" in the name, we were considering extending to other Bioc S4 objects. But maybe they are mostly SE-shaped objects. @jtlandis what do you think?
Hello @hpages
Thank you for taking the time to review our submission!
regarding the printing methods - We can change the implementation to which ever you feel best. If we do go with changing the global option to opt-in, is BioConductor okay with the show method being:
setMethod(
f="show",
signature="SummarizedExperiment",
definition=function(object) {
if (isTRUE(x=getOption(x="show_SummarizedExperiment_as_tibble_abstraction",
default = FALSE))
) {
print(object)
} else {
methods::getMethod(
f = "show",
signature = "SummarizedExperiment",
where=asNamespace(ns = "SummarizedExperiment"))(object)
}
}
)
or is over-writing this method upon loading still a bad practice? Alternatively we could export two functions:
# to store old method
def_env <- new.env(parent = emptyenv(), size = 2L)
#' @name tidy-show-optin
#' @return method called for side effects
#' @export
use_tidy_show <- function() {
if (!is.null(def_env$last_def)) {
stop("`use_tidy_show()` has already been called once!")
}
# store the original definition
# should fail if definition doesn't yet exist
def_env$last_def <- getMethod(
f = "show",
signature = "SummarizedExperiment"
)
setMethod(
f = "show",
signature = "SummarizedExperiment",
definition = function(object) {
tidyShow(object) #defined in biocmask
invisible(object)
}
)
}
#' @export
use_default_show <- function() {
if (is.null(def_env$last_def)) {
stop("`use_tidy_show()` has not yet been called!")
}
removeMethod(
f = "show",
signature = "SummarizedExperiment",
where = topenv(parent.frame())
)
setMethod(
f = def_env$last_def@generic,
signature = def_env$last_def@target,
definition = def_env$last_def@.Data
)
def_env$last_def <- NULL
}
From the above, which would you prefer?
I also echo @mikelove on the differences between tidySE
and biocmask
. At the root, the implementation and syntax is different, forcing users to be more verbose/explicit. At the moment, biocmask
is only supporting core dplyr
verbs, whereas tidySE
is supporting some tidyr
functions (pivot_longer/pivot_wider/separate/unite)
Regarding the name "biocmask", this package was intended to bring rlang
masks to Bioconductor S4 objects, hence the name. I plan to eventually release the API that would allow other packages to implement syntax similar to biocmask
for their structured S4 objects. However, it is still fairly immature and not general enough to be used in other packages (yet!). So for this release cycle, we limited the scope to only the SummarizedExperiment
class.
Hi @jtlandis,
If we do go with changing the global option to opt-in, is BioConductor okay with the show method being:
Yes, because even if the new method masks the original, by default it forwards to the original.
Thanks for explaining the differences between tidySummarizedExperiment and biocmask. I realize now that there's some mention of that in the vignette. I still don't think that biocmask should mask 8 of the S3 methods defined in tidySummarizedExperiment. This will break the former if it got loaded before biocmask, and vice-versa. Unless the 8 methods are the same? In this case, could the code duplication be avoided?
Regarding the name "biocmask". It's still not clear to me whether you plan to broaden the scope of the package to other non-SummarizedExperiment derivatives in the future, or whether this will happen in additional packages. If the latter, then choosing "biocmask" for the SummarizedExperiment-specific package maybe is not the best strategy? Anyways, this is a suggestion only. I can live with the current name.
Thanks, H.
Yes, because even if the new method masks the original, by default it forwards to the original.
Okay, I will adopt this change soon.
I still don't think that biocmask should mask 8 of the S3 methods defined in tidySummarizedExperiment.
biocmask
does mask these functions because they are attempting to fulfill similar space but there isn't a good one-to-one mapping in their implementation. Perhaps we should be more explicit in the README stating that biocmask
is written as an alternative to tidySummarizedExperiment
and that loading both will cause conflicts.
There are expressions that biocmask
can do that tidySummarizedExperiment
will fail on, and there are expressions in which tidySummarizedExperiment
will not return a SummarizedExperiment object, but a tibble, but biocmask
will disallow since a SummarizedExperiment is not recoverable.
If the latter, then choosing "biocmask" for the SummarizedExperiment-specific package maybe is not the best strategy?
I somewhat agree as I wasn't sure if biocmask
would be obligated to include dplyr implementations for other S4 objects. I am amendable to changing the name of the package to something more specific like seplyr
or plyexperiment
and then splitting the portion of the codebase that generates the masks into biocmask
. @hpages , is the splitting of the codebase something that could be done this cycle will it have to wait until next cycle?
My two cents is to submit a specialized SE package now (that is, biocmask
but renamed to indicate that it is SE specific). So we would continue with submission under a new name.
And then next cycle do the rework of code base, potentially moving some of the generic machinary into a package that the SE-specific package loads (if we choose to move in this direction). We will learn a lot from this current SE-specific package being available to users.
Received a valid push on git.bioconductor.org; starting a build for commit id: 3ccfbaaff03bf2994f5b2895579267602c96b8e3
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.17.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/biocmask
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Received a valid push on git.bioconductor.org; starting a build for commit id: b78582f7dc55735bfe5844f06e65a8736d6dfd12
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
Congratulations! The package built without errors or warnings on all platforms.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.18.tar.gz
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/biocmask
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Hello @hpages,
We have made the following changes
biocmask
does not change the default show method on load or on attach. Users will need to opt in by either setting the global option, or using use_show_tidy()
or use_show_default()
.biocmask
's pretty printing regardless of the global option.biocmask
and tidySummarizedExperiment
are alternatives to each other. This is supported by adding package events within biocmask
that check for if tidySummarizedExperiment
becomes attached or is already attached, and sends a verbose message.@mikelove and I have chatted and we are okay with changing the name of biocmask
to better reflect the nature of the package. We are thinking of renaming to plyxp
where "ply" is a nod to dplyr
"xp" represents "experiment". In doing such a renaming, what would be the best way to approach this?
@jtlandis Please resubmit with the new name (plyxp). Even though we've passed the deadline for a new submission to be included in BioC 3.20, we'll do our best to get the package in BioC 3.20. Thanks!
Received a valid push on git.bioconductor.org; starting a build for commit id: 35ac8a30d5b59523de480890fc1ff3afc4431744
Dear Package contributor,
This is the automated single package builder at bioconductor.org.
Your package has been built on the Bioconductor Single Package Builder.
On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.
Please see the build report for more details.
The following are build products from R CMD build on the Single Package Builder: ERROR before build products produced.
Links above active for 21 days.
Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/biocmask
to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.
Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor
Confirm the following by editing each check box to '[x]'
[x] I understand that by submitting my package to Bioconductor, the package source and all review commentary are visible to the general public.
[x] I have read the Bioconductor Package Submission instructions. My package is consistent with the Bioconductor Package Guidelines.
[x] I understand Bioconductor Package Naming Policy and acknowledge Bioconductor may retain use of package name.
[x] I understand that a minimum requirement for package acceptance is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS. Passing these checks does not result in automatic acceptance. The package will then undergo a formal review and recommendations for acceptance regarding other Bioconductor standards will be addressed.
[x] My package addresses statistical or bioinformatic issues related to the analysis and comprehension of high throughput genomic data.
[x] I am committed to the long-term maintenance of my package. This includes monitoring the support site for issues that users may have, subscribing to the bioc-devel mailing list to stay aware of developments in the Bioconductor community, responding promptly to requests for updates from the Core team in response to changes in R or underlying software.
[x] I am familiar with the Bioconductor code of conduct and agree to abide by it.
I am familiar with the essential aspects of Bioconductor software management, including:
For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.