Bioconductor / Contributions

Contribute Packages to Bioconductor
135 stars 33 forks source link

`biocmask` #3572

Closed jtlandis closed 4 weeks ago

jtlandis commented 1 month ago

Update the following URL to point to the GitHub repository of the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

I am familiar with the essential aspects of Bioconductor software management, including:

For questions/help about the submission process, including questions about the output of the automatic reports generated by the SPB (Single Package Builder), please use the #package-submission channel of our Community Slack. Follow the link on the home page of the Bioconductor website to sign up.

bioc-issue-bot commented 1 month ago

Hi @jtlandis

Thanks for submitting your package. We are taking a quick look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: biocmask
Title: Data masks for SummarizedExperiment enabling dplyr-like manipulation
Version: 0.99.11
Authors@R: c(
    person("Justin", "Landis", email = "jtlandis314@gmail.com", role = c("aut", "cre"),
    comment = c(ORCID = "0000-0001-5501-4934")),
    person("Michael", "Love", email = "michaelisaiahlove@gmail.com", role = "aut",
    comment = c(ORCID = "0000-0001-8401-0545")))
Description: The package provides `rlang` data masks for the SummarizedExperiment class. The enables the evaluation of unquoted expression in different contexts of the SummarizedExperiment object with optional access to other contexts. The goal for `biocmask` is for evaluation to feel like a data.frame object without ever needing to unwind to a rectangular data.frame.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Imports: 
    dplyr,
    purrr,
    rlang,
    SummarizedExperiment,
    tidyr,
    tidyselect,
    vctrs,
    tibble,
    pillar,
    cli,
    glue,
    S7,
    S4Vectors,
    utils,
    methods
Depends: 
    R (>= 4.4.0)
Suggests:
    devtools,
    knitr,
    rmarkdown,
    testthat,
    airway,
    IRanges,
    here
LazyData: true
URL: https://github.com/jtlandis/biocmask, https://jtlandis.github.io/biocmask
BugReports: https://www.github.com/jtlandis/biocmask/issues
biocViews: Annotation, GenomeAnnotation, Transcriptomics
Config/testthat/edition: 3
bioc-issue-bot commented 1 month ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.11.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biocmask to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 month ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 9e949898b81a3676b350320b6d65a5b9c29384df

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.12.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biocmask to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 month ago

A reviewer has been assigned to your package for an indepth review. Please respond accordingly to any further comments from the reviewer.

bioc-issue-bot commented 1 month ago

Your package has been added to git.bioconductor.org to continue the pre-review process. A build report will be posted shortly. Please fix any ERROR and WARNING in the build report before a reviewer is assigned or provide a justification on why you feel the ERROR or WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting up remotes to push to git.bioconductor.org. All changes should be pushed to git.bioconductor.org moving forward. It is required to push a version bump to git.bioconductor.org to trigger a new build report.

Bioconductor utilized your github ssh-keys for git.bioconductor.org access. To manage keys and future access you may want to active your Bioconductor Git Credentials Account

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.12.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biocmask to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

hpages commented 1 month ago

Hi @jtlandis,

Thanks for this submission.

This is how a RangedSummarizedExperiment object gets displayed before loading biocmask:

library(airway)
data(airway
airway
# class: RangedSummarizedExperiment 
# dim: 63677 8 
# metadata(1): ''
# assays(1): counts
# rownames(63677): ENSG00000000003 ENSG00000000005 ... ENSG00000273492
#   ENSG00000273493
# rowData names(10): gene_id gene_name ... seq_coord_system symbol
# colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
# colData names(9): SampleName cell ... Sample BioSample

And this is how it gets displayed after loading biocmask:

library(biocmask)
airway
# # A RangedSummarizedExperiment-tibble Abstraction: 63,677 × 8
#     .features       .samples  | counts | gene_id gene_name entrezid gene_biotype
#     <chr>           <chr>     |  <int> | <chr>   <chr>        <int> <chr>       
#   1 ENSG00000000003 SRR10395… |    679 | ENSG00… TSPAN6          NA protein_cod…
#   2 ENSG00000000005 SRR10395… |      0 | ENSG00… TNMD            NA protein_cod…
#   3 ENSG00000000419 SRR10395… |    467 | ENSG00… DPM1            NA protein_cod…
#   4 ENSG00000000457 SRR10395… |    260 | ENSG00… SCYL3           NA protein_cod…
#   5 ENSG00000000460 SRR10395… |     60 | ENSG00… C1orf112        NA protein_cod…
#   …        …            …            …      …        …            …       …     
# n-4 ENSG00000273489 SRR10395… |      0 | ENSG00… RP11-180…       NA antisense   
# n-3 ENSG00000273490 SRR10395… |      0 | ENSG00… TSEN34          NA protein_cod…
# n-2 ENSG00000273491 SRR10395… |      0 | ENSG00… RP11-138…       NA lincRNA     
# n-1 ENSG00000273492 SRR10395… |      0 | ENSG00… AP000230…       NA lincRNA     
# n   ENSG00000273493 SRR10395… |      0 | ENSG00… RP11-80H…       NA lincRNA     
## ℹ n = 509,416
## ℹ 16 more variables: gene_seq_start <int>, gene_seq_end <int>,
##   seq_name <chr>, seq_strand <int>, seq_coord_system <int>, symbol <chr>,
##   `` <>, SampleName <fct>, cell <fct>, dex <fct>, albut <fct>, Run <fct>,
##   avgLength <int>, Experiment <fct>, Sample <fct>, BioSample <fct>

Loading a package is not supposed to have this kind of side effect. Imagine the confusion that will result in situations where biocmask is not explicitely loaded by the user (with library(biocmask)), but gets instead loaded as a dependency (direct or indirect) of another package.

I see that this is the consequence of the package redefining the show() method for RangedSummarizedExperiment (originally defined in the SummarizedExperiment package) with its own. Please do not do that.

More generally speaking, you're not allowed to redefine existing methods with your own. Defining more specialized methods (i.e. defining a foo() method for class B when there's already a foo() method for parent class A) is ok if done properly. However, redefining an existing method (i.e. defining a foo() method for class A when there's already a foo() method for class A defined somewhere else) is never ok.

Thanks, H.

mikelove commented 1 month ago

We can certainly take this out.

We had followed from the behavior in tidySummarizedExperiment, but fine to remove this behavior in biocmask, as it is not core to the functionality. We could come up with a different function that users can pipe into, at the end of a series of commands, like showtidy().

hpages commented 1 month ago

Thanks Mike.

Didn't know that tidySummarizedExperiment was also doing this. It should not. So unfortunate! I'll ask them to reconsider.

Maybe I'm overreacting but it's the first time I see something like that in the many years I've been involved with Bioconductor. People are used to see a RangedSummarizedExperiment object displayed a certain way, so I'm glad we agree that silently changing that to present the object as a "RangedSummarizedExperiment-tibble Abstraction" is not ok. This is something that the user needs to explicitely request. Again, it cannot simply be a side effect of loading a package. Sorry for the rant...

So in this case it's probably ok to redefine the show() method for SummarizedExperiment objects, but only if the new method does the following:

Concretely this means that global option restore_SummarizedExperiment_show would need to be replaced with something like show_SummarizedExperiment_as_tibble_abstraction that defaults to FALSE.

Also I get this when I load tidySummarizedExperiment after biocmask:

Loading required package: ttservice
Registered S3 methods overwritten by 'tidySummarizedExperiment':
  method                         from    
  filter.SummarizedExperiment    biocmask
  group_by.SummarizedExperiment  biocmask
  mutate.SummarizedExperiment    biocmask
  print.SummarizedExperiment     biocmask
  pull.SummarizedExperiment      biocmask
  select.SummarizedExperiment    biocmask
  summarise.SummarizedExperiment biocmask
  summarize.SummarizedExperiment biocmask

That is not good either. What's the relationship between tidySummarizedExperiment and biocmask? What differentiates the latter from the former? Can't the latter piggy back on the former so it doesn't need to redefine a bunch of things that are already defined in the former? It sounds like communication/coordination/harmonization with the rest of the tidy* community in Bioconductor could have been better.

Finally, about the name of the package: biocmask. Given that the package only handles SummarizedExperiment objects and derivatives ("The package provides rlang data masks for the SummarizedExperiment class"), wouldn't it make sense to use a more sepcific name? The current name makes it sound like the package provides rlang data masks for bioc stuff in general. Maybe you want to choose a name that leaves some room for other packages that will do the same but for other BioC objects?

Thanks, H.

mikelove commented 1 month ago

I think of biocmask as an alternative approach to tidySE -- as they do a subset of common things with different approaches, different efficiency, different convenience, different verbosity. We do need to be more explicit about that in the vignette, and explain when one might want to use one vs the other. But I wouldn't necessarily think we need to have both loaded at once in an R session, because they are alternatives to some degree. I can check with Stefano.

Regarding "bioc" in the name, we were considering extending to other Bioc S4 objects. But maybe they are mostly SE-shaped objects. @jtlandis what do you think?

jtlandis commented 1 month ago

Hello @hpages

Thank you for taking the time to review our submission!

regarding the printing methods - We can change the implementation to which ever you feel best. If we do go with changing the global option to opt-in, is BioConductor okay with the show method being:

setMethod(
  f="show",
  signature="SummarizedExperiment",
  definition=function(object) {
    if (isTRUE(x=getOption(x="show_SummarizedExperiment_as_tibble_abstraction",
                           default = FALSE)) 
    ) {
      print(object)
    } else {
      methods::getMethod(
        f = "show",
        signature = "SummarizedExperiment",
        where=asNamespace(ns = "SummarizedExperiment"))(object)
    }
  }
)

or is over-writing this method upon loading still a bad practice? Alternatively we could export two functions:


# to store old method
def_env <- new.env(parent = emptyenv(), size = 2L)

#' @name tidy-show-optin
#' @return method called for side effects
#' @export
use_tidy_show <- function() {
  if (!is.null(def_env$last_def)) {
    stop("`use_tidy_show()` has already been called once!")
  }
  # store the original definition
  # should fail if definition doesn't yet exist
  def_env$last_def <- getMethod(
    f = "show",
    signature = "SummarizedExperiment"
  )
  setMethod(
    f = "show",
    signature = "SummarizedExperiment",
    definition = function(object) {
      tidyShow(object) #defined in biocmask
      invisible(object)
    }
  )
}

#' @export
use_default_show <- function() {
  if (is.null(def_env$last_def)) {
    stop("`use_tidy_show()` has not yet been called!")
  }

  removeMethod(
    f = "show",
    signature = "SummarizedExperiment", 
    where = topenv(parent.frame())
  )
  setMethod(
    f = def_env$last_def@generic,
    signature = def_env$last_def@target,
    definition = def_env$last_def@.Data
  )
  def_env$last_def <- NULL
}

From the above, which would you prefer?

I also echo @mikelove on the differences between tidySE and biocmask. At the root, the implementation and syntax is different, forcing users to be more verbose/explicit. At the moment, biocmask is only supporting core dplyr verbs, whereas tidySE is supporting some tidyr functions (pivot_longer/pivot_wider/separate/unite)

Regarding the name "biocmask", this package was intended to bring rlang masks to Bioconductor S4 objects, hence the name. I plan to eventually release the API that would allow other packages to implement syntax similar to biocmask for their structured S4 objects. However, it is still fairly immature and not general enough to be used in other packages (yet!). So for this release cycle, we limited the scope to only the SummarizedExperiment class.

hpages commented 1 month ago

Hi @jtlandis,

If we do go with changing the global option to opt-in, is BioConductor okay with the show method being:

Yes, because even if the new method masks the original, by default it forwards to the original.

Thanks for explaining the differences between tidySummarizedExperiment and biocmask. I realize now that there's some mention of that in the vignette. I still don't think that biocmask should mask 8 of the S3 methods defined in tidySummarizedExperiment. This will break the former if it got loaded before biocmask, and vice-versa. Unless the 8 methods are the same? In this case, could the code duplication be avoided?

Regarding the name "biocmask". It's still not clear to me whether you plan to broaden the scope of the package to other non-SummarizedExperiment derivatives in the future, or whether this will happen in additional packages. If the latter, then choosing "biocmask" for the SummarizedExperiment-specific package maybe is not the best strategy? Anyways, this is a suggestion only. I can live with the current name.

Thanks, H.

jtlandis commented 1 month ago

Yes, because even if the new method masks the original, by default it forwards to the original.

Okay, I will adopt this change soon.

I still don't think that biocmask should mask 8 of the S3 methods defined in tidySummarizedExperiment.

biocmask does mask these functions because they are attempting to fulfill similar space but there isn't a good one-to-one mapping in their implementation. Perhaps we should be more explicit in the README stating that biocmask is written as an alternative to tidySummarizedExperiment and that loading both will cause conflicts.

There are expressions that biocmask can do that tidySummarizedExperiment will fail on, and there are expressions in which tidySummarizedExperiment will not return a SummarizedExperiment object, but a tibble, but biocmask will disallow since a SummarizedExperiment is not recoverable.

If the latter, then choosing "biocmask" for the SummarizedExperiment-specific package maybe is not the best strategy?

I somewhat agree as I wasn't sure if biocmask would be obligated to include dplyr implementations for other S4 objects. I am amendable to changing the name of the package to something more specific like seplyr or plyexperiment and then splitting the portion of the codebase that generates the masks into biocmask. @hpages , is the splitting of the codebase something that could be done this cycle will it have to wait until next cycle?

mikelove commented 1 month ago

My two cents is to submit a specialized SE package now (that is, biocmask but renamed to indicate that it is SE specific). So we would continue with submission under a new name.

And then next cycle do the rework of code base, potentially moving some of the generic machinary into a package that the SE-specific package loads (if we choose to move in this direction). We will learn a lot from this current SE-specific package being available to users.

bioc-issue-bot commented 1 month ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 3ccfbaaff03bf2994f5b2895579267602c96b8e3

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.17.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biocmask to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

bioc-issue-bot commented 1 month ago

Received a valid push on git.bioconductor.org; starting a build for commit id: b78582f7dc55735bfe5844f06e65a8736d6dfd12

bioc-issue-bot commented 1 month ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

Congratulations! The package built without errors or warnings on all platforms.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: Linux (Ubuntu 24.04.1 LTS): biocmask_0.99.18.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biocmask to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.

jtlandis commented 1 month ago

Hello @hpages,

We have made the following changes

  1. biocmask does not change the default show method on load or on attach. Users will need to opt in by either setting the global option, or using use_show_tidy() or use_show_default().
  2. We have removed the S3 print method in favor of a new S3 generic, show_tidy, which will always print biocmask's pretty printing regardless of the global option.
  3. Included mentions within the README and Vignette that biocmask and tidySummarizedExperiment are alternatives to each other. This is supported by adding package events within biocmask that check for if tidySummarizedExperiment becomes attached or is already attached, and sends a verbose message.

@mikelove and I have chatted and we are okay with changing the name of biocmask to better reflect the nature of the package. We are thinking of renaming to plyxp where "ply" is a nod to dplyr "xp" represents "experiment". In doing such a renaming, what would be the best way to approach this?

hpages commented 1 month ago

@jtlandis Please resubmit with the new name (plyxp). Even though we've passed the deadline for a new submission to be included in BioC 3.20, we'll do our best to get the package in BioC 3.20. Thanks!

bioc-issue-bot commented 4 weeks ago

Received a valid push on git.bioconductor.org; starting a build for commit id: 35ac8a30d5b59523de480890fc1ff3afc4431744

bioc-issue-bot commented 4 weeks ago

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "skipped, ERROR". This may mean there is a problem with the package that you need to fix. Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder: ERROR before build products produced.

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020, when making changes to your repository push to git@git.bioconductor.org:packages/biocmask to trigger a new build. A quick tutorial for setting up remotes and pushing to upstream can be found here.