Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/
84 stars 23 forks source link

GenomicDataCommons masks dplyr functions #78

Closed alanocallaghan closed 1 year ago

alanocallaghan commented 4 years ago

This is a great package. However overwriting verbs without providing a default method means that loading this package after dplyr breaks dplyr. It doesn't need to. Just provide a default method that calls the other generic.

reprex:

library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
d <- filter(mtcars, mpg > 10)
library("GenomicDataCommons")
#> Loading required package: magrittr
#> 
#> Attaching package: 'GenomicDataCommons'
#> The following objects are masked from 'package:dplyr':
#> 
#>     count, filter, select
#> The following object is masked from 'package:stats':
#> 
#>     filter
d <- filter(mtcars, mpg > 10)
#> Error in UseMethod("filter", x): no applicable method for 'filter' applied to an object of class "data.frame"
LiNk-NY commented 1 year ago

I think if you intend to use filter for mtcars then you shouldn't load GenomicDataCommons or use dplyr::filter. Perhaps it's better to import the default generic rather than export a custom one?

alanocallaghan commented 1 year ago

I assumed GDC didn't import dplyr, but as it does then yes would be better to attach methods to the dplyr generic rather than breaking it

LiNk-NY commented 1 year ago

I took a look at your PR #79 and it looks like we can't adopt the dplyr generics because the signatures are not the same (see Sec7). We'd have to change all our method signatures to agree with the generic, which is a breaking change. The generics and methods designed here are specific to the package. AFAICT, the recommendation is to use dplyr::filter.

alanocallaghan commented 1 year ago

Seems incredibly antisocial, but okay

LiNk-NY commented 1 year ago

Sorry but I disagree. These rules are written in the Writing R Extensions linked above. IMO, introducing breaking changes to support external and unrelated functionality from dplyr etc. is not the way to go.

Best regards, Marcel

alanocallaghan commented 1 year ago

Copying the interface of a package and not co-operating with it doesn't seem the way to go either, but it's up to you

LiNk-NY commented 1 year ago

I agree but this has been part of the package design for a while and it would be a breaking change for downstream dependencies. I think it's more trouble than it is worth since the user do dplyr::filter.

It's not up to me but to Sean @seandavi who is the maintainer and creator of the package.

hpages commented 1 year ago

How many users use :: when working interactively? For most users, the error they get in case of conflict is too cryptic so they are clueless about what's actually going on. Note that Hadley's conflicted package helps with this:

> library(conflicted)
> filter
Error:
! [conflicted] `filter` found in 4 packages.
Either pick the one you want with `::` 
* dplyr::filter
* ensembldb::filter
* GenomicDataCommons::filter
* stats::filter
Or declare a preference with `conflict_prefer()`
* conflict_prefer("filter", "dplyr")
* conflict_prefer("filter", "ensembldb")
* conflict_prefer("filter", "GenomicDataCommons")
* conflict_prefer("filter", "stats")
Run `rlang::last_error()` to see where the error occurred.

As you can see above, we even have those conflicts within our own Bioconductor ecosystem, where several packages define their own incompatible filter generics:

> args(ensembldb::filter)
function (x, filter = AnnotationFilterList()) 
NULL
> args(GenomicDataCommons::filter)
function (x, expr) 
NULL
> args(stats::filter)
function (x, filter, method = c("convolution", "recursive"), 
    sides = 2L, circular = FALSE, init = NULL) 
NULL

We can improve the situation via BiocGenerics. The way we usually handle this is by defining the filter() generic there (probably with the (x, filter, ...) arguments), and then ask Bioconductor packages to use that. So at least we avoid conflicts within our own ecosystem.

As for the conflict with dplyr::filter, it's a little bit harder to avoid. Maybe there's a way to define BiocGenerics::filter() S4 methods for data.frame, ts, tbl_lazy, data.table, and dtplyr_step, in BiocGenerics, that redirect to their corresponding S3 methods defined in dplyr, dbplyr, and dtplyr. However that means that BiocGenerics would then need to depend on those packages, and I would be somewhat hesitant to do that.

Maybe we should make sure that all Bioconductor users have conflicted in their search path e.g. by putting it in BiocGenerics's Depends field?

H.

alanocallaghan commented 1 year ago

As a middleground here you could define default s3 methods that suggest using :: to access tidyverse equivalents

hpages commented 1 year ago

As a middleground here you could define default s3 methods that suggest using :: to access tidyverse equivalents

That's kind of what conflicted does out-of-the-box already.

FWIW adding conflicted to BiocGenerics's Depends field was already discussed here 4 years ago: https://github.com/r-lib/conflicted/issues/8#issuecomment-395234248. But we never made it. Should we try this in BioC 3.17?