Open dgkf opened 2 years ago
When you say:
Allow multiple packages to independently register generics of the same name, such that their methods are registered regardless of which of the package(s) are loaded.
I'm not sure what you mean. Say both pkgA and pkgB define foo <- new_generic("foo")
. I'd say that there are two generics with the same name, but they are not the same generic. Registering a method on pkgA::foo()
shouldn't affect pkgB::foo()
because they are only coincidentally related by having the same name (in the same way that e.g. Hmisc::is.discrete()
and plyr::is.discrete()
are coincidentally related).
If your concern is registering methods for generics in suggested packages, then we already have a mechanism for this: R7::new_external_generic()
.
I'd say that there are two generics with the same name, but they are not the same generic. [They] shouldn't affect [each other] because they are only coincidentally related by having the same name.
If the generic exists as a means of dispatching on type, and the type signatures of methods can be disambiguated, is it a problem that they only share a generic name? I suspect the answer might be "yes" - it's just not obvious to me why.
I’m not saying we couldn’t make this work, I’m saying we’ve deliberately chosen not too. We believe this behaviour is a problem with S3, not a feature.
I see - I was hoping I could read up on this decision if it's documented somewhere. If it's implicitly the right thing to do, it's not clear to me, and I was hoping to learn why languages choose this style.
Perhaps this is already obvious to you, but I'll lay out what I see as the unfortunate repercussions of this decision. If we have a pkgA
that exports a generic, fn
, and pkgB
that also wants to provide a totally compatible, but independent fn
, then it necessitates that pkgA
as a soft dependency in order to use this function. For the sake of motivating the example, let's assume pkgA
is a heavy dependency - perhaps it has niche system requirements or brings a large dependency footprint.
# pkgA R/fn.R
#' @export
foo <- R7::new_generic("foo", "x")
R7::method(foo, R7::class_character) <- function(x) "pkgA"
# pkgB R/fn.R
foo <- R7::new_external_generic("pkgA", "foo", "x")
R7::method(foo, R7::class_numeric) <- function(x) "pkgB"
.onLoad <- function(libname, pkgname) R7::external_methods_register()
There are plenty of reasons not to want to inherit this dependency just to use a similar API. Simultaneously providing that generic while also allowing it to continue to function if another package is loaded introduces quite a lot of development overhead. In order to provide its own version of the generic without requiring the dependency, it needs to also export its own generic:
# pkgB R/fn.R
fn_num <- function(x) "pkgB"
#' @export
fn <- R7::new_generic("fn", "x")
R7::method(fn, R7::class_numeric) <- fn_num
#' @export
pkgA_fn <- R7::new_external_generic("pkgA", "fn" "x")
R7::method(t1_foo, R7::class_numeric) <- fn_num
.onLoad <- function(libname, pkgname) {
R7::external_methods_register()
}
And now we have a combinatorial problem - If another package (pkgC
) wants to also provide a compatible generic without requiring the dependency, it now needs to export its own version fn
, a version that extends pkgB::fn
and a version that extends pkgA::fn
. Moreover, pkgB
needs to be revised to now extend pkgC::fn
for the two to be compatible.
Permitting multiple packages to export the same generic would mean that these packages wouldn't need to keep tabs on who else is leveraging the same api.
I have no doubts that the team has already considered this scenario. It's clear to me what challenges it introduces, but not what benefits. I don't maintain any packages that are widely depended on, so perhaps I'm just not savvy to the hurdles this introduces for packages trying to manage a reverse dependency ecosystem.
I think I have two germane opinions:
Suggests
) to register a method for a generic is (a) easy and (b) the right thing to do. (Also I'm not sure that the new_external_generic()
mechanism actually requires you to take a dependency on the package at all, unless you want to test it)
Taking a soft dependency (i.e. Suggests) to register a method for a generic is (a) easy and (b) the right thing to do.
I completely agree. I think this is a great practice that helps in the discovery of related work. I hope what I'm proposing isn't coming across as anything that would discourage this.
However, this is Suggests
for the package, but required for using the method - that's the part that feels off to me. I'm of the opinion that the method should be useful whether the Suggests
package is available or not.
If you called yourself Hadley, you are not me. The same applies to functions.
I appreciate the point you're making, but I would propose a different framing. If someone asks you what your name is (name(@hadley)
), you would say "Hadley". Then, if someone asks what the name of my cat is (name(@dgkf's cat)
), it seems sort of weird that I need to make sure you first are able to tell me what Hadley's name is. I know that my cat is not Hadley, and the two "types" are independent from one another. Yet I need to either 1) come up with my own function to replace name
that avoids conflicts (eg dgkf_cat_name(@dgkf's cat)
- a bit redundant), or 2) require that you know who Hadley is.
(or 3, plea with whoever called dibs on the function name
and ask them to create a generics package, which starts to look conceptually similar to letting anyone register a method against it anyways.)
I feel like we're talking past each other. Do you have a concrete example using existing packages that might help me understand better?
Good call - I think that's on me. I rushed to discussing design decisions. Let me back up a bit and start with an S3 example. I think R7 carries forward this behavior, so S3 is a familiar basis for examples.
I'm currently developing a package for a CDISC (pharma) datetime representations. I would like to offer a lubridate
-compatible year()
(and family) api in two user scenarios:
year(<partial_time>)
even when lubridate
isn't loaded (or installed, if it wasn't a hard dependency for other purposes)lubridate
and year(<partial_time>)
works alongside lubridate
's methodsTo my knowledge, the only way to do this is to export my own generic (parttime::year
), as well as methods on lubridate
's generic (lubridate::year
), which will mask parttime::year
on load.
# NAMESPACE
export(year)
S3method(year,partial_time)
S3method(lubridate::year,partial_time)
This is not a complaint levied against
lubridate
. This is just used to highlight a pain point with S3/S4/R7. At the end of the day, the ability to export a lubridate-independentyear()
is not going to critically limit the use case of the package, nor is taking a dependency onlubridate
. I use this example because it's a nice minimal example of the difficulties of package generic composition.
This mechanism of registration feels clunky. I appreciate that it helps to be declarative about composition between packages, but it's not without drawbacks:
inkjet::print
for sending a print job to a physical printer won't unintentionally register a method against base::print
for display in the repl). year()
), even if their methods have unique type signatures (year.partial_time
, year.Period
) and are intended for an identical generic purpose (both for extracting the year portion of a time object), they will mask each otheryear()
), necessitates taking an originating package as dependencies, or alternatively...Instead, R7 could provide a mechanism for packages to mutually export generics so that packages don't necessitate the loading of other packages, while still working cleanly with those other packages when available.
There are a whole host of further design decisions about how compatibility is encouraged, when methods get masked, whether the default method can be masked, what happens if dispatch doesn't align, managing generic api evolution, etc. But let's get on the same page first.
For what it's worth, I have a working prototype of the behavior in my fork and wrote up a small description of behaviors. The goal isn't to fork the project or implement and contribute the feature without discussion - this is just a sandbox. I'm trying to explore whether this is a good idea, and what the tradeoffs would be, but having a working example might communicate the proposal better than I can in words.
Right now the implementation is a big pile of hacks and pretty much any features not explicitly used by the little example are entirely untested with the changes. :man_scientist:
Hi Doug, thanks for your interest in R7 and the thought you are putting into this problem. It relates to the long standing challenge of separating interface from implementation. Ideally, we could define common APIs in lightweight packages that other packages extend by defining methods on the common generics. This is why BiocGenerics exists, for example. One could imagine something similar for date/time. Perhaps the R Consortium ISC could help define those. In the meantime, I think your solution of defining your own generic and a conditional method on the lubridate generic is the best compromise. Sometimes it is best to be explicit when doing complicated things, and I think the R7 API is expressive enough to make that feasible.
Thanks @lawremi - I'm glad to have found the work. It's impressive how clean the implementation already is.
This is why BiocGenerics exists, for example. One could imagine something similar for date/time.
I understand that this is the go-to pattern, but I don't want to hone in too closely on my specific use case. My package isn't mature enough to prompt a refactor of the datetime landscape. I only used it as one example of a pain-point I see in the current class offerings.
If the solution today is to retrofit the package ecosystem with communal generics, then wouldn't it make sense to solve this proactively with the class implementation? Are there major concerns with such an approach?
I believe R7 could serve both use cases quite elegantly, giving package authors the ability to set constraints on how their generics may be extended. This would also allow packages to transition functions like year
from explicit registration to open registration without going the route of splitting off a generics package (I don't mean to pick on lubridate
here, just continuing the original example).
Was just curious what might have come from this discussion? I got the impression from the discussion that this is seen as undesirable behavior.
If this capability is still being considered, I would be interested in mocking up what form I think this could take. I've already put together a working proof-of-concept for a "shared-by-default" generic. The form that it's in now leaves all generics open for extension, but I appreciate that there may be a desire to restrict when and how generics compose with other packages. If this behavior would be of interest, I'd be happy to do a more comprehensive feature proposal that dives into some of these details.
Hi Doug, thanks for the proposal. I will invite you to our next working group meeting so that we can discuss this topic.
Trying to capture the outcome of today's discussion.
I think the tone of the room was that the idea was intriguing, but that it needs to be approached cautiously. As far as specific next steps:
I would like to lend support to @dgkf proposal or something similar. I think there are a few challenges to having a single generic definition.
filter()
be written to accommodate both dplyr
and stats
?Having run into this issue today with S4 generics (e.g. generics I create masking generics of other packages). I think I agree with all the concerns raised by @kaneplusplus so it would be good to have a solution or some sort of mechanism to help deal with this.
@gowerc can you expand on your situation with a concrete example?
Coming back to this after a while, I'm still not clear how we can make this work in R. I can see how maybe we could make this work in a language where you always provided the argument names, but I don't see how you disambiguate this case:
# pkgA
foo <- new_generic("foo", "x")
method(foo, class_integer) <- function(x) 1
# pkgB
foo <- new_generic("foo", "y")
method(foo, class_integer) <- function(y) 2
If you call foo(1L)
how is the system supposed to know which method to use?
I could see how we could resolve this ambiguity:
# pkgA
foo <- new_generic("foo", "x")
method(foo, class_integer) <- function(x) 1
# pkgB
foo <- new_generic("foo", c("x", "y"))
method(foo, class_integer) <- function(y) 2
But most generics use ...
, making this only apply to a sliver of methods in practice.
@hadley - My particular case (and I freely admit I am new to S4 so this is probably a beginner issue) was that we wanted to define a as.data.frame()
method for our object but there is no S4 generic for this in base R so we had to define our own generic. However a as.data.frame()
generic is defined in BiocGeneric (which is imported into many BioConductor packages) so if anyone imports a BioConductor package after our package then our as.data.frame()
method breaks.
This leaves us in the unfortunate position of having to either have BiocGenerics as a unwanted dependency or we have to employ some workaround like defining our own exported S3 method for our object (to use the base generic) that calls our internal S4 generic.
Either way none of this feels intuitive and I think its fair to say that the interactions of Namespaces and exported generics/methods is extremely confusing (at least I have found it so 😄 ) and would be good to avoid/simplify (assuming it is even possible to do so)
@gowerc that's a specific S4 problem that S7 doesn't have 😄
@hadley - Would I be right in thinking that "that's a specific S4 problem that S7 doesn't have" applies to this specific case of using as.data.frame()
as its a base generic which S7 is able to provide methods for ? I am assuming this would still be an issue for non-base generics (though admittedly this is far less common of a collision point).
@gowerc correct — S7 builds directly upon S3 generic, so there's no need to create "copies".
On Wed, 19 Jul 2023, Hadley Wickham wrote:
As usual I have not been following things as closely as I would have liked, so these comments may be misguided.
@gowerc can you expand on your situation with a concrete example?
Coming back to this after a while, I'm still not clear how we can make this work in R. I can see how maybe we could make this work in a language where you always provided the argument names, but I don't see how you disambiguate this case:
pkgA
foo <- new_generic("foo", "x") method(foo, class_integer) <- function(x) 1
pkgB
foo <- new_generic("foo", "y") method(foo, class_integer) <- function(y) 2
If you call foo(1L) how is the system supposed to know which method to use?
Seems to me there is nothing to disambiguate: These are different generics. You can access the one you want with pkgA::foo and pkgB::foo. No different really than pkgA and pkgB each providing a non-generic function by the same name.
If pkgB wants to add methods to pkgA::foo it can do that with
method(pkgA::foo, class_double) <- function(y) 2
or something equivalent. If pkgB wants the pkgA::foo to be available as pkgB:: foo it can import and re-export it.
Without involving packages
foo <- new_generic("foo", "y") method(foo, class_integer) <- function(y) 1 bar <- foo baz <- new_generic("foo", "y") method(foo, class_integer) <- function(y) 1
should have foo and bar as different names for the same generic, but baz as a different generic; so
identical(foo, bar)
identical(foo, baz)
[I'm not sure if the "name" slot of the generic is used for more than printing; if it is that might be worth re-examining. Same question for class objects. ]
One issue with pkgB adding a method to pkgA::foo is that this could replace a method that pkgA wants to keep. We do print a note, but we could also signal some condition that could be caught. Also we could provide an interface for specifying that certain methods cannot be redefined (using lockBinding for example). Dylan used to call this method sealing
An issue that has always been a problem with S3 is that class names live in a flat name space. So if pkgA defines FOOBAR and methods, and pkgB imports foo, defines a very different FOOBAR class and methods for it, then after loading pkgB the method used for pkgA FOOBAR objects will be wrong. I usually try to deal with this by naming my classes and methods with a package prefix, like
structure(..., class = "pkgA_FOOBAR") foo.pkgA_FOOBAR
It would be nice if we could get this to happen under the hood (we may already but I'm not seeing it in the minimal experiments I have done).
@ltierney , My understanding is the main point of this proposal was to provide some mechanism to enable generics to be mangled e.g. if 2 packages export the same generic they don't mask each other. The issue that this is trying to resolve is that if 2 packages define the same generic then they will mask each other even if all the method signatures were completely unique and they could theoretically live in harmony. In particular the proposal is trying to avoid the issue of package developers having to keep track of which packages define which generics and avoid the need to have arbitrary dependencies just to ensure you don't mask each others generics.
@hadley's example though demonstrates the complexity of such a proposal in that if you allowed separate generics to be mangled how do your resolve cases where people have defined the generics differently? Perhaps a solution to this could be that you only enable mangling if the signatures are identical, if not they are treated as separate generics like normal? Though this potentially adds complexity that is greater than the problem it solves.
@ltierney wrt the flat namespacing issue, we currently have the package
argument to new_class()
:
#' @param package Package name. It is good practice to set the package
#' name when exporting an S7 class from a package because it includes
#' the package name in the class name when it's used for dispatch. This
#' allows different packages to use the same name to refer to different
#' classes. If you set `package`, you _must_ export the constructor.
Behind the scenes that turns the name used for dispatch into package::name
. So this currently a recommended convention rather than a requirement.
I'm still familiarizing myself with R7, so apologies if this is a naive question. Perhaps this boils down to a philosophy of how generics are supposed to operate, but I'd still be curious to hear the rationale. I didn't see a discussion anywhere, but would be interested in reading the background if the decision is documented.
Proposal
Allow multiple packages to independently register generics of the same name, such that their methods are registered regardless of which of the package(s) are loaded.
Example
Using the trivial package examples in the test suite, test package
t2
exports the methodThis function has no dependence on
t1
aside fromt1
declaring thefoo
generic. Untilt1
is attached,foo()
is unusable.Instead, it would be great if the method implemented in
t2
was functional without attachingt1
or having to import and reexportt1::foo
.Discussion
Like S3 and S4, this current design continues to put developers in this scenario where they need to be acutely aware of whoever has declared which functions as generics first, and also means that using these functions necessitates the originating package as a soft dependency.
This works well if you know that the originating package is always necessary, but forces an unnecessary dependency if a package is meant to be a drop-in replacement for the originating package using a similar, generic api. Or, if you're simply unaware that another package exports a generic of the same name, now the power of having a generic function is lost to masking, even if the method signatures are non-overlapping.
Instead, I think it would encourage broader use of generic apis if such generics were more resilient to accidental masking and didn't require carrying forward package dependencies.
Other considerations
Trying to think through what scenarios might arise if multiple packages did export the same generic, and propose some behaviors.
The biggest challenge I see is in handling named
dispatch_args
If two packages export generics with the same
dispatch_args
If two packages export generics that have mismatched
dispatch_args
dispatch_args
are disallowed, that would still let packages align on a single standard, while not requiring that all packages inherit the source as a soft dependency.dispatch_args
, perhaps by enforcing that the generic must now be called only with positional arguments. (even if arguments match somedispatch_args
names, dispatch on positional arguments and only match the remaining arguments once a single method has been selected based on positional arguments.)