RConsortium / S7

S7: a new OO system for R
https://rconsortium.github.io/S7
Other
403 stars 34 forks source link

Allowing for multiple packages to export the same generic #240

Open dgkf opened 2 years ago

dgkf commented 2 years ago

I'm still familiarizing myself with R7, so apologies if this is a naive question. Perhaps this boils down to a philosophy of how generics are supposed to operate, but I'd still be curious to hear the rationale. I didn't see a discussion anywhere, but would be interested in reading the background if the decision is documented.

Proposal

Allow multiple packages to independently register generics of the same name, such that their methods are registered regardless of which of the package(s) are loaded.

Example

Using the trivial package examples in the test suite, test package t2 exports the method

R7::method(foo, R7::class_character) <- function(x) "foo"

This function has no dependence on t1 aside from t1 declaring the foo generic. Until t1 is attached, foo() is unusable.

library(t2)
foo("abc")
# Error in foo("abc") : could not find function "foo"

library(t1)
foo("abc")
# [1] "foo"

Instead, it would be great if the method implemented in t2 was functional without attaching t1 or having to import and reexport t1::foo.

library(t2)
foo("abc")
# [1] "foo"

Discussion

Like S3 and S4, this current design continues to put developers in this scenario where they need to be acutely aware of whoever has declared which functions as generics first, and also means that using these functions necessitates the originating package as a soft dependency.

This works well if you know that the originating package is always necessary, but forces an unnecessary dependency if a package is meant to be a drop-in replacement for the originating package using a similar, generic api. Or, if you're simply unaware that another package exports a generic of the same name, now the power of having a generic function is lost to masking, even if the method signatures are non-overlapping.

Instead, I think it would encourage broader use of generic apis if such generics were more resilient to accidental masking and didn't require carrying forward package dependencies.

Other considerations

Trying to think through what scenarios might arise if multiple packages did export the same generic, and propose some behaviors.

The biggest challenge I see is in handling named dispatch_args

If two packages export generics with the same dispatch_args

If two packages export generics that have mismatched dispatch_args

hadley commented 2 years ago

When you say:

Allow multiple packages to independently register generics of the same name, such that their methods are registered regardless of which of the package(s) are loaded.

I'm not sure what you mean. Say both pkgA and pkgB define foo <- new_generic("foo"). I'd say that there are two generics with the same name, but they are not the same generic. Registering a method on pkgA::foo() shouldn't affect pkgB::foo() because they are only coincidentally related by having the same name (in the same way that e.g. Hmisc::is.discrete() and plyr::is.discrete() are coincidentally related).

If your concern is registering methods for generics in suggested packages, then we already have a mechanism for this: R7::new_external_generic().

dgkf commented 2 years ago

I'd say that there are two generics with the same name, but they are not the same generic. [They] shouldn't affect [each other] because they are only coincidentally related by having the same name.

If the generic exists as a means of dispatching on type, and the type signatures of methods can be disambiguated, is it a problem that they only share a generic name? I suspect the answer might be "yes" - it's just not obvious to me why.

hadley commented 2 years ago

I’m not saying we couldn’t make this work, I’m saying we’ve deliberately chosen not too. We believe this behaviour is a problem with S3, not a feature.

dgkf commented 2 years ago

I see - I was hoping I could read up on this decision if it's documented somewhere. If it's implicitly the right thing to do, it's not clear to me, and I was hoping to learn why languages choose this style.

Perhaps this is already obvious to you, but I'll lay out what I see as the unfortunate repercussions of this decision. If we have a pkgA that exports a generic, fn, and pkgB that also wants to provide a totally compatible, but independent fn, then it necessitates that pkgA as a soft dependency in order to use this function. For the sake of motivating the example, let's assume pkgA is a heavy dependency - perhaps it has niche system requirements or brings a large dependency footprint.

# pkgA R/fn.R
#' @export
foo <- R7::new_generic("foo", "x")
R7::method(foo, R7::class_character) <- function(x) "pkgA"
# pkgB R/fn.R
foo <- R7::new_external_generic("pkgA", "foo", "x")
R7::method(foo, R7::class_numeric) <- function(x) "pkgB"
.onLoad <- function(libname, pkgname) R7::external_methods_register()

There are plenty of reasons not to want to inherit this dependency just to use a similar API. Simultaneously providing that generic while also allowing it to continue to function if another package is loaded introduces quite a lot of development overhead. In order to provide its own version of the generic without requiring the dependency, it needs to also export its own generic:

# pkgB R/fn.R
fn_num <- function(x) "pkgB"

#' @export
fn <- R7::new_generic("fn", "x")
R7::method(fn, R7::class_numeric) <- fn_num

#' @export
pkgA_fn <- R7::new_external_generic("pkgA", "fn" "x")
R7::method(t1_foo, R7::class_numeric) <- fn_num

.onLoad <- function(libname, pkgname) {
  R7::external_methods_register()
}

And now we have a combinatorial problem - If another package (pkgC) wants to also provide a compatible generic without requiring the dependency, it now needs to export its own version fn, a version that extends pkgB::fn and a version that extends pkgA::fn. Moreover, pkgB needs to be revised to now extend pkgC::fn for the two to be compatible.

Permitting multiple packages to export the same generic would mean that these packages wouldn't need to keep tabs on who else is leveraging the same api.

I have no doubts that the team has already considered this scenario. It's clear to me what challenges it introduces, but not what benefits. I don't maintain any packages that are widely depended on, so perhaps I'm just not savvy to the hurdles this introduces for packages trying to manage a reverse dependency ecosystem.

hadley commented 2 years ago

I think I have two germane opinions:

(Also I'm not sure that the new_external_generic() mechanism actually requires you to take a dependency on the package at all, unless you want to test it)

dgkf commented 2 years ago

Taking a soft dependency (i.e. Suggests) to register a method for a generic is (a) easy and (b) the right thing to do.

I completely agree. I think this is a great practice that helps in the discovery of related work. I hope what I'm proposing isn't coming across as anything that would discourage this.

However, this is Suggests for the package, but required for using the method - that's the part that feels off to me. I'm of the opinion that the method should be useful whether the Suggests package is available or not.

If you called yourself Hadley, you are not me. The same applies to functions.

I appreciate the point you're making, but I would propose a different framing. If someone asks you what your name is (name(@hadley)), you would say "Hadley". Then, if someone asks what the name of my cat is (name(@dgkf's cat)), it seems sort of weird that I need to make sure you first are able to tell me what Hadley's name is. I know that my cat is not Hadley, and the two "types" are independent from one another. Yet I need to either 1) come up with my own function to replace name that avoids conflicts (eg dgkf_cat_name(@dgkf's cat) - a bit redundant), or 2) require that you know who Hadley is.

(or 3, plea with whoever called dibs on the function name and ask them to create a generics package, which starts to look conceptually similar to letting anyone register a method against it anyways.)

hadley commented 2 years ago

I feel like we're talking past each other. Do you have a concrete example using existing packages that might help me understand better?

dgkf commented 2 years ago

Good call - I think that's on me. I rushed to discussing design decisions. Let me back up a bit and start with an S3 example. I think R7 carries forward this behavior, so S3 is a familiar basis for examples.

A practical example

I'm currently developing a package for a CDISC (pharma) datetime representations. I would like to offer a lubridate-compatible year() (and family) api in two user scenarios:

To my knowledge, the only way to do this is to export my own generic (parttime::year), as well as methods on lubridate's generic (lubridate::year), which will mask parttime::year on load.

# NAMESPACE
export(year)
S3method(year,partial_time)
S3method(lubridate::year,partial_time)

This is not a complaint levied against lubridate. This is just used to highlight a pain point with S3/S4/R7. At the end of the day, the ability to export a lubridate-independent year() is not going to critically limit the use case of the package, nor is taking a dependency on lubridate. I use this example because it's a nice minimal example of the difficulties of package generic composition.

Explicit method registration: Pros & Cons

This mechanism of registration feels clunky. I appreciate that it helps to be declarative about composition between packages, but it's not without drawbacks:

What I like...

What feels off...

Proposal

Instead, R7 could provide a mechanism for packages to mutually export generics so that packages don't necessitate the loading of other packages, while still working cleanly with those other packages when available.

There are a whole host of further design decisions about how compatibility is encouraged, when methods get masked, whether the default method can be masked, what happens if dispatch doesn't align, managing generic api evolution, etc. But let's get on the same page first.

dgkf commented 2 years ago

For what it's worth, I have a working prototype of the behavior in my fork and wrote up a small description of behaviors. The goal isn't to fork the project or implement and contribute the feature without discussion - this is just a sandbox. I'm trying to explore whether this is a good idea, and what the tradeoffs would be, but having a working example might communicate the proposal better than I can in words.

Right now the implementation is a big pile of hacks and pretty much any features not explicitly used by the little example are entirely untested with the changes. :man_scientist:

lawremi commented 2 years ago

Hi Doug, thanks for your interest in R7 and the thought you are putting into this problem. It relates to the long standing challenge of separating interface from implementation. Ideally, we could define common APIs in lightweight packages that other packages extend by defining methods on the common generics. This is why BiocGenerics exists, for example. One could imagine something similar for date/time. Perhaps the R Consortium ISC could help define those. In the meantime, I think your solution of defining your own generic and a conditional method on the lubridate generic is the best compromise. Sometimes it is best to be explicit when doing complicated things, and I think the R7 API is expressive enough to make that feasible.

dgkf commented 2 years ago

Thanks @lawremi - I'm glad to have found the work. It's impressive how clean the implementation already is.

This is why BiocGenerics exists, for example. One could imagine something similar for date/time.

I understand that this is the go-to pattern, but I don't want to hone in too closely on my specific use case. My package isn't mature enough to prompt a refactor of the datetime landscape. I only used it as one example of a pain-point I see in the current class offerings.

If the solution today is to retrofit the package ecosystem with communal generics, then wouldn't it make sense to solve this proactively with the class implementation? Are there major concerns with such an approach?

I believe R7 could serve both use cases quite elegantly, giving package authors the ability to set constraints on how their generics may be extended. This would also allow packages to transition functions like year from explicit registration to open registration without going the route of splitting off a generics package (I don't mean to pick on lubridate here, just continuing the original example).

dgkf commented 2 years ago

Was just curious what might have come from this discussion? I got the impression from the discussion that this is seen as undesirable behavior.

If this capability is still being considered, I would be interested in mocking up what form I think this could take. I've already put together a working proof-of-concept for a "shared-by-default" generic. The form that it's in now leaves all generics open for extension, but I appreciate that there may be a desire to restrict when and how generics compose with other packages. If this behavior would be of interest, I'd be happy to do a more comprehensive feature proposal that dives into some of these details.

lawremi commented 2 years ago

Hi Doug, thanks for the proposal. I will invite you to our next working group meeting so that we can discuss this topic.

dgkf commented 1 year ago

Trying to capture the outcome of today's discussion.

I think the tone of the room was that the idea was intriguing, but that it needs to be approached cautiously. As far as specific next steps:

kaneplusplus commented 1 year ago

I would like to lend support to @dgkf proposal or something similar. I think there are a few challenges to having a single generic definition.

  1. Having a separate generic package increases the number of dependencies.
  2. It increases the amount of coordination required when generic definitions are changed.
  3. What happens when I want to use an existing generic name but I want a different signature? For example, how would an S7 version of filter() be written to accommodate both dplyr and stats?
  4. A single generic definition in its own package is a convention. Is this going to be enforced? If so, how?
gowerc commented 1 year ago

Having run into this issue today with S4 generics (e.g. generics I create masking generics of other packages). I think I agree with all the concerns raised by @kaneplusplus so it would be good to have a solution or some sort of mechanism to help deal with this.

hadley commented 1 year ago

@gowerc can you expand on your situation with a concrete example?

Coming back to this after a while, I'm still not clear how we can make this work in R. I can see how maybe we could make this work in a language where you always provided the argument names, but I don't see how you disambiguate this case:

# pkgA
foo <- new_generic("foo", "x")
method(foo, class_integer) <- function(x) 1

# pkgB
foo <- new_generic("foo", "y")
method(foo, class_integer) <- function(y) 2

If you call foo(1L) how is the system supposed to know which method to use?

I could see how we could resolve this ambiguity:

# pkgA
foo <- new_generic("foo", "x")
method(foo, class_integer) <- function(x) 1

# pkgB
foo <- new_generic("foo", c("x", "y"))
method(foo, class_integer) <- function(y) 2

But most generics use ..., making this only apply to a sliver of methods in practice.

gowerc commented 1 year ago

@hadley - My particular case (and I freely admit I am new to S4 so this is probably a beginner issue) was that we wanted to define a as.data.frame() method for our object but there is no S4 generic for this in base R so we had to define our own generic. However a as.data.frame() generic is defined in BiocGeneric (which is imported into many BioConductor packages) so if anyone imports a BioConductor package after our package then our as.data.frame() method breaks.

This leaves us in the unfortunate position of having to either have BiocGenerics as a unwanted dependency or we have to employ some workaround like defining our own exported S3 method for our object (to use the base generic) that calls our internal S4 generic.

Either way none of this feels intuitive and I think its fair to say that the interactions of Namespaces and exported generics/methods is extremely confusing (at least I have found it so 😄 ) and would be good to avoid/simplify (assuming it is even possible to do so)

hadley commented 1 year ago

@gowerc that's a specific S4 problem that S7 doesn't have 😄

gowerc commented 1 year ago

@hadley - Would I be right in thinking that "that's a specific S4 problem that S7 doesn't have" applies to this specific case of using as.data.frame() as its a base generic which S7 is able to provide methods for ? I am assuming this would still be an issue for non-base generics (though admittedly this is far less common of a collision point).

hadley commented 1 year ago

@gowerc correct — S7 builds directly upon S3 generic, so there's no need to create "copies".

ltierney commented 1 year ago

On Wed, 19 Jul 2023, Hadley Wickham wrote:

As usual I have not been following things as closely as I would have liked, so these comments may be misguided.

@gowerc can you expand on your situation with a concrete example?

Coming back to this after a while, I'm still not clear how we can make this work in R. I can see how maybe we could make this work in a language where you always provided the argument names, but I don't see how you disambiguate this case:

pkgA

foo <- new_generic("foo", "x") method(foo, class_integer) <- function(x) 1

pkgB

foo <- new_generic("foo", "y") method(foo, class_integer) <- function(y) 2

If you call foo(1L) how is the system supposed to know which method to use?

Seems to me there is nothing to disambiguate: These are different generics. You can access the one you want with pkgA::foo and pkgB::foo. No different really than pkgA and pkgB each providing a non-generic function by the same name.

If pkgB wants to add methods to pkgA::foo it can do that with

method(pkgA::foo, class_double) <- function(y) 2

or something equivalent. If pkgB wants the pkgA::foo to be available as pkgB:: foo it can import and re-export it.

Without involving packages

foo <- new_generic("foo", "y") method(foo, class_integer) <- function(y) 1 bar <- foo baz <- new_generic("foo", "y") method(foo, class_integer) <- function(y) 1

should have foo and bar as different names for the same generic, but baz as a different generic; so

identical(foo, bar)

TRUE

identical(foo, baz)

FALSE

[I'm not sure if the "name" slot of the generic is used for more than printing; if it is that might be worth re-examining. Same question for class objects. ]

One issue with pkgB adding a method to pkgA::foo is that this could replace a method that pkgA wants to keep. We do print a note, but we could also signal some condition that could be caught. Also we could provide an interface for specifying that certain methods cannot be redefined (using lockBinding for example). Dylan used to call this method sealing

An issue that has always been a problem with S3 is that class names live in a flat name space. So if pkgA defines FOOBAR and methods, and pkgB imports foo, defines a very different FOOBAR class and methods for it, then after loading pkgB the method used for pkgA FOOBAR objects will be wrong. I usually try to deal with this by naming my classes and methods with a package prefix, like

structure(..., class = "pkgA_FOOBAR") foo.pkgA_FOOBAR

It would be nice if we could get this to happen under the hood (we may already but I'm not seeing it in the minimal experiments I have done).

gowerc commented 1 year ago

@ltierney , My understanding is the main point of this proposal was to provide some mechanism to enable generics to be mangled e.g. if 2 packages export the same generic they don't mask each other. The issue that this is trying to resolve is that if 2 packages define the same generic then they will mask each other even if all the method signatures were completely unique and they could theoretically live in harmony. In particular the proposal is trying to avoid the issue of package developers having to keep track of which packages define which generics and avoid the need to have arbitrary dependencies just to ensure you don't mask each others generics.

@hadley's example though demonstrates the complexity of such a proposal in that if you allowed separate generics to be mangled how do your resolve cases where people have defined the generics differently? Perhaps a solution to this could be that you only enable mangling if the signatures are identical, if not they are treated as separate generics like normal? Though this potentially adds complexity that is greater than the problem it solves.

hadley commented 1 year ago

@ltierney wrt the flat namespacing issue, we currently have the package argument to new_class():

#' @param package Package name. It is good practice to set the package
#'   name when exporting an S7 class from a package because it includes
#'   the package name in the class name when it's used for dispatch. This
#'   allows different packages to use the same name to refer to different
#'   classes. If you set `package`, you _must_ export the constructor.

Behind the scenes that turns the name used for dispatch into package::name. So this currently a recommended convention rather than a requirement.