RConsortium / S7

S7: a new OO system for R
https://rconsortium.github.io/S7

Literal classes #267

Open mikmart opened 1 year ago

mikmart commented 1 year ago

This is a continuation of #32.

Literal types are a feature in other programming languages, like Python and TypeScript. They state that an object has a specific value. The concept of literal types could be a useful addition to the R7 OOP system in the form of literal classes.

In R7, I can think of two primary use cases for literal classes:

1. Restricting valid values of properties
2. Implementing specialized methods

I made a proof-of-concept branch that can be used to illustrate these.

library(R7) # remotes::install_github("mikmart/OOP-WG@literal-classes-cont")

# Construct a literal class for a particular value. Only objects which are
# identical to the provided value are considered to inherit from this class.
class_literal("a")
#> <R7_literal_class>: "a"
# Unions of literal classes allow objects that are identical to
# any of the values of the member literal classes.
new_union(class_literal("a"), class_literal("b"))
#> <R7_union>: "a" or "b"
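
For readers without the proof-of-concept branch installed, the inheritance rule described in the comments above can be approximated in base R. This is only a rough sketch: `literal()`, `inherits_literal()` and `inherits_literal_union()` are hypothetical names made up for illustration, not part of R7 or the branch, and the only assumption carried over from the text is that membership is decided by identical().

```r
# Hypothetical helpers, not part of R7: a literal "class" just stores a value.
literal <- function(value) structure(list(value = value), class = "literal_sketch")

# An object inherits from a literal class only if it is identical to the value.
inherits_literal <- function(x, lit) identical(x, lit$value)

# A union of literal classes accepts an object that matches any member.
inherits_literal_union <- function(x, lits) {
  any(vapply(lits, function(lit) inherits_literal(x, lit), logical(1)))
}

a <- literal("a")
ab <- list(literal("a"), literal("b"))

inherits_literal("a", a)           # TRUE
inherits_literal("A", a)           # FALSE
inherits_literal(c("a", "a"), a)   # FALSE, the length differs
inherits_literal(1, literal(1L))   # FALSE, double is not identical to integer
inherits_literal_union("b", ab)    # TRUE
inherits_literal_union("c", ab)    # FALSE
```

Because identical() also compares type and attributes, the check is stricter than `==`, which is why `1` does not match a literal class built from `1L`.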

Restricting valid values of properties

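The use case of restricting valid values of class properties follows the spirit of match.arg(), which is often used in R. Compared with a match.arg() approach, literal classes make the valid values visible in the class definition itself and give clearer error messages for invalid input.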

Without literal classes

Implementing value checks for a class property without literal classes could look like this:

http_request <- new_class(
  "http_request",
  properties = list(
    method = new_property(
      class = class_character,
      default = c("GET", "POST", "PUT", "DELETE"),
      setter = function(self, value) {
        self@method <- match.arg(value, c("GET", "POST", "PUT", "DELETE"))
        self
      }
    )
  )
)

When we inspect the class constructor, @method appears to accept any character value. Only by inspecting the property more closely can we discover the imperative checks in the setter that restrict its valid values.

http_request
#> <R7_class>
#> @ name  :  http_request
#> @ parent: <R7_object>
#> @ properties:
#>  $ method: <character>
http_request@properties$method
#> <R7_property>
#>  $ name   : chr "method"
#>  $ class  : <R7_base_class>: <character>
#>  $ getter : NULL
#>  $ setter : function (self, value)
#>  $ default: chr [1:4] "GET" "POST" "PUT" "DELETE"

Instantiation works as expected, but the error message could use some work:

http_request()
#> <http_request>
#>  @ method: chr "GET"
http_request(method = "FOO") |> try()
#> Error in match.arg(value, c("GET", "POST", "PUT", "DELETE")) :
#>   'arg' should be one of "GET", "POST", "PUT", "DELETE"

With literal classes

With literal classes, the above is simplified into:

class_http_method <- class_literal_union("GET", "POST", "PUT", "DELETE")
http_request <- new_class("http_request", properties = list(method = class_http_method))

The valid values of the @method property are now clearly shown in the class:

http_request
#> <R7_class>
#> @ name  :  http_request
#> @ parent: <R7_object>
#> @ properties:
#>  $ method: "GET", "POST", "PUT", or "DELETE"

And invalid instantiations give a clear error message:

http_request()
#> <http_request>
#>  @ method: chr "GET"
http_request(method = "FOO") |> try()
#> Error : <http_request> object properties are invalid:
#> - @method must be "GET", "POST", "PUT", or "DELETE", not <character>

Implementing specialized methods

For specialized methods, literal classes allow flattening control flow structures based on exact values in methods, and can remove the need for stand-alone helper functions to handle special cases. One suggested application in #32 was for implementing specialized handlers based on dispatching literal values of file extensions, rather than using switch() expressions.

A somewhat contrived example to illustrate the differences:

Without literal classes

foo <- new_generic("foo", "x")
method(foo, class_integer) <- function(x) {
  if (identical(x, 1L)) foo_integer_one(x) else "It's an integer."
}
foo_integer_one <- function(x) "It's a one."
foo(1L)
#> [1] "It's a one."

With literal classes

foo <- new_generic("foo", "x")
method(foo, class_integer) <- function(x) "It's an integer."
method(foo, class_literal(1L)) <- function(x) "It's a one."
foo(1L)
#> [1] "It's a one."

We can again see the shift from imperative to declarative code, and introspection benefits in being able to see the special case in the methods list rather than by inspecting the method implementation.

foo
#> <R7_generic> foo(x, ...) with 2 methods:
#> 1: method(foo, class_integer)
#> 2: method(foo, class_literal(1L))

Comparison to enumerations

Enumerations (or enums) are very similar to literal unions. Both are ways of specifying a set of valid values. The key difference is that the members (or variants) of an enum should be considered opaque: the underlying value is not important and should not be used directly. As a result, enums are typically used in control flow constructs, in particular ones that should cover all possible values. Literal unions, on the other hand, can simply be used to check that an object has a valid value, while that value is still used directly.

Challenges

Introducing literal classes complicates method dispatch for generics. Whenever the value supplied to a generic could be a valid literal class, and so could have a method specialized to that exact value, dispatch needs an additional check for such a method.

To mitigate this, it might be preferable to restrict literal classes to plain strings, numbers, and booleans. Other objects can then be ruled out immediately, with no dispatch by value needed. An alternative approach would be to disallow dispatch on literal classes altogether and keep only the benefit of validating property values.
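
The extra dispatch step and the proposed restriction can be sketched in base R. Everything here is hypothetical: `literal_methods`, `class_methods`, `could_be_literal()` and `dispatch()` are illustrative names, and real R7 dispatch works differently. The sketch only shows the shape of the idea, namely a cheap pre-check that rules out most objects before any comparison by value happens.

```r
# Hypothetical sketch of value-aware dispatch; none of these names exist in R7.
# Each literal method is stored alongside the value it is specialized to.
literal_methods <- list(
  list(value = 1L, fn = function(x) "It's a one.")
)
class_methods <- list(
  integer = function(x) "It's an integer.",
  character = function(x) "It's a string."
)

# Only plain, attribute-free scalars of these types can match a literal class,
# so every other object skips the by-value lookup entirely.
could_be_literal <- function(x) {
  is.atomic(x) && length(x) == 1L && is.null(attributes(x)) &&
    typeof(x) %in% c("character", "integer", "double", "logical")
}

dispatch <- function(x) {
  if (could_be_literal(x)) {
    for (m in literal_methods) {
      if (identical(x, m$value)) return(m$fn(x))
    }
  }
  # Fall back to ordinary dispatch on the type.
  fn <- class_methods[[typeof(x)]]
  if (is.null(fn)) stop("No method found for <", typeof(x), ">")
  fn(x)
}

dispatch(1L)      # "It's a one."
dispatch(2L)      # "It's an integer."
dispatch(1:3)     # "It's an integer."
dispatch("GET")   # "It's a string."
```

In this sketch, the cost of supporting literal classes is one scalar check per call plus a comparison for each registered literal method, and only for arguments that pass the scalar check.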

Conclusion

Literal classes could be a great addition to R’s OOP toolbox, leading to more concise and declarative code that shifts the focus from imperative validation code to the problem domain at hand. Thank you for considering them.

hadley commented 1 year ago

Thanks for the write up. It seems like a pretty compelling feature set (especially if we limit, at least to start with, to scalar strings, numbers, and booleans). I've flagged it for discussion next time the working group meets.

(I also wonder if class_scalar might be a better name?)

DavisVaughan commented 1 year ago

It feels like, for the match.arg() case, having something more efficient than scalar("a") | scalar("b") | scalar("c") would be better from a redundancy and performance perspective. Strings are the main place where that comes up, though, so maybe we'd just have a class_enum that is generated from a set of unique strings, like enum(c("a", "b", "c")).

Also fine to come back to that one in the future, but it is a pretty motivating example
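
As a rough illustration of the set-based check suggested above, a single lookup against the set of strings can stand in for a chain of per-value comparisons. `enum_sketch()` is a hypothetical base R helper, not an R7 function and not the proposed class_enum itself. It assumes that members are unique, non-missing strings and that only attribute-free scalars should match.

```r
# Hypothetical helper illustrating the idea; enum_sketch() is not an R7 function.
enum_sketch <- function(choices) {
  stopifnot(is.character(choices), !anyNA(choices), !anyDuplicated(choices))
  function(x) {
    # One lookup against the set replaces a chain of identical() comparisons.
    is.character(x) && length(x) == 1L && is.null(attributes(x)) &&
      !is.na(x) && x %in% choices
  }
}

is_abc <- enum_sketch(c("a", "b", "c"))

is_abc("b")           # TRUE
is_abc("d")           # FALSE
is_abc(c("a", "b"))   # FALSE, not a scalar
is_abc(1L)            # FALSE, not a string
```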

lawremi commented 2 months ago

Coming back to this after messing around in TypeScript, this seems like a worthwhile feature. I'd prefer the term "literal" since it is a known CS concept, and I think we would consider the already supported NULL as a literal class. It's too bad we can't use the union syntactic sugar of | directly on the literal values like you can in TypeScript (unless one member is a conventional class). I guess we could allow a formula.

ramiromagno commented 1 month ago

This is really nice. Generalizing a bit further, what about conditional definitions? See how Wolfram does it: https://reference.wolfram.com/language/ref/SetDelayed.html (under Scope / Left hand sides).

lawremi commented 1 month ago

I was looking at the PoC branch by @mikmart, and the implementation is fairly elegant. I'm wondering why is_literal_value() is restricted to scalar values. Shouldn't it be able to handle any R object, in principle? It relies on deparse() for mapping to a class name, which might get unwieldy for complex objects, but it would still work, right? I guess an alternative would be generating a hash e.g. with the digest package.
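
To make the deparse() concern concrete, here is a small base R sketch of how a value might map to a name. `literal_name()` is a hypothetical helper written for this illustration, and the PoC's actual implementation may differ.

```r
# Hypothetical naming helper; the PoC branch may build names differently.
literal_name <- function(x) paste(deparse(x), collapse = " ")

literal_name("a")       # "\"a\""
literal_name(1L)        # "1L"
literal_name(TRUE)      # "TRUE"
literal_name(c(1, 2))   # "c(1, 2)"

# deparse() splits long expressions across several strings, so names for
# larger objects quickly become unwieldy.
length(deparse(letters)) > 1   # TRUE
nchar(literal_name(letters))
```

Scalars produce short, readable names, while even a modest character vector already produces a name over a hundred characters long.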

mikmart commented 1 month ago

@lawremi I don't recall what my thought process behind is_literal_value() was. I feel like I probably narrowed it to cases that I could easily imagine being useful, thinking that it could be widened if use cases for more complex objects arise. I agree in principle: if we take the identical() test as the defining property, that could apply much more widely.