RConsortium / S7

S7: a new OO system for R
https://rconsortium.github.io/S7
Other
413 stars 38 forks source link

S7 arrays #401

Open sjcowtan opened 8 months ago

sjcowtan commented 8 months ago

Is there a good way to create an S7 object which inherits from "array" (or "matrix", but my specific use case is "array")? If not, will this be added?

TimTaylor commented 2 months ago

FWIW - Looks like it's currently in draft stage (https://github.com/RConsortium/S7/pull/434) so hopefully going to be added.

sjcowtan commented 2 months ago

yay!

On Fri, 6 Sept 2024 at 12:24, Tim Taylor @.***> wrote:

FWIW - Looks like it's currently in draft stage (#434 https://github.com/RConsortium/S7/pull/434) so hopefully going to be added.

— Reply to this email directly, view it on GitHub https://github.com/RConsortium/S7/issues/401#issuecomment-2333845084, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHQGSVONYUHLOCFFZ2FQK6TZVGGGZAVCNFSM6AAAAABEYH7X42VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMZTHA2DKMBYGQ . You are receiving this because you authored the thread.Message ID: @.***>

t-kalinowski commented 1 month ago

This is implemented in the dev version of S7.

new_class("my_array", parent = class_array)
hadley commented 1 month ago

@t-kalinowski how does that work? what type of array do you get?

t-kalinowski commented 1 month ago

length 0 logical (this makes the most sense, for the same reason that NA is logical; it can easily be promoted to another atomic)

library(S7)

new_class("my_array", parent = class_array)
#> <my_array> class
#> @ parent     : S3<array>
#> @ constructor: function(.data, dim, dimnames) {...}
#> @ validator  : <NULL>
#> @ properties :
new_class("my_array", parent = class_array)()
#> <my_array>  logi[0 (1d)]

Created on 2024-10-11 with reprex v2.1.1

hadley commented 1 month ago

@t-kalinowski hmmm, that feels weird to me because dim is orthogonal to class in S3. So I would have expected something more like class_array_logical, class_array_numeric etc.

t-kalinowski commented 1 month ago

I agree, this should be easy.

We only provide class_* for actual base classes, that seems like a compound class.

I wonder if we should introduce a & / new_intersection() for classes, complimenting | /new_union(). Then, class_array_logical could be achieved with:

class_array & class_logical
hadley commented 1 month ago

It feels like we should discuss this design a bit more before committing to it in a release. Maybe roll back for this release?

mmaechler commented 1 month ago

I agree, this should be easy.

We only provide class_* for actual base classes, that seems like a compound class.

I wonder if we should introduce a & / new_intersection() for classes, complimenting | /new_union(). Then, class_array_logical could be achieved with:

class_array & class_logical

This sounds nice to me ((without further deep thinking though) independent of class_array etc.

lawremi commented 1 month ago

One could think of matrix and array as containers. Specifying the contents of a container is typically done through parameterization. Parameterized types would be interesting. Maybe they are represented by a higher order function that accepts the type parameters and returns a constructor that ensures the right mode?

The interface might look like:

Matrix <- new_parameterized_class("Matrix", class_matrix, list(T = class_vector))
Matrix(class_numeric)(1:10, 2)

Or, we more directly address the contract of the dim attribute presence using something like Haskell type classes:

Matrix <- new_predicate("Matrix", function(self) !is.null(attr(self, "dim")))

But that will really slow down dispatch. The methods package did explore this approach with setIs(test=) but it was never used for method selection.

t-kalinowski commented 1 month ago

Quoting from ?UseMethod:

If the object does not have a class attribute, it has an implicit class. Matrices and arrays have class "matrix" or "array", followed by the class of the underlying vector. Most vectors have a class based on the result of mode(x), except that integer vectors have the class c("integer", "numeric"), and real vectors have the class c("double", "numeric").

So, it seems we need to think of this in terms of "implicit" classes.

We currently handle the implicit "numeric" class using a union of class_integer and class_double.

Union classes cannot be used as a parent in new_class(parent=), but they can be used in method<- and in new_property(). This restriction sets a precedent: implicit class types cannot be used in all contexts.

> new_class("foo", class_numeric)
Error: `parent` must be an S7 class, S3 class, or base type, not an S7 union.

Some initial thoughts:

"array" and "matrix" are the only "implicit" S3 classes that are disconnected from typeof().

> .class2(matrix(1))
[1] "matrix"  "array"   "double"  "numeric"

One temptation is to open the barn doors and think through what it means to support arbitrary "implicit" classes, similar to a mixin class in Python or a Rust trait—where a generic can make assumptions about the object’s properties.

For example, in #433, we seem to be reaching for a class_scalar. One could imagine:

class_scalar <- new_class(
  "class_scalar",
  validator = function(self)
    if (!identical(length(self), 1L))
      "must be length 1"
)

Or, continuing along this desire path, one could further imagine implicit types as peers of regular types, allowed to be children of union type and composable with base types.

class_scalar <- new_class(
  "class_scalar",
  parent = class_vector,
  validator = function(self)
    if (!identical(length(self), 1L))
      "must be length 1"
)

class_string <- 
  new_class("class_string", parent = c(class_character, class_scalar))

This would be straightforward to implement from the constructor's side. However, the challenge is method dispatch with base objects that lack a class attribute. For S7 to be viable, method dispatch must be efficient.

I see no efficient way to support:


Another question to consider: What happens when a class_array is not explicitly declared but might be expected to appear implicitly?

Foo <- new_class(
  "Foo", 
  parent = class_double, 
  properties = list(dim = class_integer)
)

foo <- Foo(as.double(1:4), c(2, 2))

.class2(foo)          # "Foo" "double" "S7_object"
.class2(unclass(foo)) # "matrix" "array" "double" "numeric"

There is already an inconsistency with the implicit "numeric" class, where we intentionally omit dispatching on numeric.

Foo <- new_class("Foo", class_double)
foo <- Foo()
.class2(foo)          # "Foo" "double" "S7_object"
.class2(unclass(foo)) # "double" "numeric"

Perhaps, we continue with the existing precedent and intentionally limit support for the concept of implicit classes in S7, beyond some minimal support in method<- for compatability with S3.

E.g., we offer no default constructor for class_numeric or class_POSIXt, and perhaps, neither for class_array and class_matrix.

> new_class("Foo", class_POSIXt)()
Error:
! S3 class <POSIXt> doesn't have a constructor

> new_class("Foo", class_numeric)()
Error:
! `parent` must be an S7 class, S3 class, or base type, not an S7 union.

This could be via new_class("class_array", abstract = TRUE), or via some other, new mechanism.

Another idea is introducing a counterpart to new_union, called new_intersection. Like new_union, it wouldn't be usable in all contexts: The inverse of new_union(), it would valid in new_class() and new_property(), but not in method<-. An intersection could stack or combine validators (and obj_dispatch() entries), enabling use cases like this.

lawremi commented 1 month ago

It seems like we should at least merge the existing implicit classes from class() at construction time and have a dim<-.S7_object() that updates it. And merge .class2() during dispatch if we are not already.

In the general case, I think we'll start running into issues similar to those of multiple inheritance. How do we linearize the hierarchy with these implicit/dynamic classes?

One way to make implicit classes semi-efficient would be to figure out the classes upon class definition and upon each modification of the object, to the extent possible, alongside validation.

Definitely worth a discussion at our next meeting.

t-kalinowski commented 1 month ago

It's worth noting that it's not valid to have a dim attribute on a non-vector.

library(S7)
Foo <- new_class("Foo", properties = list(
  dim = new_property(NULL | class_integer)
))

foo <- Foo()

foo@dim <- 1:3
#> Error in `@<-.S7_object`(`*tmp*`, "dim", value = 1:3): invalid first argument, must be vector (list or atomic)
dim(foo) <- 1:3
#> Error in dim(foo) <- 1:3: invalid first argument, must be vector (list or atomic)

This error message comes from base R Rf_setAttrib() calling C function dimgets(), which has the check.

lawremi commented 1 month ago

That could be loosened to allow OBJSXP. There would be additional constraints though, like having a length() method.