RConsortium / S7

S7: a new OO system for R
https://rconsortium.github.io/S7
Other
386 stars 32 forks source link

rewrite `prop<-()` in C #396

Closed t-kalinowski closed 2 months ago

t-kalinowski commented 8 months ago

This PR moves prop<- to C. As part of this, there are some changes in behavior that are worth highlighting.

t-kalinowski commented 8 months ago

This is a draft implementation of prop<- in C. It is failing to build on older R versions, and isn't particularly tidy yet.

There are some open questions about what "should" happen when a custom property setter is provided that merit discussion.

  1. When, and how often, should validate() be called from prop<- in the case of a custom property setter()?
  2. Are there any circumstances where a prop<-() call within a custom setter() should invoke another custom setter()? E.g., when setting a different property name, or setting a property of the same name on a different object (i.e., not self). Is prop<- allowed to recursively call a custom setter() under any circumstances?

For (1), there is currently a snapshot test that looks like this:

library(S7)
# snapshot title:  "validates once after custom setter"
custom_setter <- function(self, value) {
  self@x <- as.double(value)
  self
}
foo2 <- new_class(
  "foo2",
  properties = list(x = new_property(class_double, setter = custom_setter)),
  validator = function(self) {
    print("validating")
    character()
  }
)

obj <- foo2("123")
#> [1] "validating"

obj@x <- "456"

Note that the last expression, obj@x <- "456", did not invoke the validator. This seems inconsistent with the docs.

Is this a bug in the current implementation of prop<-, or should the docs be updated to clarify that a custom property setter is responsible for make sure the returned object is valid?

For (2), I've added a snapshot test for prop<- that fails with the current R implementation of prop<- on the main branch. The draft implementation in C (this PR) doesn't infinitely recurse, but it's not clear to me that this current behavior is correct either.

library(S7)
chattily_sync_ab <- function(self, value) {
  cat("Starting syncup with value:", value, "\n")
  a_value <- paste0("a_", value)
  b_value <- paste0("b_", value)

  cat(sprintf('setting @a <- "%s"\n', a_value))
  self@a <- a_value

  cat(sprintf('setting @b <- "%s"\n', b_value))
  self@b <- b_value

  self
}

foo <- new_class("foo", properties = list(
  a = new_property(setter = chattily_sync_ab),
  b = new_property(setter = chattily_sync_ab)
))

obj <- foo()
#> Starting syncup with value: 
#> setting @a <- "a_"
#> setting @b <- "b_"
#> Starting syncup with value: b_ 
#> setting @a <- "a_b_"
#> setting @b <- "b_b_"
#> Starting syncup with value: 
#> setting @a <- "a_"
#> Starting syncup with value: a_ 
#> setting @a <- "a_a_"
#> setting @b <- "b_a_"
#> setting @b <- "b_"

obj@a <- "val"
#> Starting syncup with value: val 
#> setting @a <- "a_val"
#> setting @b <- "b_val"
#> Starting syncup with value: b_val 
#> setting @a <- "a_b_val"
#> setting @b <- "b_b_val"

We should also clarify when validate() is supposed to be called in (2) as well.

(3) (bonus). The edge case of a custom setter setting a property on something other than self also probably merits some discussion.

My current intuition for how to avoid infinite recursion but still allow a setter() to implicitly call setter() on other objects or properties is to attach an attribute to object with a (growable) list of property names to avoid recursively calling custom setters on. I.e., temporarily mark the object and property as non-active-setter-callable, a more narrow cousin to the .should_validate attr we attach with validate_eventually().

Something like this (but w/ append/pop semantics):

attr(object, "dont_call_prop_setter") <- prop_name
setter(object, value)
attr(object, "dont_call_prop_setter") <- NULL
t-kalinowski commented 8 months ago

Some benchmarks. The new prop<- C implementation is faster in all situations, but the magnitude of the speedup varies. The total time is dominated by prop_validate() and validate(), which are still implemented in R. Moving prop_validate() to C would be the next step.

library(S7)
`propr<-` <- S7:::`propr<-`
`prop<-` <- S7::`prop<-`

Simplest case

foo <- new_class("foo", properties = list(xyz = class_double))
obj <- foo(123)
print(plot(print(df <- bench::mark(
  c = prop(obj, "xyz") <- 123,
  r = propr(obj, "xyz") <- 123,
  c_no_val = prop(obj, "xyz", check = FALSE) <- 123,
  r_no_val = propr(obj, "xyz", check = FALSE) <- 123
))))
#> # A tibble: 4 × 13
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 c            14.1µs  15.38µs    62945.    4.07KB     37.8  9994     6
#> 2 r           17.47µs  18.33µs    53961.    6.28KB     37.8  9993     7
#> 3 c_no_val   738.01ns 820.03ns  1122887.        0B    112.   9999     1
#> 4 r_no_val     3.77µs   4.14µs   236966.        0B     23.7  9999     1
#> # ℹ 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> #   time <list>, gc <list>
#> Loading required namespace: tidyr

Simple custom setter

foo2 <- new_class("foo2", properties = list(
  x = new_property(
    class_double,
    setter = function(self, value) {
      self@x <- as.double(value)
      self
      })
  ))

obj <- foo2("123")
print(plot(print(df <- bench::mark(
  c = prop(obj, "x") <- 456,
  r = propr(obj, "x") <- 456,
  c_no_val = prop(obj, "x", check = FALSE) <- 456,
  r_no_val = propr(obj, "x", check = FALSE) <- 456
))))
#> # A tibble: 4 × 13
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c          16.65µs 17.71µs    54935.    22.2KB     27.5  9995     5    181.9ms
#> 2 r          22.39µs 23.66µs    41859.        0B     29.3  9993     7    238.7ms
#> 3 c_no_val    6.19µs  6.85µs   141648.        0B     28.3  9998     2     70.6ms
#> 4 r_no_val   22.26µs 23.82µs    41316.        0B     28.9  9993     7    241.9ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Many (100) attributes

foo3 <- new_class("foo3", properties =
  lapply(1:100, \(n) new_property(name = paste0("prop_", n), class_double)),
)

obj <- foo3(123)
print(plot(print(df <- bench::mark(
  c = prop(obj, "prop_50") <- 456,
  r = propr(obj, "prop_50") <- 456,
  c_no_val = prop(obj, "prop_50", check = FALSE) <- 456,
  r_no_val = propr(obj, "prop_50", check = FALSE) <- 456
))))
#> # A tibble: 4 × 13
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c          17.18µs 18.08µs    53846.        0B     37.7  9993     7    185.6ms
#> 2 r          19.15µs 20.01µs    48047.        0B     28.8  9994     6      208ms
#> 3 c_no_val    2.09µs  2.54µs   381440.        0B     38.1  9999     1     26.2ms
#> 4 r_no_val    5.29µs  6.11µs   154421.        0B     30.9  9998     2     64.7ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Recursive custom property setters()

foo4 <- new_class("foo4",
  properties = list(
    x = new_property(setter = function(self, value) {
      prop(self, "x") <- 1
      prop(self, "y") <- value + 1
      self
    }),
    y = new_property(setter = function(self, value) {
      prop(self, "y") <- 2
      prop(self, "z") <- as.integer(value + 1)
      self
    }),
    z = new_property(class_integer)
  )
  # validator = function(self) {NULL}
)

obj <- foo4(123)
print(plot(print(df <- bench::mark(
  c = prop(obj, "x") <- 456,
  r = propr(obj, "x") <- 456,
  c_no_val = prop(obj, "x", check = FALSE) <- 456,
  r_no_val = propr(obj, "x", check = FALSE) <- 456
))))
#> # A tibble: 4 × 13
#>   expression      min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:tm> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c            22.7µs 23.9µs    41038.    29.8KB     28.7  9993     7      244ms
#> 2 r            45.7µs   48µs    20464.        0B     29.8  9620    14      470ms
#> 3 c_no_val       12µs 12.9µs    75847.        0B     30.4  9996     4      132ms
#> 4 r_no_val     45.6µs 48.6µs    20057.        0B     29.4  9557    14      476ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Recursive custom property setters(), without validation

## same as foo4 (recursive setters), but internal prop<- calls have check=FALSE
foo4.1 <- new_class("foo4.1",
  properties = list(
    x = new_property(setter = function(self, value) {
      prop(self, "x", check = FALSE) <- 1
      prop(self, "y", check = FALSE) <- value + 1
      self
    }),
    y = new_property(setter = function(self, value) {
      prop(self, "y", check = FALSE) <- 2
      prop(self, "z", check = FALSE) <- as.integer(value + 1)
      self
    }),
    z = new_property(class_integer)
  )
)
obj <- foo4.1(123)

print(plot(print(df <- bench::mark(
  c_no_val = prop(obj, "x", check = FALSE) <- 456,
  r_no_val = propr(obj, "x", check = FALSE) <- 456
))))
#> # A tibble: 2 × 13
#>   expression     min  median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
#>   <bch:expr> <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
#> 1 c_no_val     4.1µs  4.55µs   214221.    30.2KB     21.4  9999     1     46.7ms
#> 2 r_no_val    11.1µs 11.93µs    82421.        0B     33.0  9996     4    121.3ms
#> # ℹ 4 more variables: result <list>, memory <list>, time <list>, gc <list>

Created on 2023-12-19 with reprex v2.0.2

hadley commented 8 months ago

Looks like a nice set of speedups!

t-kalinowski commented 6 months ago

@DavisVaughan (or others), do you have thoughts on the usage of Rf_shallow_duplicate() vs Rf_duplicate() here? (Is it safe to use Rf_shallow_duplicate() here? Any gotchas I may be missing?)

DavisVaughan commented 6 months ago

Rf_shallow_duplicate() seems fine. IIRC it is mostly useful for lists or objects with lots of attributes where you want to duplicate the "core" object but you don't want to duplicate the list elements or attributes (like cloning a data frame!). Its the deep = false path of duplicate1(): https://github.com/wch/r-source/blob/25095948dbff496df29561218b4f7b5e9265c8b2/src/main/duplicate.c#L276

Note that Rf_shallow_duplicate() still makes a full copy of, say, an integer vector, if it is the core object.

R_shallow_duplicate_attr() might also be worth looking at. It will make a lightweight ALTREP wrapper over the original object where the ALTREP wrapper is allowed to have a different set of attributes from the original, but otherwise has the same "core" object (i.e. it won't copy the integer vector in my example above). It looks like it may actually make sense to use it here, since you are just changing attributes? https://github.com/wch/r-source/blob/25095948dbff496df29561218b4f7b5e9265c8b2/src/main/duplicate.c#L594C6-L594C30

Note that the object itself has to be "large enough" before an altrep wrapper is considered, otherwise its not worth it: https://github.com/wch/r-source/blob/25095948dbff496df29561218b4f7b5e9265c8b2/src/main/duplicate.c#L576

t-kalinowski commented 2 months ago

Thanks @DavisVaughan, I agree that R_shallow_duplicate_attr seems like a good choice here. However, it is not (yet?) part of the official API (at least, it's not listed here: https://yutannihilation.github.io/R-fun-API/). I've added a comment as a reminder to update usage once R_shallow_duplicate_attr() is safe to use.

I think this PR is ready to merge.

hadley commented 2 months ago

I'm working on the build failures in #407

hadley commented 2 months ago

@t-kalinowski this warrants a news bullets I think, and then feel free to merge.