RConsortium / S7

S7: a new OO system for R
https://rconsortium.github.io/S7
Other
386 stars 32 forks source link

Automatic Validation of Properties using Checkmate #398

Closed jmiahjones closed 2 months ago

jmiahjones commented 7 months ago

I just learned about S7, and I was really excited! I've sketched out some code for an idea about property validation, but I wanted to start a discussion here. Some issues may be somewhat related, like #251 or #393.

Perceived Problem

I've been using R6 classes, but my annoyance is that you have a lot of boilerplate to define the property, specify it as a function argument, and then use the self$x <- x pattern. That means there are many places to define properties. Necessarily, if your object has many properties then validation occurs far away from the property.

S7 has solved the first problem, but I still feel a similar problem from number 2. Consider this bit of code:

has_bool <- new_class(
  "has_bool",
  properties = list(
    bool = class_logical
  ),
  validator = function(self){
    if(length(self@bool) > 1) {
      "@bool must be a single logical value"
    }
  })

When I define bool, I would ideally like it to be known that bool is a length one, non-missing logical. This is a toy example, but you can imagine that if the properties list gets rather large, there would be a lot of space between the validator for the property and the property itself. It seems like describing the basic expectations for the property where it is defined is a desirable defensive programming practice.

Of course, there's a couple of other solutions. You could define a new property type that handles this, or just use the validation function in the property. These implementations use the excellent checkmate package:

library(checkmate)
has_bool2 <- new_class(
  "has_bool2",
  properties = list(
    bool = class_bool(len=1)
  ))

has_bool3 <- new_class(
  "has_bool3",
  properties = list(
    bool = new_property(
      class = class_logical,
      validator = \(value) {
        res <- check_logical(
          value,
          len = 1
        )
        if(isTRUE(res)){
          return(NULL)
        } else {
          return(res)
        }
      }
    )
  ))

These two are essentially the same as class_bool constructs the property used in implementation 3. The downside to this, is that a potentially very large number of these property factories must be created.

The Proposal

My personal preference is to define "expectations" for the property directly in the property constructor itself. Thinking through this, I realized that many of these "expectations" are implemented as arguments to functions in the checkmate package. That led me to sketch out this new_checkmate_property function that essentially constructs the type of validator above by passing ... to a check function:

has_bool4 <- new_class(
  "has_bool4",
  properties = list(
    bool = new_checkmate_property(
      class = class_logical,
      validator = check_logical,
      len = 1
    )
  ))

In new_checkmate_property, the validator argument corresponds to a checkmate::check function that is used to validate the property, and len = 1 is parsed as a ... to be passed to it. This lets the user describe the exact expectations for the property inputs, and the failure gets reported very quickly with an automatically-generated and descriptive error message:

> h4 <- has_bool4(bool=T) # no error

> h4_fail <- has_bool4(c(T,T))
Error: <has_bool4> object properties are invalid:
- @bool Must have length 1, but has length 2
> h4_fail <- has_bool4(22.0)
Error: <has_bool4> object properties are invalid:
- @bool must be <logical>, not <double>

More complex and inter-related validation can still take place in the validate argument for the class itself. I think has_bool4 is slightly more verbose than has_bool2, but each line of the bool definition serves an explicit purpose. Perhaps the only thing that could be changed is to tie the check function to the class type, but this method allows more freedom to do validation like the following:

has_switch <- new_class(
  "has_switch",
  properties = list(
    switch = new_checkmate_property(
      class = class_character,
      validator = check_choice,
      choices = letters[1:4]
    )
  ))
> has_switch("e")
Error: <has_switch> object properties are invalid:
- @switch Must be element of set {'a','b','c','d'}, but is 'e'

Downsides

I wanted to get feedback on this idea. One of the downsides is my reliance on the checkmate package. I noticed that dependencies are kept out, presumably to make S7 a language feature. Perhaps this is some syntactic sugar that belongs in its own package outside of S7. I'd appreciate any thoughts!

lawremi commented 7 months ago

Nice work. I agree that this would be great to have as a separate helper package.

jmiahjones commented 7 months ago

Thanks, @lawremi. If that is the route this goes, that would be fine. I do want to make sure there's nothing else on the roadmap to re-work property creation before I start creating a package.

hadley commented 7 months ago

Yeah, this looks like it would be a good fit for an external package to me too. Maybe it could go it checkmate itself? I think checkmate would only need to take a suggested dependency on S7.

jmiahjones commented 7 months ago

Alternatively, @hadley and @lawremi what about having a suggested dependency on checkmate here, with this simple wrapper unlocked by the package?

hadley commented 7 months ago

@jmiahjones I don't think we'd want to "bless" checkmate in that way without an extensive comparison of the options.

tdeenes commented 7 months ago

@jmiahjones I don't think we'd want to "bless" checkmate in that way without an extensive comparison of the options.

I would rather argue that this is not about "blessing" any package. S7 should be as lean as possible without introducing any dependencies (even suggested ones) over the ones which are strictly necessary to the core functionality of the pkg or to unit tests. To my taste even the current list of suggested dependencies is a bit too long. As for validations, S7 already provides the basics, and it can be extended as this PR demonstrates (as far as I see it does not rely on any internals that an extension package could not access).

Disclaimer: I am a contributor to checkmate.

hadley commented 7 months ago

@tdeenes obviously we share different opinions about suggested dependencies which primarily affect developers. You are welcome to try suggest PRs that remove if you think they are not useful, but they are mostly there to generate better looking vignettes which are valuable, in my opinion.

hadley commented 2 months ago

Closing this because it looks like our discussion has come to an end. I do think this is a valuable contribution, but I don't think it belongs in S7 itself.