RConsortium / S7

S7: a new OO system for R
https://rconsortium.github.io/S7
Other
404 stars 34 forks source link

When is validation performed and how to do you suppress it when needed? #25

Closed hadley closed 4 years ago

hadley commented 4 years ago

One strawman proposal for how to suppress temporarily:

eventually_valid <- function(code) {
  options(check_validity = FALSE)
  on.exit(options(check_validity = TRUE))

  validate(code)
}

Still need to consider when the OO system would call validate() automatically.

mmaechler commented 4 years ago

I'd strongly oppose to use a global options for something such fundamental. Switching the option on or off would switch R into two very different states wrt to these classes. Rather what Michael mentioned

     withoutValidity({
            .....
     })

which we'd implement on the C level (and mostly not part of the API, at least for the time being) so users cannot easily do this unless via the withoutValidity({ .... }) construct

hadley commented 4 years ago

@mmaechler even if you implemented in C, you'll still need some global state to turn validation on and off? Or are you imagining an interface more like this?

eventuallyValid <- function(object, code) {
  object$internal_validation_flag <- FALSE
  out <- code

  out$internal_validation_flag <- TRUE
  validate(out)
}

(I don't love the name withoutValidity() because I don't think it conveys that the final result will have been validated. I'm not tied to eventuallyValid() but I think the name should evoke the sense that you're just temporarily turning validation off.)

hadley commented 4 years ago

(This interface feels a little like clojure's transients. There the idea is to provide a way for functions to implement operations with mutable data structures for speed, while preserving an external api that is immutable. Here the idea is to provide functions a way to transiently move through invalid states and/or only perform validation once, while preserving an external api that only allows valid objects.)

hadley commented 4 years ago

Here's a little justification for the basic problem:

move_right <- function(x, y) {
  x@start <- x@start + y
  x@end <- x@end + y
  x
}

x <- Range(start = 0, end = 10)
move_right(x, 100)
# Error: start > end

move_right <- function(x, y) {
  eventuallyValid(x, 
    x@start <- x@start + y
    x@end <- x@end + y
    x
  )
}
move_right(x, 100)
# Range: 
#   @start: 100
#   @end: 110
ltierney commented 4 years ago

@mmaechler even if you implemented in C, you'll still need some global state to turn validation on and off? Or are you imagining an interface more like this?

eventuallyValid <- function(object, code) {
  object$internal_validation_flag <- FALSE
  out <- code

  out$internal_validation_flag <- TRUE
  validate(out)
}

(I don't love the name withoutValidity() because I don't think it conveys that the final result will have been validated. I'm not tied to eventuallyValid() but I think the name should evoke the sense that you're just temporarily turning validation off.)

Evaluating code wouldn't see the changed value of internal_validation_flag unless the object is mutable (in which case you would need to reset on non-local transfer of control). Something based on an action function like this might work:

eventuallyValid <- function(object, code) {
    object$internal_validation_flag <- FALSE
    out <- code(object)
    out$internal_validation_flag <- TRUE
    validate(out)
}
mmaechler commented 4 years ago

Definitely agree with eventuallyValid() being a better name, once you include "validation at the end"; and indeed, I agree that it should be the user facing interface to have "final validation".

@ltierney: Yes, if this is part of the object that I'm currently mutating, that's perfect. No objection at all to such an approach.

hadley commented 4 years ago

@ltierney yeah and that won't feel so clunky if we get lambdas.

The only other real option would be something with a pronoun, e.g.:

eventuallyValid(x, {
  .@start <- .@start + y
  .@end <- .@end + y
})

But that doesn't feel very base-R-like to me.

lawremi commented 4 years ago

I like the @ltierney approach.

ltierney commented 4 years ago

@ltierney yeah and that won't feel so clunky if we get lambdas.

We have lambdas. We always have had lambdas. What is under discussion is a shorthand notation for lambdas. (I know you know this, but a lot of those sounding off on twitter and such do not, and it is frustrating...)

The only other real option would be something with a pronoun, e.g.:

eventuallyValid(x, {
  .@start <- .@start + y
  .@end <- .@end + y
})

But that doesn't feel very base-R-like to me.

It isn't. And I am not a fan of any implicitly defined bindings.

Also, If there is any justice in the world you will some day be wearing progressive bifocals and cursing those who think using the smallest possible glyph as a variable name is a good idea :-)

hadley commented 4 years ago

@ltierney I'm hoping I'll have code directly beamed into my brain by that point 😆

It seems like there's consensus around your eventuallyValid() approach, so I think the other big question to consider is when validate() is run automatically. Presumably it'll be run automatically by the initialiser, do we also expect it'll be run after every @<- ?

mmaechler commented 4 years ago

Well, of course history and back compatibility should not be the main focus. OTOH, in S4 I (wrongly! see below) remember it was

slot(obj, "nm")  <-  value  # does validate
obj@<nm>       <- value   # does *NOT* validate

So, you'd use the latter when building up the object incrementally. But all the above is not quite true: 1) also slot(.,.) <- .. would not fully validate 2) even @<- validates the basic class of value So, S4 is quite subtle (and hence probably too complicated) here.

So, I'm not sure yet what to answer to your question, even though my initial reaction was that @<- should not fully validate...

But one thing maybe we should learn from S4: It may be wise to distinguish validity of the (class of) slot/property and validity of the complete object

lawremi commented 4 years ago

For simplicity, every modification should validate, with eventuallyValid() being the explicit exception.

hadley commented 4 years ago

As well as eventual validation, should we provide some way to declare that validation is unneeded? For example, take the move_right() function:

move_right <- function(x, y) {
  eventuallyValid(x, function(x) {
    x@start <- x@start + y
    x@end <- x@end + y
    x
  })
}

Strictly speaking, we don't actually need to re-validate because we can prove that if x@start < x@end then x@start + y < x@end + y.

lawremi commented 4 years ago

Right, we could have an implicitlyValid() for when performance matters (otherwise I'd recommend being defensive). Currently Bioconductor has an initialize2() that skips validation.

hadley commented 4 years ago

I like that name. I added a brief mention of it to my PR.

mmaechler commented 4 years ago

Strictly speaking, we don't actually need to re-validate because we can prove that if x@start < x@end then x@start + y < x@end + y.

No! Sorry to spoil: You are confusing pure mathematics with computer math: Assume x@start == 1 and x@end == 2, then x@start + y == x@end + y (i.e., equal !), as soon as
abs(y) > 2e16 , i.e. for many zillions of values y...

but of course there are other cases where implicitlyValid() is nicely applicable ...

hadley commented 4 years ago

Oh good point! That's a nice testament to why your analysis has to be very careful if you're going to use implicitlyValid().

hongooi73 commented 4 years ago

That example is particularly interesting from the security point of view as well. With C++, optimising compilers can generate code that assumes things like overflow never occur, which results in undefined behaviour if it does occur. Smart/unscrupulous people can exploit this to produce security holes.

Suppose, in this example, the programmer assumes x$start, x$end and y will all take "reasonable" values and uses implicitlyValid(). What would happen if an invalid value of y was passed in? I assume/hope this would throw an error at the R level before reaching compiled code?

lawremi commented 4 years ago

Yes, the code would need to defend against invalid inputs.

hongooi73 commented 4 years ago

@lawremi to clarify, do you mean the user's code would need to defend against this (which is basically the C and C++ philosophy)? Or the code that implements this object framework?

lawremi commented 4 years ago

The user. Whoever is claiming it is implicitly valid should be sure that it is.