Closed hadley closed 4 years ago
I'd strongly oppose to use a global options for something such fundamental. Switching the option on or off would switch R into two very different states wrt to these classes. Rather what Michael mentioned
withoutValidity({
.....
})
which we'd implement on the C level (and mostly not part of the API, at least for the time being) so users cannot easily do this unless via the withoutValidity({ .... })
construct
@mmaechler even if you implemented in C, you'll still need some global state to turn validation on and off? Or are you imagining an interface more like this?
eventuallyValid <- function(object, code) {
object$internal_validation_flag <- FALSE
out <- code
out$internal_validation_flag <- TRUE
validate(out)
}
(I don't love the name withoutValidity()
because I don't think it conveys that the final result will have been validated. I'm not tied to eventuallyValid()
but I think the name should evoke the sense that you're just temporarily turning validation off.)
(This interface feels a little like clojure's transients. There the idea is to provide a way for functions to implement operations with mutable data structures for speed, while preserving an external api that is immutable. Here the idea is to provide functions a way to transiently move through invalid states and/or only perform validation once, while preserving an external api that only allows valid objects.)
Here's a little justification for the basic problem:
move_right <- function(x, y) {
x@start <- x@start + y
x@end <- x@end + y
x
}
x <- Range(start = 0, end = 10)
move_right(x, 100)
# Error: start > end
move_right <- function(x, y) {
eventuallyValid(x,
x@start <- x@start + y
x@end <- x@end + y
x
)
}
move_right(x, 100)
# Range:
# @start: 100
# @end: 110
@mmaechler even if you implemented in C, you'll still need some global state to turn validation on and off? Or are you imagining an interface more like this?
eventuallyValid <- function(object, code) { object$internal_validation_flag <- FALSE out <- code out$internal_validation_flag <- TRUE validate(out) }
(I don't love the name
withoutValidity()
because I don't think it conveys that the final result will have been validated. I'm not tied toeventuallyValid()
but I think the name should evoke the sense that you're just temporarily turning validation off.)
Evaluating code
wouldn't see the changed value of internal_validation_flag
unless the object is mutable (in which case you would need to reset on non-local transfer of control). Something based on an action function like this might work:
eventuallyValid <- function(object, code) {
object$internal_validation_flag <- FALSE
out <- code(object)
out$internal_validation_flag <- TRUE
validate(out)
}
Definitely agree with eventuallyValid()
being a better name, once you include "validation at the end"; and indeed, I agree that it should be the user facing interface to have "final validation".
@ltierney: Yes, if this is part of the object that I'm currently mutating, that's perfect. No objection at all to such an approach.
@ltierney yeah and that won't feel so clunky if we get lambdas.
The only other real option would be something with a pronoun, e.g.:
eventuallyValid(x, {
.@start <- .@start + y
.@end <- .@end + y
})
But that doesn't feel very base-R-like to me.
I like the @ltierney approach.
@ltierney yeah and that won't feel so clunky if we get lambdas.
We have lambdas. We always have had lambdas. What is under discussion is a shorthand notation for lambdas. (I know you know this, but a lot of those sounding off on twitter and such do not, and it is frustrating...)
The only other real option would be something with a pronoun, e.g.:
eventuallyValid(x, { .@start <- .@start + y .@end <- .@end + y })
But that doesn't feel very base-R-like to me.
It isn't. And I am not a fan of any implicitly defined bindings.
Also, If there is any justice in the world you will some day be wearing progressive bifocals and cursing those who think using the smallest possible glyph as a variable name is a good idea :-)
@ltierney I'm hoping I'll have code directly beamed into my brain by that point 😆
It seems like there's consensus around your eventuallyValid()
approach, so I think the other big question to consider is when validate()
is run automatically. Presumably it'll be run automatically by the initialiser, do we also expect it'll be run after every @<-
?
Well, of course history and back compatibility should not be the main focus. OTOH, in S4 I (wrongly! see below) remember it was
slot(obj, "nm") <- value # does validate
obj@<nm> <- value # does *NOT* validate
So, you'd use the latter when building up the object incrementally.
But all the above is not quite true:
1) also slot(.,.) <- ..
would not fully validate
2) even @<-
validates the basic class of value
So, S4 is quite subtle (and hence probably too complicated) here.
So, I'm not sure yet what to answer to your question, even though my initial reaction was that @<-
should not fully validate...
But one thing maybe we should learn from S4: It may be wise to distinguish validity of the (class of) slot/property and validity of the complete object
For simplicity, every modification should validate, with eventuallyValid()
being the explicit exception.
As well as eventual validation, should we provide some way to declare that validation is unneeded? For example, take the move_right()
function:
move_right <- function(x, y) {
eventuallyValid(x, function(x) {
x@start <- x@start + y
x@end <- x@end + y
x
})
}
Strictly speaking, we don't actually need to re-validate because we can prove that if x@start < x@end
then x@start + y < x@end + y
.
Right, we could have an implicitlyValid()
for when performance matters (otherwise I'd recommend being defensive). Currently Bioconductor has an initialize2()
that skips validation.
I like that name. I added a brief mention of it to my PR.
Strictly speaking, we don't actually need to re-validate because we can prove that if
x@start < x@end
thenx@start + y < x@end + y
.
No! Sorry to spoil: You are confusing pure mathematics with computer math:
Assume x@start == 1
and x@end == 2
, then x@start + y == x@end + y
(i.e., equal !), as soon as
abs(y) > 2e16
, i.e. for many zillions of values y
...
but of course there are other cases where implicitlyValid()
is nicely applicable ...
Oh good point! That's a nice testament to why your analysis has to be very careful if you're going to use implicitlyValid()
.
That example is particularly interesting from the security point of view as well. With C++, optimising compilers can generate code that assumes things like overflow never occur, which results in undefined behaviour if it does occur. Smart/unscrupulous people can exploit this to produce security holes.
Suppose, in this example, the programmer assumes x$start
, x$end
and y
will all take "reasonable" values and uses implicitlyValid()
. What would happen if an invalid value of y
was passed in? I assume/hope this would throw an error at the R level before reaching compiled code?
Yes, the code would need to defend against invalid inputs.
@lawremi to clarify, do you mean the user's code would need to defend against this (which is basically the C and C++ philosophy)? Or the code that implements this object framework?
The user. Whoever is claiming it is implicitly valid should be sure that it is.
One strawman proposal for how to suppress temporarily:
Still need to consider when the OO system would call
validate()
automatically.