Closed LittleBeannie closed 4 months ago
Specifically the goal is to make the check below more robust:
The function may be saved to a file (eg by the Shiny app) and loaded later, so it's environment would be different, and thus evaluate to FALSE
. For example:
gs_b
## function(par = NULL, k = NULL, ...) {
## if (is.null(k)) {
## return(par)
## } else {
## return(par[k])
## }
## }
## <environment: namespace:gsDesign2>
z <- function(par = NULL, k = NULL, ...) {
if (is.null(k)) {
return(par)
} else {
return(par[k])
}
}
identical(z, gs_b)
## [1] FALSE
We could remove the environment by comparing args()
and body()
, but that would cause problems if we updated the body function (eg to make it faster or more readable) since the version of the function saved in a file would not have been updated.
In any case, body()
only works well for very simple functions. For gs_b()
, environments end up getting included.
# from ?body
f <- function(x) x^5
body(f)
## x^5
str(body(f)) # no environemnts
## language x^5
body(gs_b)
## {
## if (is.null(k)) {
## return(par)
## }
## else {
## return(par[k])
## }
## }
str(body(gs_b)) # has envs embedded
## language { if (is.null(k)) {; return(par); }; else {; return(par[k]); } }
## - attr(*, "srcref")=List of 2
## ..$ : 'srcref' int [1:8] 71 45 71 45 45 45 4368 4368
## .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x00000286507149f8>
## ..$ : 'srcref' int [1:8] 72 3 76 3 3 3 4369 4373
## .. ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x00000286507149f8>
## - attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x00000286507149f8>
## - attr(*, "wholeSrcref")= 'srcref' int [1:8] 1 0 77 1 0 1 1 4374
## ..- attr(*, "srcfile")=Classes 'srcfilealias', 'srcfile' <environment: 0x00000286507149f8>
Though we could wrap it in capture.output()
, which works, but only if the function is never updated in {gsDesign2}.
identical(args(z), args(gs_b)) &&
identical(capture.output(body(z)), capture.output(body(gs_b)))
## [1] TRUE
Instead of comparing functions, I'd suggest comparing returned values of functions instead.
I agree with @yihui that comparing returned values would be the most robust solution.
A major problem with that is that the function interface is not uniform. For gs_b()
, the argument par
is a numeric vector, but for the example in the test file, par
is a list.
args(gs_b)
## function (par = NULL, k = NULL, ...)
## NULL
args(x$input$lower)
## function (k = 1, par = list(sf = gsDesign::sfLDOF, total_spend = 0.025,
## param = NULL, timing = NULL, max_info = NULL), hgm1 = NULL,
## theta = 0.1, info = 1:3, efficacy = TRUE, test_bound = TRUE,
## r = 18, tol = 1e-06)
## NULL
gs_b(par = 4:2, k = 2)
## [1] 3
x$input$lower(par = 4:2, k = 2)
## Error in par$timing : $ operator is invalid for atomic vectors
Is there documentation somewhere of how to create a valid function for computing the lower bound? The documentation of the argument lower
for gs_design_ahr()
is minimal:
However, the default value is gs_spending_bound
:
And it also has par
as a list:
args(gs_spending_bound)
## function (k = 1, par = list(sf = gsDesign::sfLDOF, total_spend = 0.025,
## param = NULL, timing = NULL, max_info = NULL), hgm1 = NULL,
## theta = 0.1, info = 1:3, efficacy = TRUE, test_bound = TRUE,
## r = 18, tol = 1e-06)
## NULL
So I assume that gs_b()
is the outlier
Is it a good idea if we save around 5-10 minutes to discuss it this Friday?
Is it a good idea if we save around 5-10 minutes to discuss it this Friday?
Yes, I think it would be a good discussion topic
I didn't realize this was such a pervasive pattern throughout the package, eg
My fix in #413 was specific to comparing to gs_b()
.
I can image creating a function that is more flexible, eg is_same_func(f1, f2, input = list(<args>)
, that we could use to compare any two given functions. However, I'm hesitant to do this due to the reduction in code readability. How big of a concern is this really? How often will a user (or the Shiny app) pass a function that is effectively identical when it is much simpler to just use the default argument?
Yeah, it's quite prevalent. I can't quantify the size of the concern but the risk here is using a unreliable comparison mechanism that could return false negative results and generate wrong design results without the user knowing due to factors you can't control - such as reusing serialized objects from before. Would that be a huge issue? I don't know 🤔
I guess I see no better simple alternatives but using the attribute-based solution suggested by @yihui, a sketch:
add_identifier <- function(f, id) {
attr(f, "id") <- id
f
}
is_gs_b <- function(f) {
id <- attr(f, "id")
if (is.null(id)) {
return(FALSE)
} else {
id == "gs_b"
}
}
gs_b_id <- add_identifier(gsDesign2::gs_b, "gs_b")
is_gs_b(gsDesign2::gs_b)
is_gs_b(gs_b_id)
If we were to start from the beginning, I'd make upper
and lower
strings by default (instead of functions), which would be easier to store and more robust to compare. They can still accept function input, but the responsibility would be on the user end, i.e., they have to make sure they input an identical function each time. I'm not sure how common it would be for users to need to input custom functions.
Apparently, we can't start from the beginning now, and have to consider backward compatibility. My suggestion is that we change our default to strings (e.g., 'b'
, 'spending_bound'
, 'spending_combo'
---feel free to use more meaningful and intuitive names). Then for those comparisons, we first check if the value is character and test for equality. If the value is function, we fall back to the current identical(., .)
approach. I'm not sure if that makes sense.
I guess I see no better simple alternatives but using the attribute-based solution suggested
This still seems too complex to me. And I feel it only kicks the can down the road. What if a user passes their own custom function? If they don't add this id
attribute, then the code still won't handle it properly.
My suggestion is that we change our default to strings
I am leaning towards this solution. I think it makes sense to directly pass functions when a user is free to create and pass their own custom functions. We use this feature to great effect for the cut and test functions in {simtrial}.
However, in {gsDesign2}, there is no possibility of providing a user-defined spending function and obtaining a reliable result. The code logic only updates the boundaries if gs_b
or gs_spending_bound
is provided, eg
If we still want to support the potential of users providing their own functions instead of using gs_b()
or gs_spending_bound()
, I propose adding a string argument that specifies the type of boundary function. I don't understand the domain well enough to choose proper names, but I'm imaging arguments like:
upper_type = "dynamic",
upper = gs_spending_bound,
upar = list(sf = gsDesign::sfLDOF, total_spend = alpha),
lower_type = "dynamic",
lower = gs_spending_bound,
lpar = list(sf = gsDesign::sfLDOF, total_spend = beta),
And then the code logic would be:
# Updated lpar
if (identical(x$input$lower_type, "static")) {
lpar_new <- x$input$lpar
} else if (identical(x$input$lower, "dynamic")) {
lpar_new <- x$input$lpar
if (!("timing" %in% names(x$input$lpar))) {
lpar_new$timing <- upar_new$timing
}
} else {
stop('lower_type must be either "static" or "dynamic"')
}
I agree with @jdblischak. I don't know the domain enough, either, so domain experts will have to make the call.
Following discussion in https://github.com/Merck/gsDesign2/pull/408.