Closed LittleBeannie closed 3 months ago
The problem is related to passing the arguments to create_cut()
as variables:
ia_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[1])
fa_cut <- simtrial::create_cut(planned_calendar_time = x$analysis$time[2])
When I replace them with their actual value, it works:
ia_cut <- simtrial::create_cut(planned_calendar_time = 23.82414)
fa_cut <- simtrial::create_cut(planned_calendar_time = 35.85303)
I believe this has to do with the complex feature of create_cut()
, in that it is delaying evaluation of its arguments until the function is creates is executed. My initial idea was to use force()
, but that didn't work. I'll keep investigating.
ia_cut <- simtrial::create_cut(planned_calendar_time = force(x$analysis$time[1]))
fa_cut <- simtrial::create_cut(planned_calendar_time = force(x$analysis$time[2]))
I believe this is due to local/global issues with how foreach and doFuture work together. When foreach uses a parallel evaluation, global variables are not always accessible due to the instances that doFuture creates. This issue can be avoided if we explicitly pass the cut function to each instance of the loop. We currently use this for the test
variable as well.
I was using force()
incorrectly above. It needs to be called inside the function (source: Advanced R v1 chp 6 on functions). Updating create_cut()
to force its arguments appears to fix the problem, but I'm still testing:
create_cut <- function(...) {
lapply(X = list(...), FUN = force)
function(data) {
get_analysis_date(data, ...)
}
}
I have tested the updated create_cut()
and am in alignment with this change as fixing the issue.
@jdblischak I did a functional test by running Yujie's example code above using the current simtrial version in the main branch and saw the same error reported by macOS runners in #262 revealed by tests.
remotes::install_version("gsDesign", "3.6.3")
remotes::install_version("gsDesign2", "1.1.2")
remotes::install_github("Merck/simtrial@c037f8f")
> future::plan("multisession", workers = 8)
> simtrial::sim_gs_n(
+ n_sim = 1e2,
+ sample_size = x$analysis$n[2],
+ enroll_rate = x$enroll_rate,
+ fail_rate = x$fail_rate,
+ test = simtrial::wlr,
+ cut = list(ia = ia_cut, fa = fa_cut),
+ weight = simtrial::fh(rho = 0, gamma = 0)
+ )
Using 8 cores with backend multisession
Error in `[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate))) :
Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
I did a functional test by running Yujie's example code above using the current simtrial version in the main branch and saw the same error reported by macOS runners in https://github.com/Merck/simtrial/pull/262 revealed by tests.
Thanks @nanxstats for confirming! While it's frustrating that this is still broken for macOS users, the good news is that the test I added in #262 appears to be identifying an actual problem that will affect end users.
As an experiment, could you please re-run your test above but install {simtrial} from 23449e00e53af89330ec63d4b3b807fe41129311, ie prior to #261? I'm curious if you'll receive the {data.table} error or the object 'x' not found
error.
@jdblischak Yep, the same data.table error already exists under https://github.com/Merck/simtrial/commit/23449e00e53af89330ec63d4b3b807fe41129311, under macOS, for both sequential and parallel:
remotes::install_github("Merck/simtrial@23449e0")
Using 8 cores with backend multisession
Error in `[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate))) :
Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: There were 33 warnings (use warnings() to see them)
Backend uses sequential processing.
Loading required package: foreach
Loading required package: future
Error in `[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate))) :
Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
It looks like different issues were triggered when the random seed is different. In the beginning of the example, use a set.seed(42)
gives the object 'x' not found
error. Use a set.seed(44)
gives the data.table error.
set.seed(42)
Using 8 cores with backend multisession
Error in eval(expr, p) : object 'x' not found
In addition: There were 36 warnings (use warnings() to see them)
set.seed(44)
Using 8 cores with backend multisession
Error in `[.data.table`(x, , `:=`(enroll_time, rpwexp_enroll(n, enroll_rate))) :
Supplied 454 items to be assigned to 453 items of column 'enroll_time'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
In addition: There were 41 warnings (use warnings() to see them)
Yep, the same data.table error already exists under https://github.com/Merck/simtrial/commit/23449e00e53af89330ec63d4b3b807fe41129311, under macOS, for both sequential and parallel:
Thanks for confirming. That is a useful clue. This {data.table} error was not introduced by #261, but instead has always existed.
It looks like different issues were triggered when the random seed is different. In the beginning of the example, use a
set.seed(42)
gives theobject 'x' not found
error. Use aset.seed(44)
gives the data.table error.
That's not a good sign for troubleshooting. If there is some sort of race condition between the two errors, this is going to be hard to isolate, fix, and test.
The following lines of codes gives me an error message when I run it parallelly. However, the sequential processing is error-free.
Following is the error message.
Hello @jdblischak and @cmansch , as both of you are the developers of
sim_gs_n
, I am including both of you in this discussion.