Merck / simtrial

Clinical trial simulation for time-to-event endpoints
https://merck.github.io/simtrial/
GNU General Public License v3.0

Standardize the output format of the test functions #222

Closed: jdblischak closed this issue 4 months ago

jdblischak commented 7 months ago

We have ambitious goals to make experiments with different test functions extremely flexible, for example passing an arbitrary test function (and its arguments) through to the simulation functions.

Currently this flexibility isn't possible. While the test functions all accept a data frame of trial data as their first positional argument, the outputs are not uniform. This makes it impossible to combine the results into a data frame to facilitate downstream analysis. I also don't understand how the maxcombo() function can be extended to accept any number of arbitrary tests without a common test statistic to compare them (though I admit I may just not fully understand the future plans for maxcombo()).

The example below demonstrates the diversity of the outputs:

remotes::install_github("Merck/simtrial@2c015f6", upgrade = FALSE)
library("simtrial")
packageVersion("simtrial")
## [1] ‘0.3.2.13’

trial_data <- sim_pw_surv()
trial_data_cut <- cut_data_by_event(trial_data, 150)

wlr(trial_data_cut, weight = fh(rho = 0, gamma = 0))
##   rho gamma         z
## 1   0     0 -1.392586
wlr(trial_data_cut, weight = fh(rho = 0, gamma = 0.5))
##   rho gamma         z
## 1   0   0.5 -1.650691
wlr(trial_data_cut, weight = mb(delay = 3))
##           z
## 1 -1.425919
rmst(trial_data_cut, tau = 20)
##   rmst_arm1 rmst_arm0 rmst_diff         z
## 1  11.16504  10.20869 0.9563463 0.6412476
milestone(trial_data_cut, ms_time = 10)
##      method          z ms_time surv_ctrl surv_exp surv_diff std_err_ctrl std_err_exp
## 1 milestone 0.04013879      10      0.46     0.48      0.02   0.07048404  0.07065409
maxcombo(trial_data_cut, rho = c(0, 0), gamma = c(0, 0.5))
##      p_value
## 1 0.06348763
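
To make the point concrete: because the column sets differ, these results cannot be naively row-bound (a minimal sketch; the exact error message depends on the mismatch):

res_wlr <- wlr(trial_data_cut, weight = fh(rho = 0, gamma = 0))
res_rmst <- rmst(trial_data_cut, tau = 20)
# Fails: the two data frames have different column names and counts
try(rbind(res_wlr, res_rmst))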

I propose that each test function return a named vector with 3 elements.

If this is too simplistic, we could include more information (e.g. the input parameters) by either attaching an S3 class or storing the extra detail in a list-column.
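
For illustration, a uniform named numeric vector would let results stack naturally (a sketch; the names and numbers here are placeholders, not a settled interface):

res1 <- c(z = -1.39, p = 0.164, estimate = -12.3)
res2 <- c(z = 1.27, p = 0.204, estimate = 1.87)
# With identical names, results combine into a matrix for downstream analysis
do.call(rbind, list(wlr = res1, rmst = res2))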

xref: #215

keaven commented 7 months ago

First of all: very nice! One immediate comment: a milestone difference between .48 and .46 with the given standard errors should not have a p-value of .04 unless I misunderstand something. Also, an explanation of the detail that is different for different tests is, in my mind, essential. I would probably be fine with a description, a p-value, and a list (if a data.frame column can have type list???) that is dependent on the description. The list for any given description should be done consistently. It could be a mix of parameters and descriptive or other statistics. The OOP option seems potentially interesting, but I would like to see the above example done with that. You are right that we are asking for a lot of flexibility... and probably no matter what we do, some statistician(s) will want something that does not fit the model.

jdblischak commented 7 months ago

One immediate comment: a milestone difference between .48 and .46 with the given standard errors should not have a p-value of .04 unless I misunderstand something.

I think the .04 is the z-score, not the p-value.

Also, an explanation of the detail that is different for different tests is, in my mind, essential. I would probably be fine with a description, a p-value, and a list (if a data.frame column can have type list???) that is dependent on the description.

I think it should be possible. I'm not a big fan of list-columns in general (since they make things more complicated), but they would be a way to store this heterogeneous data.
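
For what it's worth, a base R data frame can hold a list-column when it is wrapped in I() (a minimal sketch; the values are placeholders):

results <- data.frame(method = c("WLR", "RMST"), p_value = c(0.01, 0.20))
# Each row carries its own arbitrary parameter list
results$details <- I(list(list(rho = 0, gamma = 0.5), list(tau = 20)))
results$details[[2]]$tau
## [1] 20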

The list for any given description should be done consistently. It could be a mix of parameters and descriptive or other statistics.

Exactly. We can use this Issue to discuss the various descriptions/parameters/statistics/etc that we need returned from each test function and decide on a uniform output format.

nanxstats commented 7 months ago

100% agree with the OOP/S3 method comment - that would provide a stable interface for the returned data while hiding the implementation, following the principle "code against the interface, not the implementation".
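
As a rough illustration of what that could mean here (a hypothetical sketch; the class and method names are illustrative, not a simtrial design):

# Hypothetical: every test constructs an object with a shared class...
new_test_result <- function(method, z, p, params) {
  structure(list(method = method, z = z, p = p, params = params),
            class = "test_result")
}
# ...and shared S3 methods define the stable interface callers code against
as.data.frame.test_result <- function(x, ...) {
  data.frame(method = x$method, z = x$z, p = x$p)
}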

jdblischak commented 7 months ago

After discussion, we are planning to try returning a named vector with the following values. If this becomes too unmanageable, then we can consider trying more sophisticated options like S3 or list-columns.

  • test_type - the name of the test: wlr, rmst, maxcombo, etc
  • estimate - point estimate
  • se - standard error
  • z - z-score
  • p - p-value
  • parameters - string of parameter values, eg "rho=0;gamma=0.5"

@LittleBeannie please feel free to adjust these names as you see fit. They are just my suggestions.
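
One caveat worth noting: an R vector holds a single type, so mixing the test_type/parameters strings with numeric statistics coerces everything to character (a sketch with placeholder numbers):

result <- c(test_type = "wlr", estimate = -1.39, se = 1.0,
            z = -1.39, p = 0.164, parameters = "rho=0;gamma=0")
# All elements are now character strings; numeric fields need converting back
as.numeric(result[["z"]])

A list-based return value, as discussed above, would avoid this coercion.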

LittleBeannie commented 7 months ago

After discussion, we are planning to try returning a named vector with the following values. If this becomes too unmanageable, then we can consider trying more sophisticated options like S3 or list-columns.

  • test_type - the name of the test: wlr, rmst, maxcombo, etc
  • estimate - point estimate
  • se - standard error
  • z - z-score
  • p - p-value
  • parameters - string of parameter values, eg "rho=0;gamma=0.5"

@LittleBeannie please feel free to adjust these names as you see fit. They are just my suggestions

Hi John, looks great to me, thanks for the nice summary! Can I raise some minor comments on the names?

jdblischak commented 7 months ago

Can I raise some minor comments on the names

Those names are all fine by me. Like I said, mine were only suggestions.

nanxstats commented 7 months ago

Good discussion! How about this: use no abbreviations at all, to make the names easy to search.

This also aligns with the general rule of thumb from R4DS to prefer long, descriptive names.

nphsim consistency is a good topic, although I don't think it's that important because of its experimental nature... and we should take its code style with a grain of salt.

jdblischak commented 7 months ago

Here are the outputs of the testing functions now that #227 was merged (75d7b67607a484cc23489fccad669d8d51005ec6). I am going to work on extending sim_gs_n() to work more flexibly with wlr(), rmst(), and milestone().
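
In essence that means accepting the test as a function plus its arguments; a generic sketch of the pattern (not sim_gs_n()'s actual signature):

# Generic pattern: apply an arbitrary test function to a cut of trial data
run_test <- function(data, test, ...) test(data, ...)
run_test(trial_data_cut, wlr, weight = fh(rho = 0, gamma = 0.5))
run_test(trial_data_cut, rmst, tau = 20)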

remotes::install_github("Merck/simtrial@75d7b67", upgrade = FALSE)
library("simtrial")
packageVersion("simtrial")
## [1] ‘0.3.2.14’

trial_data <- sim_pw_surv()
trial_data_cut <- cut_data_by_event(trial_data, 150)

wlr(trial_data_cut, weight = fh(rho = 0, gamma = 0))
## $method
## [1] "WLR"
##
## $parameter
## [1] "FH(rho=0, gamma=0)"
##
## $estimation
## [1] -12.3417
##
## $se
## [1] 4.486425
##
## $z
## [1] -2.750898
##
wlr(trial_data_cut, weight = fh(rho = 0, gamma = 0.5))
## $method
## [1] "WLR"
##
## $parameter
## [1] "FH(rho=0, gamma=0.5)"
##
## $estimation
## [1] -10.15942
##
## $se
## [1] 2.848798
##
## $z
## [1] -3.566213
##
wlr(trial_data_cut, weight = mb(delay = 3))
## $method
## [1] "WLR"
##
## $parameter
## [1] "MB(delay = 3, max_weight = Inf)"
##
## $estimate
## [1] -14.61053
##
## $se
## [1] 5.199295
##
## $z
## [1] -2.810097
##
rmst(trial_data_cut, tau = 20)
## $method
## [1] "RMST"
##
## $parameter
## [1] 20
##
## $estimation
## [1] 1.867087
##
## $se
## [1] 1.470548
##
## $z
## [1] 1.269654
##
milestone(trial_data_cut, ms_time = 10)
## $method
## [1] "milestone"
##
## $parameter
## [1] 10
##
## $estimation
## [1] 0.1
##
## $se
## [1] 0.07019972 0.07048404
##
## $z
## [1] 0.9964078
##
maxcombo(trial_data_cut, rho = c(0, 0), gamma = c(0, 0.5))
## $method
## [1] "Maxcombo"
##
## $parameter
## [1] "FH(0, 0) + FH(0, 0.5)"
##
## $z
## [1] -2.750898 -3.566213
##
## $p_value
## [1] 0.0002723921
##
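
With the shared structure in place, results from different tests can now be collected for downstream analysis (a minimal sketch restricted to fields common to the outputs above):

results <- list(
  wlr(trial_data_cut, weight = fh(rho = 0, gamma = 0)),
  rmst(trial_data_cut, tau = 20),
  milestone(trial_data_cut, ms_time = 10)
)
# One row per test, using the shared "method" and "z" fields
do.call(rbind, lapply(results, function(x) data.frame(method = x$method, z = x$z)))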
jdblischak commented 6 months ago

In #229, I updated all the test functions to return "estimate" instead of "estimation".

LittleBeannie commented 4 months ago

Hi @jdblischak, is it okay if we close this issue as complete?

jdblischak commented 4 months ago

is it okay if we close this issue as complete?

Yes. The main remaining piece is multiple tests per cut for sim_gs_n(), so I created a new Issue for this final task (https://github.com/Merck/simtrial/issues/258).
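
Purely for illustration, "multiple tests per cut" might look something like this (hypothetical; the actual design will be worked out in #258):

tests <- list(
  logrank  = function(d) wlr(d, weight = fh(rho = 0, gamma = 0)),
  fh_0_0.5 = function(d) wlr(d, weight = fh(rho = 0, gamma = 0.5)),
  rmst_20  = function(d) rmst(d, tau = 20)
)
# Apply every registered test to the same analysis cut
lapply(tests, function(test) test(trial_data_cut))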