Merck / simtrial

Clinical trial simulation for time-to-event endpoints
https://merck.github.io/simtrial/
GNU General Public License v3.0
16 stars 8 forks source link

Create formula interface for RMST test #216

Closed jdblischak closed 6 months ago

jdblischak commented 6 months ago

Closes #189

xref: https://github.com/Merck/simtrial/pull/188#issuecomment-1924398310

I added a simple formula interface, rmst.formula, and converted rmst() to an S3 generic function. I say simple because it only extracts the variable names from the formula, and then uses these to call rmst.default(). Converting to the S3 generic required changing the first argument name to something that could stand for a data frame or a formula. I chose x since this is a common S3 pattern (eg see ?aggregate). The main downside I see of making rmst() an S3 generic is that its first argument is no longer data like all the other test functions like wlr() and maxcombo(). UPDATE: After feedback, I instead added an additional argument formula = NULL to the existing rmst() function

Please give it a try and let me know how it can be improved.

nanxstats commented 6 months ago

This is great start. Just a few design and style comments that can be easily addressed once the decision is made:

  1. I think data should be the first argument, Hadley also has a similar opinion.
  2. Would it make sense to name the formula argument as formula instead of x, to be obvious, and to be consistent with other model fitting packages?
  3. Make the argument only accept formula, instead of creating a dependency between arguments.
  4. If we enforce a single, strict formula syntax that is consistent with the conventions, I think it would encourage people to write correct code. For example, Surv(time, event) ~ treatment. Although we can still interpret it not literally, by only using the variable names, like your current implementation. tidymodels/censored also has thoughts on related issues.
jdblischak commented 6 months ago
  1. I think data should be the first argument, Hadley also has a similar opinion.
  2. Would it make sense to name the formula argument as formula instead of x, to be obvious, and to be consistent with other model fitting packages?
  3. Make the argument only accept formula, instead of creating a dependency between arguments.

These are all objections to making rmst() an S3 generic. The alternative would be to have two separate functions. Your bullet points describe essentially what I had in the first commit of this PR, 0bbdaa48d18d94dc464b99ac346d033a451459c3, prior to converting to an S3 generic

  1. If we enforce a single, strict formula syntax that is consistent with the conventions, I think it would encourage people to write correct code. For example, Surv(time, event) ~ treatment. Although we can still interpret it not literally, by only using the variable names, like your current implementation. tidymodels/censored also has thoughts on related issues.

I worry about the implementation. Dealing with formulas/language objects is difficult to do robustly. I think the end users need to know that Surv() is not executed. For example, what if a user used the named arguments and put them out of order, eg Surv(event = x, time2 = y, time = z). This would be expected to work, but would fail in our setup.