Tpatni719 / gsMAMS

GNU General Public License v3.0

Automated Tests #4

Closed: RhysPeploe closed this issue 5 months ago

RhysPeploe commented 6 months ago

https://github.com/openjournals/joss-reviews/issues/6322

From the JOSS checklist - Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?

Can some tests be created for the functions? Some simple ones that run the examples and ensure no errors occur would be sufficient for me to check off this item.

Tpatni719 commented 6 months ago

Thank you for the comment! To check the veracity of these functions, we check whether the simulations reach the desired power or family-wise error rate (FWER). To give an example:

op_fwer_cont(alpha=0.05,beta=0.1,K=3,frac=c(0.5,1),delta0=0.178, delta1=0.545,nsim=10000,seed=10)
$FWER
[1] 0.05 (Reached the desired FWER)
$`Stagewise FWER`
look1 look2
0.0050 0.0466 (should add up to around 0.05)

In the above results, we reached the desired type I error rate of 0.05, which confirms that the design and operating characteristics (op) functions are working correctly. The same logic applies to the power characteristics. Does that answer your question?
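
As a quick illustration of that check, the reported rates can also be compared against the nominal level programmatically. This is only a sketch: it assumes the printed output above corresponds to a named list returned by op_fwer_cont(), stored here in a variable called res for the example.

library(gsMAMS)

res <- op_fwer_cont(alpha = 0.05, beta = 0.1, K = 3, frac = c(0.5, 1),
                    delta0 = 0.178, delta1 = 0.545, nsim = 10000, seed = 10)

# the overall FWER should be close to the nominal alpha of 0.05
abs(res$FWER - 0.05) < 0.01

# the stagewise FWERs should add up to roughly the overall alpha
# (0.0050 + 0.0466 = 0.0516 in the output above)
abs(sum(res$`Stagewise FWER`) - 0.05) < 0.01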

RhysPeploe commented 6 months ago

Are these simulations run manually or automatically? It would be good to have these checks run when the package goes through the R CMD build/check stage.
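
For context, a package's testthat suite is run automatically during R CMD check through a tests/testthat.R file; the file that usethis::use_testthat() generates looks roughly like the sketch below, with the package name filled in for gsMAMS.

# tests/testthat.R -- executed automatically during R CMD check
library(testthat)
library(gsMAMS)

test_check("gsMAMS")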

Tpatni719 commented 6 months ago

These simulations are run using the operating characteristics functions provided in the package. When designing a trial, we want trial parameters that give the desired type I and type II error rates, and to check this, people routinely run simulations of the trial's operating characteristics. However, because of random variation and the finite number of simulations, we sometimes cannot reach the desired power or FWER exactly (for example, with a target power of 0.9 you might only reach 0.89 or 0.88). Please see the snapshot below, in which beta = 0.1 (power = 1 - beta = 0.9) but we only reached a power of 0.893.

[screenshot: simulation output showing a simulated power of 0.893 against a target of 0.9]

Therefore, it is difficult to hard-code exact checks, but as long as we are close to the desired power or FWER, we know that the trial design is working correctly (one way to encode such an approximate check is sketched after the list below). Sometimes investigators also want to push the parameters, for example a very small effect size or very strict type I and type II error rates, just to get a feel for the trial design, and then it can be hard to reach the desired power at all.

Our package provides functions for the following two stages:

  1. To design the trial using the design functions.
  2. To check the veracity of design using the operating characteristics functions.
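
One way to automate the "close to the desired FWER" idea, given that a fixed seed makes the simulation reproducible, is an approximate, tolerance-based test. The sketch below reuses the op_fwer_cont() example from earlier in this thread; the tolerance of 0.01 is arbitrary, and it assumes the function returns a named list with an element FWER, as the printed output above suggests.

library(testthat)
library(gsMAMS)

test_that("simulated FWER is close to the nominal alpha", {
  res <- op_fwer_cont(alpha = 0.05, beta = 0.1, K = 3, frac = c(0.5, 1),
                      delta0 = 0.178, delta1 = 0.545, nsim = 10000, seed = 10)
  # with the seed fixed, the simulated FWER is reproducible, so an
  # approximate check against the nominal level is possible
  expect_lt(abs(res$FWER - 0.05), 0.01)
})
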
RhysPeploe commented 6 months ago

I see what you mean! I'll consult with Toni whether they are happy with this - thanks!

njtierney commented 6 months ago

Regarding tests, I would suggest reading this section of the R packages book on testing.

There are lots of ways to think about tests, but in one sense these tests are useful for understanding what the outputs of the functions are. I know this doesn't necessarily tell you whether the function did the right thing, but it is really useful when you go back to your package and make changes: your tests will identify if the output changes, and you won't need to run your own manual checks in the console to understand how it might have changed.

Here's an example of how you could create tests for design_cont():

usethis::use_testthat()
# create the test file for design_cont()
usethis::use_test("design_cont")

# load testthat and the package so the expectations below can be run
library(testthat)
library(gsMAMS)

continuous_design_output <- design_cont(
  delta0 = 0.178,
  delta1 = 0.545,
  alpha = 0.05, 
  beta = 0.1, 
  K = 4, 
  frac = c(1 / 2, 1)
)

# check the names remain the same
expect_snapshot(
  names(continuous_design_output)
)

# check there are 4 values returned
expect_length(
  as.numeric(continuous_design_output$`Sample size`),
  4
)

# check that the values are greater than or equal to 420
expect_gte(
  as.numeric(continuous_design_output$`Maximum total sample size for the trial`),
  420
)

# expect boundary values are double
expect_type(
  continuous_design_output$`Boundary values`,
  type = "double"
)
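
In the generated test file, these expectations would normally sit inside test_that() blocks so that they are named and reported individually. A sketch using the same design_cont() call as above:

library(testthat)
library(gsMAMS)

test_that("design_cont() returns the expected structure", {
  continuous_design_output <- design_cont(
    delta0 = 0.178, delta1 = 0.545,
    alpha = 0.05, beta = 0.1,
    K = 4, frac = c(1 / 2, 1)
  )
  # the `Sample size` element is expected to contain four values, as above
  expect_length(as.numeric(continuous_design_output$`Sample size`), 4)
  # the boundary values should be numeric
  expect_type(continuous_design_output$`Boundary values`, "double")
})
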
njtierney commented 6 months ago

Tests could also cover what happens when the wrong input is specified, e.g.,

library(gsMAMS)

design_cont(
  delta0 = "0.178",
  delta1 = 0.545,
  alpha = 0.05, 
  beta = 0.1, 
  K = 4, 
  frac = c(1 / 2, 1)
)
#> Error in sqrt(r * n/(1 + r)) * delta: non-numeric argument to binary operator

design_cont(
  delta0 = c(0.178,0.178),
  delta1 = 0.545,
  alpha = 0.05, 
  beta = 0.1, 
  K = 4, 
  frac = c(1 / 2, 1)
)
#> Error in A %*% mu: non-conformable arguments

Created on 2024-03-26 with reprex v2.1.0

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Brisbane
#>  date     2024-03-26
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
#>  digest        0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.1)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
#>  gsMAMS      * 0.7.1   2024-03-25 [1] Github (Tpatni719/gsMAMS@04f6777)
#>  htmltools     0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
#>  knitr         1.45    2023-10-30 [1] CRAN (R 4.3.1)
#>  lattice       0.22-5  2023-10-24 [1] CRAN (R 4.3.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  Matrix        1.6-5   2024-01-11 [1] CRAN (R 4.3.1)
#>  mvtnorm       1.2-4   2023-11-27 [1] CRAN (R 4.3.1)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [2] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [2] CRAN (R 4.3.0)
#>  R.oo          1.26.0  2024-01-24 [2] CRAN (R 4.3.1)
#>  R.utils       2.12.3  2023-11-18 [2] CRAN (R 4.3.1)
#>  reprex        2.1.0   2024-01-11 [2] CRAN (R 4.3.1)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
#>  rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [2] CRAN (R 4.3.0)
#>  survival      3.5-8   2024-02-14 [1] CRAN (R 4.3.1)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.1)
#>  xfun          0.42    2024-02-08 [1] CRAN (R 4.3.1)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.1)
#>
#>  [1] /Users/nick/Library/R/arm64/4.3/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

This suggests that the functions could provide better error handling that tells the user how many values, and of what type, should be supplied.
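
As an illustration of the kind of validation meant here, a small input check at the top of a function can turn those cryptic errors into informative ones, and expect_error() can then test for them. The helper below is hypothetical, not the package's actual code.

library(testthat)

# hypothetical validation helper -- not part of gsMAMS
check_delta0 <- function(delta0) {
  if (!is.numeric(delta0) || length(delta0) != 1) {
    stop("`delta0` must be a single numeric value, not a ", class(delta0),
         " of length ", length(delta0), call. = FALSE)
  }
  invisible(delta0)
}

test_that("invalid delta0 gives an informative error", {
  expect_error(check_delta0("0.178"), "must be a single numeric value")
  expect_error(check_delta0(c(0.178, 0.178)), "must be a single numeric value")
})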

Tpatni123 commented 5 months ago

@njtierney I have added tests for all the functions; they check the fundamental aspects of the design and operating characteristics. But to emphasize: when we design a trial, we always run the operating characteristics functions to check that the FWER and power reach the desired levels, which ensures that our design parameters are correct. So, even though this is a kind of manual testing, it is the inherent nature of trial design and planning and is routinely performed.

njtierney commented 5 months ago

> @njtierney I have added tests for all the functions; they check the fundamental aspects of the design and operating characteristics.

Fantastic! That's really awesome.

> But to emphasize: when we design a trial, we always run the operating characteristics functions to check that the FWER and power reach the desired levels, which ensures that our design parameters are correct. So, even though this is a kind of manual testing, it is the inherent nature of trial design and planning and is routinely performed.

Testing statistical tests/models is always a bit fraught, but the tests that you have written will help you catch strange errors and other issues. This means that when you make changes, you can be more confident that fundamental breaking errors (like changing variable names, for example) will be caught early when you run the package checks.

Well done!

Tpatni123 commented 5 months ago

> Are these simulations run manually or automatically? It would be good to have these checks run when the package goes through the R CMD build/check stage.

@RhysPeploe I have added the unit tests with the help of @njtierney. So, please let me know if I can close this issue.

RhysPeploe commented 5 months ago

@Tpatni123 Yes - happy for this to be closed! Thanks