bookingcom / powercalculator

Calculator to define runtime of experiments
https://bookingcom.github.io/powercalculator/
MIT License

Modifying the power utilities to support more flexible hypothesis tests #10

Closed rlopezkaufma closed 6 years ago

rlopezkaufma commented 6 years ago

This commit introduces the possibility to specify more flexible hypothesis tests for each of the power utilities. The calculations used to default to the following: H0: epsilon = 0 vs H1: epsilon != 0, with epsilon the difference in means between treatment and control. Now, via the new alternative and mu parameters, you can specify hypothesis tests such as:

I also introduced several simplifications for consistency:

There was also a mistake in solveforeffectsize_Ttest, which returned the square of the correct value.
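To illustrate what the alternative and mu parameters make possible, here is a minimal Python sketch of the power of a z-test of H0: epsilon = mu. It is not the code from this commit: the function name power_ztest, the labels "two-sided", "greater" and "less", and the reading of mu as the value of epsilon under H0 are assumptions made for the example, and it uses a normal approximation rather than the t distribution.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_ztest(true_diff, se, alpha=0.05, alternative="two-sided", mu=0.0):
    """Approximate power of a z-test of H0: epsilon = mu.

    true_diff:   assumed true difference in means (treatment minus control)
    se:          standard error of the estimated difference
    alternative: "two-sided", "greater" or "less" (hypothetical labels)
    mu:          value of epsilon under H0 (assumed semantics)
    """
    # Standardized distance between the true difference and the null value.
    shift = (true_diff - mu) / se
    if alternative == "two-sided":
        z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
        # Probability of landing in either rejection tail.
        return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "greater":
        return _STD_NORMAL.cdf(shift - z_crit)
    if alternative == "less":
        return _STD_NORMAL.cdf(-shift - z_crit)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Non-inferiority style example: H0: epsilon = -0.02 vs H1: epsilon > -0.02,
# evaluated when the true difference is 0 (illustrative numbers only).
noninferiority_power = power_ztest(0.0, se=0.005, alternative="greater", mu=-0.02)
```

With mu = 0 and alternative "two-sided" this reduces to the previous default, H0: epsilon = 0 vs H1: epsilon != 0.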

The commit also introduces a tentative test suite, which is far from exhaustive but should get us started. Specifically, I test each utility by using its output as parameters for simulated experiments. I then compare the observed power with the theoretical power using an equivalence test with a 0.01 margin (two one-sided tests). The margin is chosen so that the equivalence test has adequate power (0.999) and false positive rate (0.001) without being computationally prohibitive: 30000 experiments, as given in R with the TOSTER library by ceiling(powerTOSTone.raw(alpha = 0.001, statistical_power = 0.999, sd = sqrt(0.8*0.2), low_eqbound = -0.015, high_eqbound = 0.015)).
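The equivalence check described above can be sketched in Python as follows. This is an illustration, not the test suite from this commit: the function names are hypothetical, the sample-size formula is the usual normal approximation for two one-sided tests (that TOSTER computes exactly this is an assumption), and the inputs are copied from the R call quoted above.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def tost_sample_size(alpha, power, sd, margin):
    """Number of observations for a one-sample TOST (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = _STD_NORMAL.inv_cdf(1 - (1 - power) / 2)
    return ceil(((z_alpha + z_beta) * sd / margin) ** 2)


def tost_equivalent(rejections, n_experiments, theoretical_power, margin, alpha):
    """True if the observed power is within +/- margin of the theoretical power.

    rejections is the number of simulated experiments that rejected H0.
    """
    observed = rejections / n_experiments
    se = sqrt(observed * (1 - observed) / n_experiments)
    if se == 0:
        # Degenerate case: every experiment gave the same outcome.
        return abs(observed - theoretical_power) < margin
    # One-sided test against the lower bound (H0: observed power <= target - margin).
    p_lower = 1 - _STD_NORMAL.cdf((observed - (theoretical_power - margin)) / se)
    # One-sided test against the upper bound (H0: observed power >= target + margin).
    p_upper = _STD_NORMAL.cdf((observed - (theoretical_power + margin)) / se)
    # Equivalence is declared only if both one-sided tests reject.
    return max(p_lower, p_upper) < alpha


# Values taken from the R call quoted in the description.
n_experiments = tost_sample_size(alpha=0.001, power=0.999,
                                 sd=sqrt(0.8 * 0.2), margin=0.015)
```

Under these assumptions the required number of simulated experiments comes out a little below the 30000 used in the description.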

I test most of the utilities by simulating Bernoulli-distributed metrics (except the utilities that differ radically between binary and continuous metrics) instead of normally distributed ones, for speed: a single binomial draw accounts for a whole experiment.
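A rough Python sketch of why Bernoulli metrics are cheap to simulate: each arm of an experiment is summarized by one binomial count, so no per-user values need to be generated. The function names, the pooled two-proportion z-test, and the default arguments are assumptions for illustration, not the code in this commit.

```python
import random
from math import sqrt
from statistics import NormalDist


def draw_binomial(rng, n, p):
    """One binomial draw; sums Bernoulli draws on Python versions before 3.12."""
    if hasattr(rng, "binomialvariate"):
        return rng.binomialvariate(n, p)
    return sum(rng.random() < p for _ in range(n))


def simulate_power(p_control, p_treatment, n_per_arm,
                   alpha=0.05, n_experiments=2000, seed=0):
    """Fraction of simulated experiments in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_experiments):
        # One binomial draw per arm stands in for a whole experiment.
        x_c = draw_binomial(rng, n_per_arm, p_control)
        x_t = draw_binomial(rng, n_per_arm, p_treatment)
        pooled = (x_c + x_t) / (2 * n_per_arm)
        se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        if se == 0:
            continue  # all successes or all failures: no evidence either way
        z = (x_t - x_c) / n_per_arm / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_experiments


# Illustrative numbers only: 20% vs 30% conversion with 500 users per arm.
observed_power = simulate_power(0.2, 0.3, n_per_arm=500)
```

The returned fraction is the observed power that would then be compared with the theoretical power from the utility under test.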

Askoth commented 6 years ago

Holding this up, as we still need UI changes and need to force mu to 0 when we are not using the "non inferiority" test.