avast / ep-stats

Statistics for Experimentation Platform
MIT License

Required sample size #41

Closed jancervenka closed 1 year ago

jancervenka commented 1 year ago

Metrics now have an optional minimum_effect argument that is used to compute the sample size required to reach 80% power.
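For reference, a common normal-approximation formula for the per-variant sample size of a two-sided, two-sample test is sketched below. It is an assumption about the general approach, not a statement of the exact code in this PR. The symbols $\mu_c$ (control mean), $\sigma_c$ and $\sigma_t$ (control and treatment standard deviations), and the reading of minimum_effect as a relative change are notation introduced here for illustration.

$$
n = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^2 \left(\sigma_c^2 + \sigma_t^2\right)}{\delta^2},
\qquad \delta = \mu_c \cdot \text{minimum\_effect},
\qquad 1 - \beta = 0.8
$$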

REST API example:

[Screenshot of the REST API example, 2022-09-20]

Python API example:

from epstats.toolkit import Experiment, Metric

experiment = Experiment(
    id="test-conversion-with-minimum-effect",
    control_variant="a",
    metrics=[
        Metric(
            id=1,
            name="Clicks per User",
            nominator="count(test_unit_type.unit.click)",
            denominator="count(test_unit_type.global.exposure)",
            minimum_effect=0.1,
        ),
        Metric(
            id=2,
            name="Purchases per user",
            nominator="count(test_unit_type.unit.purchase)",
            denominator="count(test_unit_type.global.exposure)",
            minimum_effect=0.1,
        ),
    ],
    checks=[],
    unit_type="test_unit_type",
)

Btw, in the last PR (https://github.com/avast/ep-stats/pull/40) we talked about Bonferroni vs. Holm-Bonferroni correction. Holm-Bonferroni can be applied here because we already have the $p$-values. However, each variant would then get a very different required_sample_size, because the corrected $\alpha$ depends on where the variant's $p$-value ranks among the others. I think it's better to stick with classic Bonferroni and use the most conservative $\alpha$ for all variants so that the required sizes are equal.

Consider an example with 4 variants (control A plus B, C, and D), $\alpha = 0.05$, and $p$-values $p_B = 0.001, p_C = 0.005, p_D = 0.01$.

| Variant | $p$ | $\alpha$ (Holm-Bonferroni) | $\alpha$ (Bonferroni) | Required size (Holm-Bonferroni) | Required size (Bonferroni) |
|---|---|---|---|---|---|
| B | 0.001 | $\frac{\alpha}{3}=0.05/3$ | $\frac{\alpha}{3}=0.05/3$ | 4711 | 4711 |
| C | 0.005 | $\frac{\alpha}{2}=0.05/2$ | $\frac{\alpha}{3}=0.05/3$ | 4277 | 4711 |
| D | 0.010 | $\frac{\alpha}{1}=0.05$ | $\frac{\alpha}{3}=0.05/3$ | 3532 | 4711 |

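
The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the ep-stats implementation: the helper names are hypothetical, the sample-size formula is the usual normal approximation for a two-sided two-sample test, rounding to the nearest integer is an assumption, and the mean, standard deviation, and relative minimum effect are made-up inputs chosen so that the results land close to the sizes quoted above.

```python
from statistics import NormalDist


def holm_alphas(p_values, alpha=0.05):
    """Holm-Bonferroni thresholds: the k-th smallest p-value (k = 1..m)
    is compared against alpha / (m - k + 1)."""
    m = len(p_values)
    ranked = sorted(p_values, key=p_values.get)  # variant names, smallest p first
    return {name: alpha / (m - rank) for rank, name in enumerate(ranked)}


def bonferroni_alphas(p_values, alpha=0.05):
    """Classic Bonferroni: every comparison uses alpha / m."""
    m = len(p_values)
    return {name: alpha / m for name in p_values}


def required_size(alpha, mean, std, minimum_effect, power=0.8):
    """Per-variant sample size under a normal approximation, assuming equal
    standard deviations in both variants and a relative minimum effect."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    delta = mean * minimum_effect  # absolute difference to detect
    return round(z**2 * 2 * std**2 / delta**2)


p_values = {"B": 0.001, "C": 0.005, "D": 0.01}
MEAN, STD, MIN_EFFECT = 2.0, 3.0, 0.1  # hypothetical inputs, not from the PR

holm_sizes = {
    v: required_size(a, MEAN, STD, MIN_EFFECT)
    for v, a in holm_alphas(p_values).items()
}
bonferroni_sizes = {
    v: required_size(a, MEAN, STD, MIN_EFFECT)
    for v, a in bonferroni_alphas(p_values).items()
}

print("Holm-Bonferroni:", holm_sizes)
print("Bonferroni:     ", bonferroni_sizes)
```

Under Holm-Bonferroni the required size shrinks as the threshold loosens from $\alpha/3$ to $\alpha$, while Bonferroni yields one size for every variant, which is the property argued for here.
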
marekbenes commented 1 year ago

Agreed to use Bonferroni. Just please state in the documentation that we are more conservative for power calculation.