Open neilkakkar opened 2 years ago
It's hard for people who didn't create the experiment to understand what's going on: See https://posthog.slack.com/archives/C011L071P8U/p1659538701607079
When creating an experiment: Warn that at least one event needs to be instrumented with feature flags (automatic for posthog-js, manual for the rest)
When selecting person properties in flags, warn that these might not be immediately available, and link to how to make feature flags instant.
Based on the above, things to do this sprint:
For future:
What does the first one mean?
A couple additional things from me:
Your results are not statistically significant. This is because the win probability of all test variants combined is less than 90%. We don't recommend ending this experiment yet. See our [experimentation guide](https://posthog.com/docs/user-guides/experimentation#funnel-experiment-calculations) for more information.
But when I look at the "probability of it being the best" it all adds up to 100%.
re (2): "Test" variants, excluding control, as otherwise all probabilities would always add up to 100%.
Agreed though that the copy makes less sense with only 2 variants, will fix, thank you! 🙌
Ohhhhhhhhh. Aren't we simply interested in seeing that any single variant is over 90% likely to be the best? Because results can still be statistically significant if the control is 90% likely to be the best.
Yep, correct, the copy changes for when the control is > 90%.
> that any single variant is over 90%
The problem occurs with multiple test variants, where this doesn't really hold. Example: if variant-a is 2x as good as control, and variant-b is 2.1x as good as control, the probabilities will be something like 49% (variant-b), 48% (variant-a), and 3% (control). In this case we have significance, even though the top probability is only 49%.
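To make the multi-variant case concrete, here's a minimal sketch of how "probability of being the best" can be estimated by sampling from each variant's Beta posterior (this is an illustration of the general Bayesian approach, not PostHog's actual implementation; the conversion counts are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (successes, failures) counts per variant; in a real
# experiment these would come from the funnel data.
variants = {
    "control":   (100, 900),  # ~10% conversion
    "variant-a": (200, 800),  # ~2x control
    "variant-b": (210, 790),  # ~2.1x control
}

# Sample each variant's conversion rate from a Beta(successes + 1,
# failures + 1) posterior (uniform prior), then count how often each
# variant has the highest sampled rate.
samples = np.column_stack(
    [rng.beta(s + 1, f + 1, size=100_000) for s, f in variants.values()]
)
win_prob = np.bincount(samples.argmax(axis=1), minlength=len(variants))
win_prob = win_prob / len(samples)

for name, p in zip(variants, win_prob):
    print(f"{name}: {p:.1%} probability of being the best")
```

With two closely matched test variants, the two win probabilities split the non-control mass between them, so neither one individually clears 90% even when control is clearly beaten.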
I think the UI would make a lot more sense if the probabilities were all contrasted against the control individually, and then additionally against each other. So it could be read like:
Variant A has a 92% probability of being better than Control. Variant B has a 93% probability of being better than Control. Variant B has a 52% probability of being better than Variant A.
That way there is no question about which variant is actually performing best, and we're not using the term "statistically significant" in weird ways (since you'd expect anything that is statistically significant to have a value of over 90-99% unless you failed your stats class in college).
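The pairwise readout proposed above can be sketched the same way: estimate P(rate_a > rate_b) directly from the two Beta posteriors. Again, this is an illustrative sketch with invented counts, not PostHog's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_beats(a, b, draws=100_000):
    """Monte Carlo estimate of P(rate_a > rate_b).

    a and b are hypothetical (successes, failures) pairs; each rate gets
    a Beta(successes + 1, failures + 1) posterior (uniform prior).
    """
    sa = rng.beta(a[0] + 1, a[1] + 1, size=draws)
    sb = rng.beta(b[0] + 1, b[1] + 1, size=draws)
    return (sa > sb).mean()

control = (100, 900)    # ~10% conversion
variant_a = (120, 880)  # ~12%
variant_b = (122, 878)  # ~12.2%

print(f"A beats control: {prob_beats(variant_a, control):.0%}")
print(f"B beats control: {prob_beats(variant_b, control):.0%}")
print(f"B beats A:       {prob_beats(variant_b, variant_a):.0%}")
```

Each pairwise number stands on its own, so a variant beating control at 92% reads as significant regardless of how many other test variants are in the experiment.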
Interesting idea, makes sense! 🙌 I'll experiment with it
> (since you'd expect anything that is statistically significant to have a value of over 90-99% unless you failed your stats class in college)
This is actually not true, since we're dealing with probabilities, not the p-values. 😅
Have a read here: https://posthog.com/manual/experimentation#bayesian-ab-testing
Ohh okay. I misunderstood then. Also, it's been a long while since I took any stats class, so please take any of my suggestions or misunderstandings lightly 😄