Open neilkakkar opened 2 years ago
It's hard for people who didn't create the experiment to understand what's going on: See https://posthog.slack.com/archives/C011L071P8U/p1659538701607079
When creating an experiment: Warn that at least one event needs to be instrumented with feature flags (automatic for posthog-js, manual for the rest)
When selecting person properties in flags, warn that these might not be immediately available, and link to how to make feature flags instant.
Based on the above, things to do this sprint:
For future:
What does the first one mean?
A couple additional things from me:
Your results are not statistically significant. This is because the win probability of all test variants combined is less than 90%. We don't recommend ending this experiment yet. See our [experimentation guide](https://posthog.com/docs/user-guides/experimentation#funnel-experiment-calculations) for more information.
But when I look at the "probability of it being the best" it all adds up to 100%.
re (2): "Test" variants, excluding control, as otherwise all probabilities would always add up to 100%.
Agreed though that the copy makes less sense with only 2 variants, will fix, thank you! 🙌
Ohhhhhhhhh. Aren't we simply interested in seeing that any single variant is over 90% likely to be the best? Because results can still be statistically significant if the control is 90% likely to be the best.
Yep, correct, the copy changes for when the control is > 90%.
> that any single variant is over 90%
The problem occurs with multiple test variants, where this doesn't really hold. Example: if variant-a is 2x as good as control, and variant-b is 2.1x as good as control, the probabilities will be something like 49% (variant-b), 48% (variant-a), and 3% (control). In this case we have significance, even though the top probability is only 49%.
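To make the multi-variant case concrete, here's a minimal sketch of how "probability of being the best" can be estimated by sampling from each variant's Beta posterior (this is an illustration of the general Bayesian approach, not PostHog's actual implementation; the conversion counts are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (successes, failures) counts per variant; in a real
# experiment these would come from the funnel data.
variants = {
    "control":   (100, 900),  # ~10% conversion
    "variant-a": (200, 800),  # ~2x control
    "variant-b": (210, 790),  # ~2.1x control
}

# Sample each variant's conversion rate from a Beta(successes + 1,
# failures + 1) posterior (uniform prior), then count how often each
# variant has the highest sampled rate.
samples = np.column_stack(
    [rng.beta(s + 1, f + 1, size=100_000) for s, f in variants.values()]
)
win_prob = np.bincount(samples.argmax(axis=1), minlength=len(variants))
win_prob = win_prob / len(samples)

for name, p in zip(variants, win_prob):
    print(f"{name}: {p:.1%} probability of being the best")
```

With two closely matched test variants, the two win probabilities split the non-control mass between them, so neither one individually clears 90% even when control is clearly beaten.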
I think the UI would make a lot more sense if the probabilities were all contrasted against the control individually, and then additionally against each other. So it could be read like:
Variant A has a 92% probability of being better than Control. Variant B has a 93% probability of being better than Control. Variant B has a 52% probability of being better than Variant A.
That way there is no question about which variant is actually performing best, and we're not using the term "statistically significant" in weird ways (since you'd expect anything that is statistically significant to have a value of over 90-99% unless you failed your stats class in college).
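The pairwise readout proposed above can be sketched the same way: estimate P(rate_a > rate_b) directly from the two Beta posteriors. Again, this is an illustrative sketch with invented counts, not PostHog's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_beats(a, b, draws=100_000):
    """Monte Carlo estimate of P(rate_a > rate_b).

    a and b are hypothetical (successes, failures) pairs; each rate gets
    a Beta(successes + 1, failures + 1) posterior (uniform prior).
    """
    sa = rng.beta(a[0] + 1, a[1] + 1, size=draws)
    sb = rng.beta(b[0] + 1, b[1] + 1, size=draws)
    return (sa > sb).mean()

control = (100, 900)    # ~10% conversion
variant_a = (120, 880)  # ~12%
variant_b = (122, 878)  # ~12.2%

print(f"A beats control: {prob_beats(variant_a, control):.0%}")
print(f"B beats control: {prob_beats(variant_b, control):.0%}")
print(f"B beats A:       {prob_beats(variant_b, variant_a):.0%}")
```

Each pairwise number stands on its own, so a variant beating control at 92% reads as significant regardless of how many other test variants are in the experiment.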
Interesting idea, makes sense! 🙌 I'll experiment with it
> (since you'd expect anything that is statistically significant to have a value of over 90-99% unless you failed your stats class in college)
This is actually not true, since we're dealing with probabilities, not the p-values. 😅
Have a read here: https://posthog.com/manual/experimentation#bayesian-ab-testing
Ohh okay. I misunderstood then. Also, it's been a long while since I took any stats class, so please take any of my suggestions or misunderstandings lightly 😄