A/B Test to disable suggesting all streams for new connections

evantahler commented 1 year ago

After https://github.com/airbytehq/airbyte/pull/21577, we now have the ability for sources to suggest only those important streams for users setting up new connectors. Today, if a connector has not implemented suggestedStreams, all streams are selected by default. We want to set up a test for a small group of users which swaps this behavior - what if no streams were suggested by default?

Positive Hypothesis:

First sync success % likely goes up. We are syncing fewer streams, and only the 'well-tested' popular streams
First sync duration likely goes down. If we sync fewer streams, it takes less time
Users think more critically about the data they want to move, leading to more investment in the resulting destination's dataset
We gain data about which streams matter, so that we can then populate suggestedStreams for more connectors

Negative Hypothesis:

Adds additional friction for the first sync for most sources (as most sources don't yet have suggested streams)
Some sources, e.g. databases, without a static list of streams, will likely never have any streams suggested by default in this new world.

I think we can do this test entirely in the front end.

The work to be done is:

Set up the LaunchDarkly feature flag and test group
Fire segment events for those users within the test group to store which streams they chose for each connector
- We don't want data about users not in the test group
Report on this data in Metabase:
- Did sync success rate go up or down for the test group?
- Did sync duration go up or down for the test group?
- Did we learn which streams are popular for the connectors, and can we populate more suggestedStreams?
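
A minimal front-end sketch of the steps above, assuming the flag value has already been read from LaunchDarkly as a boolean. The names `Stream`, `defaultSelection`, `buildSelectionEvent`, and the event name `"Streams Selected"` are hypothetical and not taken from the Airbyte codebase.

```typescript
// Hypothetical types; the real catalog model is richer than this.
interface Stream {
  name: string;
  suggested: boolean; // true when the connector lists it in suggestedStreams
}

interface SelectionEvent {
  event: string;
  properties: {
    connector: string;
    inTestGroup: boolean;
    selectedStreams: string[];
  };
}

// Decide which streams are pre-selected when a new connection is created.
// `noStreamsByDefault` stands in for the feature-flag value of the test group.
function defaultSelection(streams: Stream[], noStreamsByDefault: boolean): string[] {
  const suggested = streams.filter((s) => s.suggested).map((s) => s.name);
  if (suggested.length > 0) {
    return suggested; // the connector implements suggestedStreams
  }
  // No suggestions: current behavior selects everything, the test selects nothing.
  return noStreamsByDefault ? [] : streams.map((s) => s.name);
}

// Build the analytics payload recording which streams the user finally chose.
function buildSelectionEvent(
  connector: string,
  inTestGroup: boolean,
  selectedStreams: string[]
): SelectionEvent {
  return {
    event: "Streams Selected", // hypothetical event name
    properties: { connector, inTestGroup, selectedStreams },
  };
}
```

Carrying `inTestGroup` on the payload keeps the decision about which users to report on separate from the selection logic itself.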

evantahler commented 1 year ago

cc @misteryeo @michel-tricot @bleonard @YowanR

misteryeo commented 1 year ago

Is there some way that we can AB test this with different users and across select connectors to observe the impact here?

I'd like to loop in @nataliekwong here to make sure she's involved as this would impact activation rates.

My hypothesis is that the give up / abandonment rate for successfully setting up a connection might increase but amongst those who do finish the setup, the % sync success increases.

bleonard commented 1 year ago

We could do this first on the frontend and use a LaunchDarkly feature flag. The A/B test likely wouldn't reach significance in a time we're happy with, but it might be directionally interesting to see, and we could toggle it off if there was a problem.

nataliekwong commented 1 year ago

Consolidating some thoughts between Ryan and myself from Slack thread:

Some known risks are:

You can currently set up connections without being required to sync any streams
Users are often confused with bulk editing so it will be more difficult and time-consuming for users with more tables

That being said, given the first sync is so important to continued success, the tradeoffs here are worth exploring and I think it's worth creating an experiment with these in mind (we anticipate a larger dropoff at the connection settings).

I suggest starting with a few connectors so we can contain the experiment and put it behind a feature flag so we see the impact between the two groups. I don't think we necessarily need to wait to solve the first bullet above in order to move forward (Issue here).

My suggestion would be to choose 3 - 4 connectors so you can see how the experience differs across the types of connectors we offer, and since we want to actually be able to measure a difference between the groups ideally within a few weeks, choose connectors that have a higher number of users trying it out. We should pre-select 1 stream for them that we feel is pretty certainly going to succeed instead of giving a blank slate.

My suggestion would be:

API (Facebook Marketing, Google Ads, Hubspot) - high number of users and a smaller set of streams. We pre-select 1 stream but they can select more if they prefer.
Database (Postgres) - suggestion from Michel. No streams pre-selected as the schemas are not predefined.
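
One way the per-connector setup suggested above could be expressed, as a sketch only. The connector names come from the suggestion; the stream names are placeholders, since the thread does not say which single stream would be pre-selected, and `EXPERIMENT_CONNECTORS` and `experimentSelection` are hypothetical names.

```typescript
// Connectors named in the suggestion. Stream names are placeholders:
// the thread does not specify which stream would be pre-selected.
const EXPERIMENT_CONNECTORS: Record<string, string | null> = {
  "Facebook Marketing": "placeholder_stream_a",
  "Google Ads": "placeholder_stream_b",
  "Hubspot": "placeholder_stream_c",
  "Postgres": null, // schemas are not predefined, so nothing is pre-selected
};

// Returns the pre-selected streams for a connector in the experiment,
// or undefined when the connector is not part of it (keep current behavior).
function experimentSelection(connector: string, catalog: string[]): string[] | undefined {
  if (!Object.prototype.hasOwnProperty.call(EXPERIMENT_CONNECTORS, connector)) {
    return undefined;
  }
  const preselected = EXPERIMENT_CONNECTORS[connector];
  // Only pre-select a stream that the discovered catalog actually contains.
  if (preselected !== null && catalog.indexOf(preselected) !== -1) {
    return [preselected];
  }
  return [];
}
```

Returning `undefined` for connectors outside the experiment lets the caller fall back to the existing default without a separate lookup.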

@evantahler Seeing the PR - is this a type of project you/your team could take on? Or would you prefer Growth (@letiescanciano ) moves it forward?

evantahler commented 1 year ago

Thanks for all of the feedback everyone!

I think this probably still belongs in the @airbytehq/connector-operations wheelhouse, but this has grown from "a quick change" into a bit larger of a feature now :D. With that in mind, I don't know if we will have space for this in Q1B, but we'll keep it on our radar for the future. That said, if @letiescanciano wants to run with this, I'd be happy to consult!

I like the suggestion of A/B testing this, and moving the logic about which streams to suggest into the frontend for the duration of the experiment. With that in mind, I'll close https://github.com/airbytehq/airbyte/pull/22856

evantahler commented 1 year ago

@nataliekwong and @alex-gron - I rephrased this story as a front-end experiment. Can you comment on the description? Anything to add or change?

alex-gron commented 1 year ago

The description sounds great and makes sense to me!

I want to call out though that Metabase monitoring will not be possible until we have LaunchDarkly data available in the data warehouse. That work is prioritized for the end of Q1b. Do we yet know when this experiment would launch?

@bleonard Do you have any concerns with this from a Connector Sync success monitoring standpoint? Do we need to filter the test users out of your dashboard while we are testing this?

nataliekwong commented 1 year ago

Thanks for reframing! Feel free to assign @letiescanciano as she's already starting to work on this.

Fire segment events for those users within the test group to store which streams they chose for each connector
- We don't want data about users not in the test group

The LaunchDarkly variants get passed in Segment events, so I don't think we need to wait for it to be available in the data warehouse. I think we can send this data regardless of variant since we can always filter down by which variant they were in later on.
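
A sketch of filtering by variant after the fact, assuming events are exported with the variant attached. The `SyncEvent` shape, its field names, and the `"control"` / `"test"` labels are assumptions for illustration, not the actual Segment payload.

```typescript
// Hypothetical shape of an exported sync event; field names are assumptions.
interface SyncEvent {
  variant: "control" | "test"; // feature-flag variant carried on the event
  firstSyncSucceeded: boolean;
  durationSeconds: number;
}

interface VariantSummary {
  count: number;
  successRate: number;
  meanDurationSeconds: number;
}

// Summarize first-sync success rate and mean duration for one variant.
function summarize(events: SyncEvent[], variant: SyncEvent["variant"]): VariantSummary {
  const subset = events.filter((e) => e.variant === variant);
  if (subset.length === 0) {
    // No events for this variant: rates are undefined rather than zero.
    return { count: 0, successRate: NaN, meanDurationSeconds: NaN };
  }
  const successes = subset.filter((e) => e.firstSyncSucceeded).length;
  const totalDuration = subset.reduce((sum, e) => sum + e.durationSeconds, 0);
  return {
    count: subset.length,
    successRate: successes / subset.length,
    meanDurationSeconds: totalDuration / subset.length,
  };
}
```

Comparing `summarize(events, "test")` with `summarize(events, "control")` gives the two numbers the description asks about: whether sync success rate and sync duration went up or down for the test group.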

alex-gron commented 1 year ago

Great call on Segment events! 👍 Makes sense to me

evantahler commented 1 year ago

@nataliekwong & @letiescanciano - updating my comment above: I'd love some help from your team to move this experiment forward, especially now that this is scoped to the front-end.

nataliekwong commented 1 year ago

The Growth team's process lives in Airtable, so I'll assign @letiescanciano as the owner here and she will update the issue with the PR when it's ready!

Airtable link in case you want to read on the progress in the interim.

bleonard commented 1 year ago

@bleonard Do you have any concerns with this from a Connector Sync success monitoring standpoint? Do we need to filter the test users out of your dashboard while we are testing this?

I don't think so. If anything, they will likely have a higher success rate as they are likely to choose fewer streams, but I think they are just as relevant to monitor.

evantahler commented 1 year ago

@letiescanciano and @alex-gron as the experiment (https://github.com/airbytehq/airbyte-platform-internal/pull/4846) is running, if you happen to get strong signals that some streams are rarely used, send them my way and I'll start modifying connectors

letiescanciano commented 1 year ago

@evantahler will let you know once I get the PR approved and released! :)

airbytehq / airbyte

A/B Test to disable suggesting all streams for new connections #22851