airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

A/B Test to disable suggesting all streams for new connections #22851

Closed evantahler closed 1 year ago

evantahler commented 1 year ago

After https://github.com/airbytehq/airbyte/pull/21577, we now have the ability for sources to suggest only those important streams for users setting up new connectors. Today, if a connector has not implemented suggestedStreams, all streams are selected by default. We want to set up a test for a small group of users which swaps this behavior - what if no streams were suggested by default?

Positive Hypothesis:

Negative Hypothesis:

I think we can do this test entirely in the front end.

The work to be done is:

  1. Set up the LaunchDarkly feature flag and test group
  2. Fire Segment events for those users within the test group to store which streams they chose for each connector
    • We don't want data about users not in the test group
  3. Report on this data in Metabase:
    • Did sync success rate go up or down for the test group?
    • Did sync duration go up or down for the test group?
    • Did we learn which streams are popular for the connectors, and can we populate more suggestedStreams?
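
As a rough sketch of steps 1 and 2, assuming the flag resolves to one of two variants: the snippet below picks the default stream selection and builds the event that records what the user finally chose. All names here (`Variant`, `SourceInfo`, `initialSelection`, `buildSelectionEvent`, the event name) are hypothetical and are not taken from the Airbyte frontend. It also assumes connectors that implement suggestedStreams behave the same in both groups.

```typescript
// Hypothetical names throughout; this is not Airbyte's actual frontend code.
type Variant = "control" | "no-default-streams";

interface SourceInfo {
  connector: string;
  allStreams: string[];
  // Present only when the connector implements suggestedStreams.
  suggestedStreams?: string[];
}

// Decide which streams are pre-selected when a new connection is set up.
function initialSelection(source: SourceInfo, variant: Variant): string[] {
  if (source.suggestedStreams !== undefined) {
    // Assumption: suggestions win in both groups; drop names not in the catalog.
    const known = new Set(source.allStreams);
    return source.suggestedStreams.filter((s) => known.has(s));
  }
  // No suggestions: control selects everything, the test group selects nothing.
  return variant === "control" ? [...source.allStreams] : [];
}

interface StreamSelectionEvent {
  event: string;
  properties: {
    connector: string;
    variant: Variant;
    preselected: string[];
    chosen: string[];
  };
}

// Build the analytics payload, or null for users outside the test group
// (step 2 above asks for test-group data only).
function buildSelectionEvent(
  source: SourceInfo,
  variant: Variant,
  chosen: string[]
): StreamSelectionEvent | null {
  if (variant === "control") {
    return null;
  }
  return {
    event: "Connection Streams Selected", // hypothetical event name
    properties: {
      connector: source.connector,
      variant,
      preselected: initialSelection(source, variant),
      chosen,
    },
  };
}
```

The caller would pass a non-null result to whatever analytics client is in use; that call is left out here because its exact shape depends on the client.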

evantahler commented 1 year ago

cc @misteryeo @michel-tricot @bleonard @YowanR

misteryeo commented 1 year ago

Is there some way that we can AB test this with different users and across select connectors to observe the impact here?

I'd like to loop in @nataliekwong here to make sure she's involved as this would impact activation rates.

My hypothesis is that the give up / abandonment rate for successfully setting up a connection might increase but amongst those who do finish the setup, the % sync success increases.

bleonard commented 1 year ago

We could do this first on the frontend and use a LaunchDarkly feature flag. The A/B test likely wouldn't reach significance in a time we're happy with, but it might be directionally interesting to see and we could toggle it off if there was a problem.
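
To make "reach significance" concrete, here is a minimal sketch of a standard two-proportion z-test on sync success counts. The function and type names are made up for illustration, and nothing in this thread says which test the team would actually use.

```typescript
interface GroupResult {
  successes: number; // successful syncs in the group
  total: number;     // all syncs in the group
}

// Two-proportion z-statistic with a pooled standard error.
// Note: the result is NaN or infinite if the pooled rate is exactly 0 or 1.
function twoProportionZ(a: GroupResult, b: GroupResult): number {
  const pA = a.successes / a.total;
  const pB = b.successes / b.total;
  const pooled = (a.successes + b.successes) / (a.total + b.total);
  const standardError = Math.sqrt(
    pooled * (1 - pooled) * (1 / a.total + 1 / b.total)
  );
  return (pA - pB) / standardError;
}
```

With made-up counts of 50/100 versus 40/100, the statistic is about 1.42, below the 1.96 threshold for a two-sided 5% test, so even a ten-point gap would not be significant at that sample size.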

nataliekwong commented 1 year ago

Consolidating some thoughts between Ryan and myself from Slack thread:

Some known risks are:

That being said, given the first sync is so important to continued success, the tradeoffs here are worth exploring and I think it's worth creating an experiment with these in mind (we anticipate a larger dropoff at the connection settings).

I suggest starting with a few connectors so we can contain the experiment and put it behind a feature flag so we see the impact between the two groups. I don't think we necessarily need to wait to solve the first bullet above in order to move forward (Issue here).

My suggestion would be to choose 3 - 4 connectors so you can see how the experience differs across the types of connectors we offer, and since we want to actually be able to measure a difference between the groups ideally within a few weeks, choose connectors that have a higher number of users trying it out. We should pre-select 1 stream for them that we feel is pretty certainly going to succeed instead of giving a blank slate.
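
A small sketch of what that scoping could look like, assuming a static map from connector to its single pre-selected stream. The connector and stream names are placeholders, not the connectors proposed in this thread, and `preselectedStreams` is a hypothetical helper.

```typescript
// Placeholder experiment config: connector -> the one stream to pre-select.
const experimentPreselection: Record<string, string> = {
  "example-source-a": "users",
  "example-source-b": "orders",
  "example-source-c": "events",
};

// Streams selected by default for a new connection.
function preselectedStreams(
  connector: string,
  inTestGroup: boolean,
  allStreams: string[]
): string[] {
  const pick = experimentPreselection[connector];
  if (!inTestGroup || pick === undefined) {
    // Outside the experiment: keep today's behavior and select everything.
    return [...allStreams];
  }
  // Inside the experiment: only the hand-picked stream, if the source has it.
  return allStreams.indexOf(pick) >= 0 ? [pick] : [];
}
```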

My suggestion would be:

@evantahler Seeing the PR - is this a type of project you/your team could take on? Or would you prefer Growth (@letiescanciano ) moves it forward?

evantahler commented 1 year ago

Thanks for all of the feedback everyone!

I think this probably still belongs in the @airbytehq/connector-operations wheelhouse, but this has grown from "a quick change" into a bit larger of a feature now :D. With that in mind, I don't know if we will have space for this in Q1B, but we'll keep it on our radar for the future. That said, if @letiescanciano wants to run with this, I'd be happy to consult!

I like the suggestion of A/B testing this, and moving the logic about which streams to suggest into the frontend for the duration of the experiment. With that in mind, I'll close https://github.com/airbytehq/airbyte/pull/22856

evantahler commented 1 year ago

@nataliekwong and @alex-gron - I rephrased this story as a front-end experiment. Can you comment on the description? Anything to add or change?

alex-gron commented 1 year ago

The description sounds great and makes sense to me!

I want to call out though that Metabase monitoring will not be possible until we have LaunchDarkly data available in the data warehouse. That work is prioritized for the end of Q1b. Do we yet know when this experiment would launch?

@bleonard Do you have any concerns with this from a Connector Sync success monitoring standpoint? Do we need to filter the test users out of your dashboard while we are testing this?

nataliekwong commented 1 year ago

Thanks for reframing! Feel free to assign @letiescanciano as she's already starting to work on this.

> Fire segment events for those users within the test group to store which streams they chose for each connector
> • We don't want data about users not in the test group

The LaunchDarkly variants get passed in Segment events, so I don't think we need to wait for it to be available in the data warehouse. I think we can send this data regardless of variant since we can always filter down by which variant they were in later on.
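
A sketch of the "send everything, filter by variant later" idea, assuming each sync record already carries the variant it ran under. The record shape and function name are hypothetical; in practice this aggregation would more likely be a query in the reporting tool.

```typescript
interface SyncRecord {
  variant: string;         // flag variant attached to the event
  succeeded: boolean;
  durationSeconds: number;
}

interface VariantSummary {
  syncs: number;
  successRate: number;
  meanDurationSeconds: number;
}

// Group sync records by variant and compute success rate and mean duration.
function summarizeByVariant(records: SyncRecord[]): Map<string, VariantSummary> {
  const totals = new Map<string, { syncs: number; successes: number; duration: number }>();
  for (const r of records) {
    const t = totals.get(r.variant) ?? { syncs: 0, successes: 0, duration: 0 };
    t.syncs += 1;
    if (r.succeeded) {
      t.successes += 1;
    }
    t.duration += r.durationSeconds;
    totals.set(r.variant, t);
  }

  const summary = new Map<string, VariantSummary>();
  totals.forEach((t, variant) => {
    summary.set(variant, {
      syncs: t.syncs,
      successRate: t.successes / t.syncs,
      meanDurationSeconds: t.duration / t.syncs,
    });
  });
  return summary;
}
```

Because every record keeps its variant, the same data answers both the success-rate and the duration questions without deciding up front which users to exclude.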

alex-gron commented 1 year ago

Great call on Segment events! 👍 Makes sense to me

evantahler commented 1 year ago

@nataliekwong & @letiescanciano - updating my comment above: I'd love some help from your team to move this experiment forward, especially now that this is scoped to the front-end.

nataliekwong commented 1 year ago

The Growth team's process lives in Airtable, so I'll assign @letiescanciano as the owner here and she will update the issue with the PR when it's ready!

Airtable link in case you want to read on the progress in the interim.

bleonard commented 1 year ago

> The description sounds great and makes sense to me!
>
> I want to call out though that Metabase monitoring will not be possible until we have LaunchDarkly data available in the data warehouse. That work is prioritized for the end of Q1b. Do we yet know when this experiment would launch?
>
> @bleonard Do you have any concerns with this from a Connector Sync success monitoring standpoint? Do we need to filter the test users out of your dashboard while we are testing this?

I don't think so. If anything, they will likely have a higher success rate as they are likely to choose fewer streams, but I think they are just as relevant to monitor.

evantahler commented 1 year ago

@letiescanciano and @alex-gron as the experiment (https://github.com/airbytehq/airbyte-platform-internal/pull/4846) is running, if you happen to get strong signals that some streams are rarely used, send them my way and I'll start modifying connectors.

letiescanciano commented 1 year ago

@evantahler will let you know once I get the PR approved and released! :)