dbt-labs / metricflow

MetricFlow allows you to define, build, and maintain metrics in code.
https://docs.getdbt.com/docs/build/about-metricflow

Build Metrics for Experimentation #52

Open nhandel opened 2 years ago

nhandel commented 2 years ago

Describe the Feature

MetricFlow currently builds generalized datasets for analytical applications (i.e., metrics by dimensions). Experimentation applications require a specific set of steps to denormalize and aggregate metrics in a way that enables statistical testing. The process currently configurable through MetricFlow's APIs does not support the two-step aggregation required to produce statistical tests.

We should add a plan builder that allows users to use MetricFlow to construct metrics for experimentation applications. Users of MetricFlow should not have to reinvent the wheel every time they want to run a product experiment. This is a logical extension of the metric framework in place today.

Story / Assumptions

Open/New Questions

Lots more to do to spec the plan builder and any new node types

Experiment Assignment Configuration

Expected input from experiment assignment:

CREATE TABLE events.experiment_assignments (
  subject_id BIGINT
, subject_entity STRING
, experiment_name STRING
, variant_name STRING
, first_assigned_at TIMESTAMP
);

which would be configured in MetricFlow as follows:

name: experiment_assignments
sql_table: events.experiment_assignments
experiment_assignments: True # NEW

identifiers:
  - name: subject
    expr: subject_id
    entity_expr: subject_entity # NEW
    type: foreign

measures:
  - name: assignments
    agg: sum
    expr: 1

dimensions:
  - name: experiment_name
    type: categorical
  - name: variant_name
    type: categorical
  - name: first_assigned_at
    type: time
    type_params:
      is_primary: True
      time_granularity: second

Generated

Consider a metric

name: transactions
sql_table: fact.transactions

identifiers:
  - name: user
    expr: user_id
    type: foreign

measures:
  - name: transactions
    agg: sum
    expr: 1
    create_metric: True

To support the construction of metrics for experimentation applications, we would need to generate two steps of aggregation. The query below could be broken into a non-aggregated step with a timestamp for more complicated statistical tests.

First, MetricFlow's bread and butter: construct a metric at the granularity of the experiment's entity (i.e., subject_id):

-- Step 1
CREATE TABLE metricflow.user_assignments_transactions
SELECT
  ua.subject_id
  , ua.experiment_name
  , ua.variant_name
  , COUNT(e.user_id) AS transactions -- 0 for assigned subjects with no transactions
FROM metricflow.user_assignments ua
LEFT JOIN fact.transactions e
  ON ua.subject_id = e.user_id
  AND e.created_at BETWEEN ua.first_assigned_at AND {{ analysis_date }}
GROUP BY 1, 2, 3

Second, compute the first and second moments at the granularity of the experiment and variant name:

-- Step 2
CREATE TABLE metricflow.metric_variant_aggregates
SELECT
  experiment_name
  , variant_name
  , 'transactions' AS metric_name
  , COUNT(subject_id) AS assignments
  , SUM(transactions) / COUNT(subject_id) AS metric_mean
  , SQRT(VAR(transactions)) AS metric_std
FROM metricflow.user_assignments_transactions
GROUP BY 1, 2
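The two aggregation steps above can be sketched in stdlib Python over hypothetical in-memory rows. This is illustrative only, not MetricFlow's implementation; all names and data are made up:

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical event rows: (subject_id, experiment_name, variant_name)
events = [
    (1, "button_color", "control"),
    (1, "button_color", "control"),
    (2, "button_color", "treatment"),
    (3, "button_color", "treatment"),
]
# All assigned subjects, including one with no events (LEFT JOIN semantics)
assignments = {1: "control", 2: "treatment", 3: "treatment", 4: "control"}

# Step 1: aggregate the metric to the granularity of the subject
per_subject = defaultdict(int)
for subject_id, _, _ in events:
    per_subject[subject_id] += 1
subject_rows = [(sid, variant, per_subject.get(sid, 0))
                for sid, variant in assignments.items()]

# Step 2: first and second moments per variant
# (population std for brevity; the SQL's VAR may be sample variance)
by_variant = defaultdict(list)
for _, variant, metric in subject_rows:
    by_variant[variant].append(metric)
stats = {v: (len(xs), mean(xs), pstdev(xs)) for v, xs in by_variant.items()}
```

Note how subject 4, assigned but with no events, contributes a metric value of 0 to the control moments rather than being dropped.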

Finally, run the appropriate statistical test for each variant name compared to the control:

-- Step 3
CREATE TABLE metricflow.metric_comparison
SELECT
  c.experiment_name
  , c.variant_name AS variant_name_control
  , t.variant_name AS variant_name_treatment
  , c.metric_name
  , t.metric_mean - c.metric_mean AS metric_diff
  , t.metric_mean / c.metric_mean - 1 AS metric_pct
  , UDFS.TTEST_IND_FROM_STATS(
      t.metric_mean, t.metric_std, t.assignments,
      c.metric_mean, c.metric_std, c.assignments
    ) AS pvalue
FROM (
  SELECT *
  FROM metricflow.metric_variant_aggregates
  WHERE variant_name = {{ control_name }}
) c
JOIN (
  SELECT *
  FROM metricflow.metric_variant_aggregates
  WHERE variant_name != {{ control_name }}
) t
ON c.experiment_name = t.experiment_name
  AND c.metric_name = t.metric_name
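UDFS.TTEST_IND_FROM_STATS is assumed above to be a user-defined warehouse function. For reference, a Welch's t-test over the same summary statistics can be sketched in stdlib Python; the p-value below uses a normal approximation to the t distribution, which is acceptable at typical experiment sample sizes:

```python
from math import sqrt
from statistics import NormalDist

def ttest_ind_from_stats(mean_t, std_t, n_t, mean_c, std_c, n_c):
    """Welch's two-sample t-test from summary statistics.

    Returns (t_statistic, two_sided_pvalue). The p-value uses a normal
    approximation to the t distribution, acceptable for large n.
    """
    se = sqrt(std_t ** 2 / n_t + std_c ** 2 / n_c)  # std error of the difference
    t = (mean_t - mean_c) / se
    pvalue = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, pvalue

# Identical variants: no detectable effect
t, p = ttest_ind_from_stats(1.0, 1.0, 100, 1.0, 1.0, 100)  # t = 0.0, p = 1.0
```

Computing the test from (mean, std, n) per variant is what makes the two-step aggregation sufficient: no subject-level data is needed at comparison time.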

Outputs

API

The API to generate experimental datasets could look something like this:

mf experiment_query
  --metrics x,y,z
  --dimensions a,b,c
  --experiment_name button_color
  --control_name control
  --analysis_date

Output Schemas

Load from Step 2:

CREATE TABLE metricflow.experiment_metric_values (
  experiment_name STRING
  , variant_name STRING
  , dimension_name STRING
  , dimension_value STRING
  , assignments BIGINT
  , metric_mean DOUBLE
  , metric_std DOUBLE
  , ts__day DATE -- analysis_date
)
PARTITIONED BY (experiment_name STRING, ts__day DATE);

Load from Step 3:

CREATE TABLE metricflow.experiment_event_source_summary (
  experiment_name STRING
  , variant_name_control STRING
  , variant_name_treatment STRING
  , metric_diff DOUBLE
  , metric_pct DOUBLE
  , pvalue DOUBLE
  , ts__day DATE -- analysis_date
)
PARTITIONED BY (experiment_name STRING, ts__day DATE);

Dimensional Analysis

Dimensional analysis could be achieved by injecting a dimension name and value into steps 1 and 2 and performing all aggregations at that dimension's name and value. The output table could look something like this:

CREATE TABLE metricflow.experiment_metric_values (
  experiment_name STRING
  , variant_name STRING
  , dimension_name STRING
  , dimension_value STRING
  , assignments BIGINT
  , metric_mean DOUBLE
  , metric_std DOUBLE
  , ts__day DATE -- analysis_date
)
PARTITIONED BY (experiment_name STRING, ts__day DATE);

Would you like to contribute? Totally. I’m just here to add value.

Anything Else? I want to thank @danfrankj and @askeys for providing some initial thoughts that helped me sort through how we could do this.

TristanBoudreault commented 1 year ago

What would need to change in metricflow code so this join can be generated?

FROM metricflow.user_assignments ua
LEFT JOIN fact.transactions e
ON ua.subject_id = e.subject_id AND  e.created_at BETWEEN ua.first_assigned_at and {{ analysis_date }}

I’m not sure where to start looking into this.

I played with metricflow a bit as a new user and I can mostly get to that SQL, except the join order I produce is reversed: transactions end up on the left, but I want assignments on the left so that an assigned customer with no transactions shows up with a NULL or 0 in the output.

TristanBoudreault commented 1 year ago

@tlento 👋 @Jstein77 said you would be a good person to look at this.

Jstein77 commented 1 year ago

Thanks for commenting, @TristanBoudreault!

I have a couple of thoughts:

  1. Have you checked out the logic for SCD-style joins? The code is here. This is a similar pattern to what you're after, where assigned_at is the start of the validity window and analysis_date would be the end of the window. One difference is that the end of the validity window is not dynamic, so that might be an issue.
  2. The way the joins are currently rendered is the inverse of what we need for experimentation. We need to LEFT OUTER JOIN the metrics table onto the assignment table.

Right now:

SELECT b.condition AS user__condition, SUM(1)
FROM transactions a
LEFT OUTER JOIN user_assignments b
ON a.user_id = b.user_id AND {{ date range stuff }}
GROUP BY 1

What we need:

SELECT a.condition AS user__condition, SUM(1)
FROM user_assignments a
LEFT OUTER JOIN transactions b
ON a.user_id = b.user_id AND {{ date range stuff }}
GROUP BY 1

We should probably add first-class support for this type of join, since it's used in any kind of custom sample-tracking analysis where you always want to include all members of the in-sample group and nobody else, i.e., experimentation and cohort metrics.
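The join-direction point can be illustrated with a toy sqlite3 example (hypothetical tables): with assignments on the left, an assigned user with no transactions still appears in the output, with a count of 0.

```python
import sqlite3

# Toy schema: assignments on the left of the join keeps every assigned user
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_assignments (user_id INTEGER, condition TEXT);
    CREATE TABLE transactions (user_id INTEGER);
    INSERT INTO user_assignments VALUES (1, 'control'), (2, 'treatment');
    INSERT INTO transactions VALUES (1), (1);  -- user 2 has no transactions
""")

rows = conn.execute("""
    SELECT a.user_id, a.condition, COUNT(b.user_id) AS transactions
    FROM user_assignments a
    LEFT OUTER JOIN transactions b ON a.user_id = b.user_id
    GROUP BY a.user_id, a.condition
    ORDER BY a.user_id
""").fetchall()
# rows == [(1, 'control', 2), (2, 'treatment', 0)]
```

Flipping the FROM and JOIN tables would instead drop user 2 entirely, which is exactly the behavior described above as wrong for experimentation.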