nhandel opened this issue 2 years ago
What would need to change in metricflow code so this join can be generated?
```sql
FROM metricflow.user_assignments ua
LEFT JOIN fact.transactions e
  ON ua.subject_id = e.subject_id
 AND e.created_at BETWEEN ua.first_assigned_at AND {{ analysis_date }}
```
I’m not sure where to start looking into this.
I played with MetricFlow a bit as a new user and I can almost get to that SQL, except the join order I generate is reversed: transactions end up on the left, but I want assignments on the left so that an assigned customer with no transactions shows up with a NULL or 0 in the output.
@tlento 👋 @Jstein77 said you would be a good person to look at this.
Thanks for commenting, @TristanBoudreault!
I have a couple of thoughts:

1. This looks like a validity window, where `assigned_at` is the start of the validity window and `analysis_date` would be the end of the window. One difference is that the end of the validity window is not dynamic, so that might be an issue.
2. We need to `LEFT OUTER JOIN` the metrics table onto the assignment table. Right now:
```sql
SELECT b.condition AS user__condition, SUM(1)
FROM transactions a
LEFT OUTER JOIN user_assignments b
  ON a.user_id = b.user_id AND {{ date range stuff }}
```
What we need:
```sql
SELECT a.condition AS user__condition, SUM(1)
FROM user_assignments a
LEFT OUTER JOIN transactions b
  ON a.user_id = b.user_id AND {{ date range stuff }}
```
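To make the difference concrete, here is a minimal, self-contained sketch (using `sqlite3` with made-up table contents, not MetricFlow's actual generated SQL) showing that only the assignments-on-the-left join keeps assigned users with zero transactions in the output:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical miniature versions of the two tables from the example above.
cur.execute("CREATE TABLE user_assignments (user_id INTEGER, condition TEXT)")
cur.execute("CREATE TABLE transactions (user_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO user_assignments VALUES (?, ?)",
                [(1, "control"), (2, "treatment")])
# Only user 1 has transactions; user 2 was assigned but never transacted.
cur.executemany("INSERT INTO transactions VALUES (?, ?)", [(1, 10.0), (1, 5.0)])

# Assignments on the left: user 2 survives the join with a count of 0.
rows = cur.execute("""
    SELECT a.condition AS user__condition, COUNT(b.user_id) AS txn_count
    FROM user_assignments a
    LEFT OUTER JOIN transactions b ON a.user_id = b.user_id
    GROUP BY a.condition
    ORDER BY a.condition
""").fetchall()
print(rows)  # [('control', 2), ('treatment', 0)]
```

With transactions on the left instead, the `treatment` row would disappear entirely, which is exactly the problem described above.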
We should probably add first-class support for this type of join, since it is used in any kind of custom sample-tracking analysis where you always want to include all members of the in-sample group and nobody else (e.g. experimentation, cohort metrics).
Describe the Feature
MetricFlow currently builds generalized datasets for analytical applications (i.e. metrics by dimensions). Experimentation applications require a specific set of steps to denormalize and aggregate metrics in a way that enables statistical testing. The process currently configurable through MetricFlow's APIs does not support the two-step aggregation required to create statistical tests.
We should add a plan builder that allows users to use MetricFlow to construct metrics for experimentation applications. Users of MetricFlow should not have to reinvent the wheel every time they want to run a product experiment. This is a logical extension of the metric framework in place today.
Story / Assumptions
Open/New Questions
Lots more to do to spec the plan builder and any new node types
Experiment Assignment Configuration
Expected input from experiment assignment:
which would be configured in MetricFlow as follows:
Generated
Consider a metric
To support the construction of metrics for experimentation applications, we would need to generate two steps of aggregation. The query below could also be broken into a non-aggregated step with a timestamp for more complicated statistical tests.
First, MetricFlow's bread and butter, construct a metric to the granularity of the entity of the experiment (i.e. subject_id):
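The original query isn't reproduced here, but as a rough illustration, reusing the tables from the join at the top of the issue (the `amount` column and `variant_name` naming are assumptions), step 1 might look something like:

```sql
-- Hypothetical sketch: one metric value per experiment subject.
SELECT
    ua.subject_id
  , ua.variant_name
  , COALESCE(SUM(t.amount), 0) AS metric_value
FROM metricflow.user_assignments ua
LEFT OUTER JOIN fact.transactions t
  ON ua.subject_id = t.subject_id
 AND t.created_at BETWEEN ua.first_assigned_at AND {{ analysis_date }}
GROUP BY 1, 2
```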
Second, get the first and second moments at the granularity of the experiment and variant name.
Finally, run the appropriate statistical test for each variant name compared to the control.
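As a sketch of steps 2 and 3 (with made-up per-subject values, and Welch's t statistic standing in for whatever test the plan builder would actually emit), the second aggregation and the comparison against control reduce to:

```python
import math

# Step-1 output, sketched as per-subject metric values keyed by variant name.
per_subject = {
    "control":   [10.0, 5.0, 0.0, 7.0],
    "treatment": [12.0, 9.0, 0.0, 14.0],
}

# Step 2: first and second moments at the granularity of the variant name.
moments = {}
for variant, values in per_subject.items():
    n = len(values)
    mean = sum(values) / n
    # Sample variance recovered from the first and second moments.
    var = sum(v * v for v in values) / n - mean * mean
    var *= n / (n - 1)
    moments[variant] = (n, mean, var)

# Step 3: Welch's t statistic for each variant compared to the control.
n_c, mean_c, var_c = moments["control"]
for variant, (n_v, mean_v, var_v) in moments.items():
    if variant == "control":
        continue
    t = (mean_v - mean_c) / math.sqrt(var_v / n_v + var_c / n_c)
    print(variant, round(t, 3))
```

Because step 2 keeps only counts, means, and variances per variant, step 3 never needs the raw per-subject rows, which is what makes the two-step split useful.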
Outputs
API
The API to generate experimental datasets could look something like this:
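The proposed signature isn't included in the issue, so as a purely hypothetical illustration (none of these names exist in MetricFlow today), an experiment query request might carry fields like:

```python
from dataclasses import dataclass, field

# Hypothetical request object; every field name here is illustrative only.
@dataclass
class ExperimentQueryRequest:
    metric_names: list          # metrics to aggregate per subject (step 1)
    experiment_name: str        # filters the assignment table
    entity: str                 # experiment grain, e.g. "subject_id"
    control_variant: str = "control"        # baseline for the statistical test
    dimensions: list = field(default_factory=list)  # optional dimensional cuts

request = ExperimentQueryRequest(
    metric_names=["transactions"],
    experiment_name="new_checkout_flow",
    entity="subject_id",
)
print(request.entity)  # subject_id
```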
Output Schemas
Load from Step 2:
Load from Step 3:
Dimensional Analysis
Dimensional analysis could be achieved by injecting a dimension name and value into steps 1 and 2 and grouping all aggregations by that dimension's name and value. The output table could look something like this:
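The example output table itself isn't reproduced here, but the grouping change this describes can be sketched as follows (dimension values and metric values are made up):

```python
from collections import defaultdict

# Step-1 output with a hypothetical dimension injected alongside the variant.
rows = [
    # (variant_name, dimension_value, per-subject metric value)
    ("control",   "ios",     4.0),
    ("control",   "android", 6.0),
    ("treatment", "ios",     8.0),
    ("treatment", "android", 2.0),
    ("treatment", "ios",     10.0),
]

# Step 2, now keyed by (variant, dimension value) instead of variant alone.
groups = defaultdict(list)
for variant, dim_value, value in rows:
    groups[(variant, dim_value)].append(value)

moments = {
    key: (len(vals), sum(vals) / len(vals))  # (count, mean) per cell
    for key, vals in groups.items()
}
print(moments[("treatment", "ios")])  # (2, 9.0)
```

Step 3 would then run one test per dimension value, comparing each `(variant, dimension)` cell against the matching control cell.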
Would you like to contribute? Totally. I’m just here to add value.
Anything Else? I want to thank @danfrankj and @askeys for providing some initial thoughts that helped me sort through how we could do this.