github / scientist

:microscope: A Ruby library for carefully refactoring critical paths.
MIT License

Add an optional `cohort` block to science experiments #170

Open · brasic opened this pull request 2 years ago

brasic commented 2 years ago

(This is the first of several improvements to scientist based on extractions from the GitHub monolith.)

This adds the concept of a "cohort" to an experiment result, to enable and encourage bucketed result publishing.

Many experiments operate on data with a very long tail, and the fat part of the distribution can completely wash out notable results in lower-frequency sub-groups. For example, experiment results derived from the data of very large customers often look quite different from the results for small customers, yet the small-customer results can be so much more common that the large-customer results become statistically invisible. Even percentile metrics can't overcome this effect, since the relevant percentiles are often very high (above the 99th percentile).
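To make the wash-out effect concrete, here is a small Ruby calculation with purely hypothetical counts (not measurements from any real experiment): 10,000 observations from small customers with a 0.1% mismatch rate and 50 observations from large customers with a 40% mismatch rate. The aggregate rate stays well under 1%, while bucketing by cohort exposes the large-customer problem.

```ruby
# Hypothetical counts, chosen only to illustrate the effect.
buckets = {
  "small" => { observations: 10_000, mismatches: 10 }, # 0.1% mismatch rate
  "large" => { observations: 50,     mismatches: 20 }  # 40% mismatch rate
}

# Mismatch rate computed over every observation, ignoring cohorts.
total_obs = buckets.values.sum { |b| b[:observations] }
total_mis = buckets.values.sum { |b| b[:mismatches] }
overall_rate = total_mis.fdiv(total_obs)

# Mismatch rate computed separately for each cohort.
per_cohort = buckets.transform_values { |b| b[:mismatches].fdiv(b[:observations]) }

puts format("overall: %.2f%%", overall_rate * 100)
per_cohort.each { |name, rate| puts format("%s: %.2f%%", name, rate * 100) }
```

With these numbers the overall line reports roughly 0.30%, while the "large" cohort reports 40.00%.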

To address this issue, this PR adds an optional block to Scientist::Experiment that should return a "cohort" when called. The block is passed the experiment's result, so it can determine the cohort from the context data, from whether the result is a mismatch, or from any of the observation data.

The determined cohort value is available as Scientist::Result#cohort and is intended to be used by the user-defined publication mechanism.
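The following is a minimal, self-contained Ruby sketch of the idea, not the scientist gem's actual implementation: `ToyExperiment`, `ToyResult`, its `control_value`/`candidate_value` fields, and the `stats` Hash used as a publisher are all hypothetical stand-ins. It shows an optional `cohort` block being stored, called with the finished result, exposed as `#cohort`, and then used by a user-defined publish step to bucket counts.

```ruby
# Hypothetical stand-in for a result; scientist's real Scientist::Result
# holds observations rather than raw values.
ToyResult = Struct.new(:control_value, :candidate_value, :cohort) do
  def mismatched?
    control_value != candidate_value
  end
end

# Hypothetical, simplified experiment illustrating an optional cohort block.
class ToyExperiment
  def initialize(stats)
    @stats = stats # shared Hash acting as the user-defined publisher's sink
  end

  def use(&block)
    @control = block
  end

  def try(&block)
    @candidate = block
  end

  # Optional: when given, called with the result to determine its cohort.
  def cohort(&block)
    @cohort_block = block
  end

  def run
    result = ToyResult.new(@control.call, @candidate.call)
    # Compute a cohort only if the block was supplied; otherwise it stays nil.
    result.cohort = @cohort_block.call(result) if @cohort_block
    publish(result)
    result.control_value # callers always get the control's value back
  end

  private

  # User-defined publication: bucket totals and mismatches by cohort.
  def publish(result)
    bucket = @stats[result.cohort]
    bucket[:total] += 1
    bucket[:mismatched] += 1 if result.mismatched?
  end
end

stats = Hash.new { |h, k| h[k] = { total: 0, mismatched: 0 } }

[[50, 50], [500, 499]].each do |control, candidate|
  experiment = ToyExperiment.new(stats)
  experiment.use { control }
  experiment.try { candidate }
  experiment.cohort { |res| res.control_value > 100 ? "large" : "small" }
  experiment.run
end

p stats
```

Keeping the cohort block optional means existing experiments publish exactly as before, with a nil cohort.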

Here's an example of how it might be used to segment the results of an experiment between "large" and "small" users:


```ruby
# Assumes `user` is in scope, e.g. inside a method of a class that includes Scientist.
science "widget-count" do |experiment|
  experiment.use { user.count_widgets }
  experiment.try { user.fast_count_widgets }
  experiment.cohort { |res| res.control.value > 100 ? "large" : "small" }
end
```

zerowidth commented 2 years ago

cohort might be too specific. Since it's adding metadata to an observation, I wonder if a metadata block that returns a value might allow for more flexible and generic use, e.g.

```ruby
science "widget-count" do |experiment|
  experiment.use { user.count_widgets }
  experiment.try { user.fast_count_widgets }
  experiment.metadata { |res| { cohort: res.control.value > 100 ? "large" : "small" } }
end
```
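A minimal sketch of how the more generic metadata suggestion could be consumed, assuming the block returns a Hash that a publisher then groups on by key. `MetaResult`, `metadata_block`, and `summarize` are hypothetical names for illustration only, not part of scientist.

```ruby
# Hypothetical result carrying arbitrary metadata instead of a single cohort.
MetaResult = Struct.new(:control_value, :candidate_value, :metadata) do
  def mismatched?
    control_value != candidate_value
  end
end

# The suggested metadata block, expressed as a lambda from result to Hash.
metadata_block = ->(res) { { cohort: res.control_value > 100 ? "large" : "small" } }

results = [[50, 50], [80, 80], [500, 499]].map do |control, candidate|
  res = MetaResult.new(control, candidate)
  res.metadata = metadata_block.call(res)
  res
end

# A publisher can group by any metadata key; :cohort is just one of them.
def summarize(results, key)
  results.group_by { |r| r.metadata[key] }.transform_values do |group|
    { total: group.size, mismatched: group.count(&:mismatched?) }
  end
end

p summarize(results, :cohort)
```

The trade-off: a dedicated cohort field gives every publisher one well-known dimension to bucket on, while a metadata Hash lets an experiment attach several dimensions at the cost of agreeing on key names.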

I'm curious what other improvements you had in mind.

Watemlifts commented 2 years ago

The code changes have no conflicts.