david26694 / cluster-experiments

Simulation-based power analysis library
https://david26694.github.io/cluster-experiments/
MIT License

Experiment analysis code #184

Closed · david26694 closed 1 month ago

david26694 commented 1 month ago

Analysis implementation by @ludovico-lanni:


my_experiment = Experiment(metadata={})

metric_1 = Metric(alias)  # optionally components=(num, denom), a (str, str) tuple for ratio metrics
metric_2 = Metric(...)

all_variants = [Variant(name, is_control), Variant(...), Variant(...)]

test_1 = HypothesisTest(
    metric=metric_1,
    analysis=analysis_1,
    **analysis_config)  # parameters for the specific experiment analysis (no dataframes here)

test_2 = HypothesisTest(
    metric=metric_2,
    analysis=analysis_1,
    **analysis_config)

test_3 = HypothesisTest(
    metric=metric_2,
    analysis=analysis_1,
    **analysis_config)

# plan one or more hypothesis tests sharing the same data
analysis_plan_1 = AnalysisPlan(
    tests=[test_1, test_2],
    data=df,
    variants=all_variants,
    pre_exp_df=pre_exp_df)  # pre_exp_df is optional

analysis_plan_2 = AnalysisPlan(
    tests=[test_3],
    data=df_filtered,
    variants=all_variants,
    pre_exp_df=pre_exp_df_filtered)  # optional

exp_results = my_experiment.analyze(
    analysis_plans=[analysis_plan_1, analysis_plan_2], 
    alpha=0.05)

exp_results.to_dataframe()

First correction:


analysis_plan_1 = AnalysisPlan(
    tests=[test_1, test_2], 
    variants=all_variants
)

analysis_plan_2 = AnalysisPlan(
    tests=[test_3], 
    variants=all_variants,
)

results_1 = analysis_plan_1.analyze(df, pre_experiment_df)
results_1_filtered = analysis_plan_1.analyze(df_filtered, pre_experiment_df, alpha=0.1)
results_2 = analysis_plan_2.analyze(df, pre_experiment_df)

(results_1 + results_1_filtered + results_2).to_dataframe()

Code design proposed by Ludo:


class Experiment:
    metadata: dict

class Metric:
    alias: str
    components: Tuple[str, str]

class Variant:
    name: str
    is_control: bool

class HypothesisTest:
    metric: Metric
    analysis: Analysis
    analysis_config: dict

class AnalysisPlan:
    tests: List[HypothesisTest]
    variants: List[Variant]

    def analyze(self, data, pre_exp_df, alpha=0.05) -> AnalysisResults:
        # do the analysis

class AnalysisResults:
    def to_dataframe(self) -> pd.DataFrame:
        # return the results as a dataframe

    def __add__(self, other: AnalysisResults) -> AnalysisResults:
        # combine the results

I think we can drop the Experiment class overall and just keep the metadata as a dict in the AnalysisPlan. We could even rename AnalysisPlan to Experiment; I like that name more.

class AnalysisPlan:
    metadata: dict
    tests: List[HypothesisTest]
    variants: List[Variant]

    def analyze(self, data, pre_exp_df, alpha=0.05) -> AnalysisResults:
        # do the analysis

The issue with this is that it allows metadata to differ across analysis plans. One workaround is:

class AnalysisPlan:
    tests: List[HypothesisTest]
    variants: List[Variant]

    def analyze(self, data, pre_exp_df, alpha=0.05) -> AnalysisResults:
        # do the analysis

class AnalysisResults:

    def __str__(self, metadata) -> str:
        # return the results as a string (note: __str__ takes no extra
        # arguments in Python, so in practice this would be a named method)

    def to_dataframe(self, metadata) -> pd.DataFrame:
        # return the results as a dataframe

    def __add__(self, other: AnalysisResults) -> AnalysisResults:
        # combine the results
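A minimal sketch of this workaround, assuming pandas and a hypothetical row-per-comparison internal dataframe; `to_dataframe(metadata)` just stamps the metadata on as constant columns:

```python
from dataclasses import dataclass, field
from typing import Optional

import pandas as pd


@dataclass
class AnalysisResults:
    # hypothetical internal storage: one row per (metric, variant) comparison
    results: pd.DataFrame = field(default_factory=pd.DataFrame)

    def to_dataframe(self, metadata: Optional[dict] = None) -> pd.DataFrame:
        # stamp shared metadata on as constant columns, so it is passed once
        # at rendering time instead of being stored per plan
        df = self.results.copy()
        for key, value in (metadata or {}).items():
            df[key] = value
        return df

    def __add__(self, other: "AnalysisResults") -> "AnalysisResults":
        # combining results is plain row-wise concatenation
        return AnalysisResults(
            pd.concat([self.results, other.results], ignore_index=True)
        )
```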

If we want to create stuff from config:


analysis_plan_config = {
    "tests": [
        {
            "metric": {
                "alias": "metric_1",
                "components": ("component_1", "component_2")
            },
            "analysis": {
                "name": "analysis_1",
                "config": {
                    "param_1": 1,
                    "param_2": 2
                }
            }
        },
        {
            "metric": {
                "alias": "metric_2",
                "components": ("component_3", "component_4")
            },
            "analysis": {
                "name": "analysis_1",
                "config": {
                    "param_1": 1,
                    "param_2": 2
                }
            }
        }
    ],
    "variants": [
        {
            "name": "variant_1",
            "is_control": True
        },
        {
            "name": "variant_2",
            "is_control": False
        }
    ]
}
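One way to wire this up (a sketch; `from_config` is a hypothetical classmethod, and the analysis name is kept as a plain string here rather than resolved to an Analysis class):

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Metric:
    alias: str
    components: Optional[Tuple[str, str]] = None  # only for ratio metrics


@dataclass
class Variant:
    name: str
    is_control: bool


@dataclass
class HypothesisTest:
    metric: Metric
    analysis: str  # analysis name, resolved to an Analysis class elsewhere
    analysis_config: dict


@dataclass
class AnalysisPlan:
    tests: List[HypothesisTest]
    variants: List[Variant]

    @classmethod
    def from_config(cls, config: dict) -> "AnalysisPlan":
        # build the nested objects from the plain-dict config above
        tests = [
            HypothesisTest(
                metric=Metric(**t["metric"]),
                analysis=t["analysis"]["name"],
                analysis_config=t["analysis"]["config"],
            )
            for t in config["tests"]
        ]
        variants = [Variant(**v) for v in config["variants"]]
        return cls(tests=tests, variants=variants)
```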

Alternatively, we could add variants in each test:


analysis_plan_config = {
    "tests": [
        {
            "metric": {
                "alias": "metric_1",
                "components": ("component_1", "component_2")
            },
            "analysis": {
                "name": "analysis_1",
                "config": {
                    "param_1": 1,
                    "param_2": 2
                }
            },
            "variants": [
                {
                    "name": "variant_1",
                    "is_control": True
                },
                {
                    "name": "variant_2",
                    "is_control": False
                }
            ]
        },
        {
            "metric": {
                "alias": "metric_2",
                "components": ("component_3", "component_4")
            },
            "analysis": {
                "name": "analysis_1",
                "config": {
                    "param_1": 1,
                    "param_2": 2
                }
            },
            "variants": [
                {
                    "name": "variant_1",
                    "is_control": True
                },
                {
                    "name": "variant_2",
                    "is_control": False
                }
            ]
        }
    ],
}

This way the dict is bigger and we need to repeat the variants across tests, but there are fewer classes.


class Metric:
    alias: str
    components: Tuple[str, str]

class Variant:
    name: str
    is_control: bool

class HypothesisTest:
    metadata: dict
    metric: Metric
    analysis: Analysis
    analysis_config: dict
    variants: List[Variant]

    def analyze(self, data, pre_exp_df, alpha=0.05) -> AnalysisResults:
        # do the analysis

class AnalysisResults:
    def to_dataframe(self) -> pd.DataFrame:
        # return the results as a dataframe

    def __add__(self, other: AnalysisResults) -> AnalysisResults:
        # combine the results

## usage
metric_1 = Metric(alias)  # optionally components=(num, denom), a (str, str) tuple for ratio metrics
metric_2 = Metric(...)

all_variants = [Variant(name, is_control), Variant(...), Variant(...)]

test_1 = HypothesisTest(
    metadata={},
    metric=metric_1,
    analysis=analysis_1,
    analysis_config={},
    variants=all_variants
)

test_2 = HypothesisTest(
    metadata={},
    metric=metric_2,
    analysis=analysis_1,
    analysis_config={},
    variants=all_variants
)

results_1 = test_1.analyze(data, pre_exp_df)
results_2 = test_2.analyze(data, pre_exp_df)

(results_1 + results_2).to_dataframe()
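For the config-driven version of this flattened design, construction could look like the following (a sketch; `from_config` is a hypothetical classmethod and the analysis is again kept as a name string):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Metric:
    alias: str
    components: Optional[Tuple[str, str]] = None  # only for ratio metrics


@dataclass
class Variant:
    name: str
    is_control: bool


@dataclass
class HypothesisTest:
    metric: Metric
    analysis: str
    analysis_config: dict
    variants: List[Variant]
    metadata: dict = field(default_factory=dict)

    @classmethod
    def from_config(cls, config: dict) -> "HypothesisTest":
        # in this flattened layout the variants live inside each test entry
        return cls(
            metric=Metric(**config["metric"]),
            analysis=config["analysis"]["name"],
            analysis_config=config["analysis"]["config"],
            variants=[Variant(**v) for v in config.get("variants", [])],
            metadata=config.get("metadata", {}),
        )
```

Each entry of the `tests` list in the config above would then be passed through `HypothesisTest.from_config`.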
ludovico-lanni commented 1 month ago

some notes:

david26694 commented 1 month ago

class Metric:
    alias: str
    components: Tuple[str, str]

class Variant:
    name: str
    is_control: bool

class HypothesisTest:
    metric: Metric
    analysis: Analysis
    analysis_config: dict

class AnalysisPlan:
    tests: List[HypothesisTest]
    variants: List[Variant]

    def analyze(self, data, pre_exp_df, alpha=0.05) -> AnalysisResults:
        # do the analysis

class AnalysisResults:
    def __str__(self, metadata: dict) -> str:
        # some representation method could have metadata as input
    def to_dataframe(self) -> pd.DataFrame:
        # return the results as a dataframe

    def __add__(self, other: AnalysisResults) -> AnalysisResults:
        # combine the results
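To check the design end to end, here is a runnable toy version of these classes. The "analysis" is a stand-in difference in means between each treatment variant and the control; the library's real analysis classes are assumed to plug in where the stand-in sits:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

import pandas as pd


@dataclass
class Metric:
    alias: str
    components: Optional[Tuple[str, str]] = None  # only for ratio metrics


@dataclass
class Variant:
    name: str
    is_control: bool


@dataclass
class HypothesisTest:
    metric: Metric
    analysis: str = "mean_difference"  # stand-in for an Analysis object
    analysis_config: dict = field(default_factory=dict)


@dataclass
class AnalysisResults:
    results: pd.DataFrame

    def to_dataframe(self) -> pd.DataFrame:
        return self.results

    def __add__(self, other: "AnalysisResults") -> "AnalysisResults":
        return AnalysisResults(
            pd.concat([self.results, other.results], ignore_index=True)
        )


@dataclass
class AnalysisPlan:
    tests: List[HypothesisTest]
    variants: List[Variant]

    def analyze(self, data, pre_exp_df=None, alpha=0.05) -> AnalysisResults:
        # toy analysis: difference in means vs the control, per test and variant
        control = next(v for v in self.variants if v.is_control)
        rows = []
        for test in self.tests:
            col = test.metric.alias
            control_mean = data.loc[data["variant"] == control.name, col].mean()
            for variant in self.variants:
                if variant.is_control:
                    continue
                treatment_mean = data.loc[data["variant"] == variant.name, col].mean()
                rows.append({
                    "metric": col,
                    "variant": variant.name,
                    "ate": treatment_mean - control_mean,
                    "alpha": alpha,
                })
        return AnalysisResults(pd.DataFrame(rows))
```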
david26694 commented 1 month ago

analysis_plan_config = {
    "tests": [
        {
            "metric": {
                "alias": "metric_1",
                "components": ("component_1", "component_2")
            },
            "analysis": {
                "name": "analysis_1",
                "config": {
                    "param_1": 1,
                    "param_2": 2
                }
            }
        },
        {
            "metric": {
                "alias": "metric_2",
                "components": ("component_3", "component_4")
            },
            "analysis": {
                "name": "analysis_1",
                "config": {
                    "param_1": 1,
                    "param_2": 2
                }
            }
        }
    ],
    "variants": [
        {
            "name": "variant_1",
            "is_control": True
        },
        {
            "name": "variant_2",
            "is_control": False
        }
    ]
}
david26694 commented 1 month ago

Slicing implementation:


class Metric:
    alias: str
    components: Tuple[str, str]

class Variant:
    name: str
    is_control: bool

class Dimension:
    slice_col: str
    slice_values: List[str]

class HypothesisTest:
    metric: Metric
    analysis: Analysis
    analysis_config: dict
    dimensions: List[Dimension]

    def slice(self):
        for dimension in self.dimensions:
            for value in dimension.slice_values:
                # slice the data, run the analysis

class AnalysisPlan:
    tests: List[HypothesisTest]
    variants: List[Variant]

    def analyze(self, data, pre_exp_df, alpha=0.05) -> AnalysisResults:
        # do the analysis

class AnalysisResults:
    def __str__(self, metadata: dict) -> str:
        # some representation method could have metadata as input
    def to_dataframe(self) -> pd.DataFrame:
        # return the results as a dataframe

    def __add__(self, other: AnalysisResults) -> AnalysisResults:
        # combine the results
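A sketch of what `slice` could do under the hood, assuming pandas and a hypothetical `iter_slices` helper: each dimension value yields a filtered view of the data, starting with an unsliced "total" pass so the overall result is always present.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

import pandas as pd


@dataclass
class Dimension:
    slice_col: str
    slice_values: List[str]


def iter_slices(
    data: pd.DataFrame, dimensions: List[Dimension]
) -> Iterator[Tuple[str, str, pd.DataFrame]]:
    # yield (dimension column, value, filtered data) for every slice;
    # the analysis then runs once per yielded dataframe
    yield "total", "total", data
    for dimension in dimensions:
        for value in dimension.slice_values:
            yield (
                dimension.slice_col,
                value,
                data[data[dimension.slice_col] == value],
            )
```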