facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

add feature engineering implementations #1065

Closed alxlyj closed 1 month ago

alxlyj commented 2 months ago

Project Robyn

This PR lands the implementation code for the feature preprocessing/engineering component. It is similar to robyn_inputs and robyn_engineering in the R implementation (inputs.R).

Test Plan

graph TD
    %% Color definitions
    classDef r_node fill:#FF9999,stroke:#FF0000,stroke-width:2px;
    classDef py_node fill:#99CCFF,stroke:#0000FF,stroke-width:2px;
    classDef mapped fill:#98FB98,stroke:#006400,stroke-width:2px;
    classDef partially_mapped fill:#FFFACD,stroke:#DAA520,stroke-width:2px;
    classDef missing fill:#FFB6C1,stroke:#FF69B4,stroke-width:2px;

    subgraph R ["R Implementation (inputs.R)"]
        R_robyn_inputs["robyn_inputs()"]:::r_node
        R_robyn_engineering["robyn_engineering()"]:::r_node
        R_prophet_decomp["prophet_decomp()"]:::r_node
        R_fit_spend_exposure["fit_spend_exposure()"]:::r_node
        R_set_holidays["set_holidays()"]:::r_node
    end

    subgraph Python ["Python Implementation (feature_engineering.py)"]
        P_FeatureEngineering["FeatureEngineering class"]:::py_node
        P_perform_feature_engineering["perform_feature_engineering()"]:::py_node
        P_prepare_data["_prepare_data()"]:::py_node
        P_create_rolling_window_data["_create_rolling_window_data()"]:::py_node
        P_calculate_media_cost_factor["_calculate_media_cost_factor()"]:::py_node
        P_run_models["_run_models()"]:::py_node
        P_fit_spend_exposure["_fit_spend_exposure()"]:::py_node
        P_prophet_decomposition["_prophet_decomposition()"]:::py_node
        P_prepare_holidays_for_prophet["_prepare_holidays_for_prophet()"]:::py_node
        P_apply_transformations["_apply_transformations()"]:::py_node
        P_apply_adstock["_apply_adstock()"]:::py_node
        P_geometric_adstock["_geometric_adstock()"]:::py_node
        P_weibull_adstock["_weibull_adstock()"]:::py_node
        P_apply_saturation["_apply_saturation()"]:::py_node
    end

    R_robyn_inputs --> |Partially Mapped| P_FeatureEngineering:::partially_mapped
    R_robyn_engineering --> |Partially Mapped| P_perform_feature_engineering:::partially_mapped
    R_prophet_decomp --> |Mapped| P_prophet_decomposition:::mapped
    R_fit_spend_exposure --> |Mapped| P_fit_spend_exposure:::mapped
    R_set_holidays --> |Partially Mapped| P_prepare_holidays_for_prophet:::partially_mapped

    P_FeatureEngineering --> P_perform_feature_engineering
    P_perform_feature_engineering --> P_prepare_data
    P_perform_feature_engineering --> P_create_rolling_window_data
    P_perform_feature_engineering --> P_calculate_media_cost_factor
    P_perform_feature_engineering --> P_run_models
    P_perform_feature_engineering --> P_prophet_decomposition

    P_run_models --> P_fit_spend_exposure

    P_prophet_decomposition --> P_prepare_holidays_for_prophet

    P_FeatureEngineering --> P_apply_transformations
    P_apply_transformations --> P_apply_adstock
    P_apply_transformations --> P_apply_saturation

    P_apply_adstock --> P_geometric_adstock
    P_apply_adstock --> P_weibull_adstock

    subgraph Missing ["Missing or Incomplete"]
        M_hyperparameters["Hyperparameters handling"]:::missing
        M_calibration["Calibration input handling"]:::missing
        M_dt_holidays["Full holiday data processing"]:::missing
        M_exposure_vars["Exposure variables handling"]:::missing
    end

    R_robyn_inputs --> M_hyperparameters
    R_robyn_inputs --> M_calibration
    R_set_holidays --> M_dt_holidays
    R_robyn_engineering --> M_exposure_vars
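
The transformation branch of the graph above (_apply_adstock, _geometric_adstock, _apply_saturation) can be illustrated with a minimal sketch. It assumes the usual geometric carry-over recursion and a Hill saturation curve whose inflexion point is interpolated between the minimum and maximum of the input. The PR's methods take pd.Series, while this sketch uses plain lists to stay dependency-free. The function names and sample values are illustrative and are not the PR's code.

```python
from typing import List, Sequence


def geometric_adstock(x: Sequence[float], theta: float) -> List[float]:
    """Add a fraction `theta` of the previous decayed value to each period."""
    decayed: List[float] = []
    carry = 0.0
    for value in x:
        carry = value + theta * carry
        decayed.append(carry)
    return decayed


def hill_saturation(x: Sequence[float], alpha: float, gamma: float) -> List[float]:
    """Hill curve; the inflexion point is interpolated between min(x) and max(x)."""
    inflexion = min(x) * (1 - gamma) + max(x) * gamma
    return [v**alpha / (v**alpha + inflexion**alpha) for v in x]


# Made-up spend series: adstock first, then saturation.
spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(spend, theta=0.5)  # [100.0, 50.0, 25.0, 62.5]
saturated = hill_saturation(adstocked, alpha=2.0, gamma=0.5)
```

With theta=0.5, half of each period's decayed value carries into the next, so the zero-spend periods still show a decaying effect.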

classDiagram
    class FeatureEngineering {
        +mmm_data: MMMData
        +hyperparameters: Hyperparameters
        +holidays_data: HolidaysData
        +__init__(mmm_data, hyperparameters, holidays_data)
        +perform_feature_engineering(quiet: bool) : FeaturizedMMMData
        -_prepare_data() : pd.DataFrame
        -_create_rolling_window_data(dt_transform: pd.DataFrame) : pd.DataFrame
        -_calculate_media_cost_factor(dt_input_roll_wind: pd.DataFrame) : pd.Series
        -_run_models(dt_modRollWind: pd.DataFrame, media_cost_factor: float) : Dict
        -_fit_spend_exposure(dt_modRollWind: pd.DataFrame, paid_media_var: str, media_cost_factor: float) : Dict
        -_prophet_decomposition(dt_mod: pd.DataFrame) : pd.DataFrame
        -_prepare_holidays_for_prophet(holidays_df: pd.DataFrame) : pd.DataFrame
        -_apply_transformations(x: pd.Series, params: ChannelHyperparameters) : pd.Series
        -_apply_adstock(x: pd.Series, params: ChannelHyperparameters) : pd.Series
        -_geometric_adstock(x: pd.Series, theta: float) : pd.Series
        -_weibull_adstock(x: pd.Series, shape: float, scale: float) : pd.Series
        -_apply_saturation(x: pd.Series, params: ChannelHyperparameters) : pd.Series
    }

    class FeaturizedMMMData {
        +dt_mod: pd.DataFrame
        +dt_modRollWind: pd.DataFrame
        +modNLS: Dict[str, Any]
    }

    class MMMData {
        +data: pd.DataFrame
        +mmmdata_spec: MMMDataSpec
    }

    class Hyperparameters {
        +hyperparameters: Dict[str, List[float]]
        +adstock: AdstockType
        +lambda_: float
        +train_size: List[float]
    }

    class HolidaysData {
        +dt_holidays: pd.DataFrame
        +prophet_vars: List[str]
        +prophet_country: str
        +prophet_signs: List[str]
    }

    class ChannelHyperparameters {
        +thetas: List[float]
        +shapes: List[float]
        +scales: List[float]
        +alphas: List[float]
        +gammas: List[float]
        +penalty: List[bool]
    }

    FeatureEngineering --> "1" MMMData : uses
    FeatureEngineering --> "1" Hyperparameters : uses
    FeatureEngineering --> "0..1" HolidaysData : uses
    FeatureEngineering ..> FeaturizedMMMData : creates
    FeatureEngineering ..> ChannelHyperparameters : uses
    Hyperparameters --> "*" ChannelHyperparameters : contains
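
As a rough illustration of how the Hyperparameters and ChannelHyperparameters entities in the diagram could look in Python, here is a minimal dataclass sketch. The AdstockType members, the default values, and the channel name tv_S are assumptions made for illustration. The sketch keys the mapping by channel name to make the "contains" relation explicit, whereas the diagram types that field as Dict[str, List[float]].

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class AdstockType(Enum):
    # Member names and values are assumptions, not taken from the PR.
    GEOMETRIC = "geometric"
    WEIBULL = "weibull"


@dataclass
class ChannelHyperparameters:
    # One list per parameter, mirroring the fields in the class diagram.
    thetas: List[float] = field(default_factory=list)
    shapes: List[float] = field(default_factory=list)
    scales: List[float] = field(default_factory=list)
    alphas: List[float] = field(default_factory=list)
    gammas: List[float] = field(default_factory=list)
    penalty: List[bool] = field(default_factory=list)


@dataclass
class Hyperparameters:
    # Keyed by channel name (an assumption; see the note above the code).
    hyperparameters: Dict[str, ChannelHyperparameters] = field(default_factory=dict)
    adstock: AdstockType = AdstockType.GEOMETRIC  # default is an assumption
    lambda_: float = 0.0                          # default is an assumption
    train_size: List[float] = field(default_factory=list)


# Hypothetical channel "tv_S" with illustrative values.
hp = Hyperparameters(
    hyperparameters={
        "tv_S": ChannelHyperparameters(
            thetas=[0.1, 0.4], alphas=[0.5, 3.0], gammas=[0.3, 1.0]
        )
    }
)
```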

We included a notebook, tutorial2.ipynb, that tests the implementation and compares its output with https://github.com/facebookexperimental/Robyn/blob/main/robyn_api/robyn_python_notebook.ipynb. Please see tutorial2.ipynb:

image
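
A comparison like the one in the notebook could be automated roughly as follows. This is a sketch, not the notebook's code: the helper name assert_outputs_match, the column names, and the toy values are all hypothetical, and it assumes both outputs have been loaded into pandas DataFrames with matching row order.

```python
import pandas as pd


def assert_outputs_match(
    r_df: pd.DataFrame, py_df: pd.DataFrame, rtol: float = 1e-6
) -> None:
    """Raise AssertionError if the shared columns differ (floats within `rtol`)."""
    shared = [c for c in r_df.columns if c in py_df.columns]
    pd.testing.assert_frame_equal(
        r_df[shared].reset_index(drop=True),
        py_df[shared].reset_index(drop=True),
        check_exact=False,  # compare floats with a tolerance
        check_dtype=False,  # allow, for example, int vs float columns
        rtol=rtol,
    )


# Toy frames standing in for the R and Python outputs (values are made up).
r_out = pd.DataFrame({"ds": ["2024-01-01", "2024-01-08"], "trend": [1.0, 2.0]})
py_out = pd.DataFrame({"ds": ["2024-01-01", "2024-01-08"], "trend": [1.0, 2.0 + 1e-9]})
assert_outputs_match(r_out, py_out)  # passes: the difference is within tolerance
```

A relative tolerance is used because small floating-point differences between the two implementations are expected, while exact equality would flag them as failures.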

Related to #1047