facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License

feature: add code for calibration #1084

Closed alxlyj closed 1 month ago

alxlyj commented 1 month ago

Project Robyn

As titled. We are adding the interfaces and implementation for the calibration component, and we have integrated calibration into the modeling step. Both changes are exercised and tested in a tutorial notebook.

Test Plan

Code flow diagram of robyn_calibrate on the R side:

flowchart TD
    A[Start robyn_calibrate] --> B{Check if calibration_input exists<br/>and study is within window}
    B -->|No calibration_input or<br/>outside window| C[Warning: Running without calibration]
    B -->|Valid input and<br/>within window| D[Initialize calibration_input<br/>with NA columns]

    D --> E[Process each study in split_channels]

    subgraph study_loop[Study Processing Loop]
        E --> F[Get channels and parameters<br/>for current study]
        F --> G[Determine study position<br/>and calibration dates]
        G --> H[Process each channel]

        subgraph channel_loop[Channel Processing Loop]
            H --> I{Check calibration scope}

            I -->|immediate| J[Process immediate scope]
            J --> K1[Apply Adstock transformation]
            K1 --> L1{Check adstock type}
            L1 -->|geometric| M1[Apply geometric<br/>transformation]
            L1 -->|weibull| M2[Apply weibull<br/>transformation]
            M1 --> N1[Calculate immediate<br/>and total effects]
            M2 --> N1
            N1 --> O1[Apply Saturation<br/>Hill transformation]
            O1 --> P1[Calculate decomposition<br/>for immediate scope]

            I -->|total| J2[Process total scope]
            J2 --> P2[Get direct decomposition<br/>from xDecompVec]
        end

        P1 --> Q[Combine multiple<br/>channel results]
        P2 --> Q
        Q --> R[Update calibration_input<br/>with predictions]
    end

    R --> S[Calculate final metrics]
    S --> T[Generate liftCollect with<br/>scaled decomposition values]
    T --> U[Return liftCollect]

    subgraph calculations[Key Calculations]
        V1["mape_lift = abs((decompAbsScaled - liftAbs) / liftAbs)"]
        V2[calibrated_pct = decompAbsScaled / decompAbsTotalScaled]
        V3[decompAbsScaled = pred / decompDays * liftDays]
    end

    style study_loop fill:#f0f0f0,stroke:#333,stroke-width:2px
    style channel_loop fill:#e6e6e6,stroke:#666,stroke-width:2px
    style calculations fill:#e6f3ff,stroke:#0066cc,stroke-width:2px
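
The "Key Calculations" box in the flowchart can be sketched in Python as below. This is a minimal sketch, not the code from this PR: the function and field names are hypothetical, `decompAbsTotalScaled` is assumed to be scaled the same way as `decompAbsScaled` (the flowchart does not define it), and `mape_lift` is read as an absolute percentage error, i.e. the difference is divided by `liftAbs` as a whole and the absolute value is taken.

```python
from dataclasses import dataclass


@dataclass
class LiftMetrics:
    # Hypothetical container; field names mirror the flowchart's variables.
    decomp_abs_scaled: float
    mape_lift: float
    calibrated_pct: float


def lift_metrics(pred: float, pred_total: float,
                 decomp_days: float, lift_days: float,
                 lift_abs: float) -> LiftMetrics:
    """Compute the three calibration metrics for one study.

    pred        -- model decomposition attributed to the study's scope
    pred_total  -- total decomposition for the same channel(s) and dates
    decomp_days -- number of days covered by the decomposition window
    lift_days   -- number of days covered by the lift study
    lift_abs    -- absolute lift measured by the study
    """
    if decomp_days <= 0:
        raise ValueError("decomp_days must be positive")
    if lift_abs == 0 or pred_total == 0:
        raise ValueError("lift_abs and pred_total must be non-zero")

    # Rescale the model's decomposition to the length of the lift study.
    decomp_abs_scaled = pred / decomp_days * lift_days
    # Assumption: the total is rescaled with the same day ratio.
    decomp_abs_total_scaled = pred_total / decomp_days * lift_days

    # Absolute percentage error between model prediction and measured lift.
    mape_lift = abs((decomp_abs_scaled - lift_abs) / lift_abs)
    # Share of the total decomposition that falls in the calibrated scope.
    calibrated_pct = decomp_abs_scaled / decomp_abs_total_scaled

    return LiftMetrics(decomp_abs_scaled, mape_lift, calibrated_pct)


# Example: a 28-day decomposition window compared against a 30-day study.
metrics = lift_metrics(pred=900.0, pred_total=1200.0,
                       decomp_days=28, lift_days=30, lift_abs=1000.0)
print(metrics)
```

Because both numerator and denominator of `calibrated_pct` are scaled by the same day ratio under this assumption, the ratio does not depend on the window lengths; only `mape_lift` is sensitive to the scaling.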

Tutorial notebook results with calibration:

Calibration input:

image

Calibration Results:

image

Accuracy of calibration will be vetted in a future PR.

image

alxlyj commented 1 month ago

For reference, this is what we are seeing for the calibration input on the R side:

image

alxlyj commented 1 month ago

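Now supports a combination of channels in the calibration input itself. Previously calibration_input was Dict[channel, ChannelCalibrationData]; the key can now be a group of channels, i.e. Dict[Tuple[channel, ...], ChannelCalibrationData]. The group is written as a tuple here rather than a list, since a Python list is unhashable and cannot be used as a dict key.

For example:

    import pandas as pd
    # ChannelCalibrationData and CalibrationInput are the calibration classes
    # from this PR; their import path is not shown here.

    calibration_data = {
        # Single channel, written as a one-element tuple
        ("facebook",): ChannelCalibrationData(
            lift_start_date=pd.Timestamp("2023-01-01"),
            lift_end_date=pd.Timestamp("2023-01-31"),
            lift_abs=1000,
            spend=5000,
        ),
        # Multiple channels measured together by one study
        ("search", "tv"): ChannelCalibrationData(
            lift_start_date=pd.Timestamp("2023-02-01"),
            lift_end_date=pd.Timestamp("2023-02-28"),
            lift_abs=2000,
            spend=8000,
        ),
    }
    calibration_input = CalibrationInput(channel_data=calibration_data)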
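
One practical detail when keying the calibration input by a group of channels: a Python list is unhashable, so it cannot be a dict key, whereas a tuple can. The sketch below shows one way to accept either a single name or a list of names and normalize it to a tuple key. It is an illustration only: `StudyData`, `normalize_key` and `build_channel_data` are hypothetical names, with `StudyData` standing in for `ChannelCalibrationData`, and plain `datetime.date` is used instead of `pd.Timestamp` to keep the example dependency-free.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Tuple, Union

ChannelKey = Tuple[str, ...]


@dataclass(frozen=True)
class StudyData:
    # Hypothetical stand-in for ChannelCalibrationData.
    lift_start_date: date
    lift_end_date: date
    lift_abs: float
    spend: float


def normalize_key(channels: Union[str, Iterable[str]]) -> ChannelKey:
    """Turn "tv" or ["search", "tv"] into a hashable tuple key."""
    if isinstance(channels, str):
        return (channels,)
    key = tuple(channels)
    if not key:
        raise ValueError("a study needs at least one channel")
    return key


def build_channel_data(
    studies: List[Tuple[Union[str, Iterable[str]], StudyData]]
) -> Dict[ChannelKey, StudyData]:
    """Build a tuple-keyed mapping and reject duplicate channel groups."""
    channel_data: Dict[ChannelKey, StudyData] = {}
    for channels, data in studies:
        key = normalize_key(channels)
        if key in channel_data:
            raise ValueError(f"duplicate study for channels {key}")
        channel_data[key] = data
    return channel_data


channel_data = build_channel_data([
    (["facebook"],
     StudyData(date(2023, 1, 1), date(2023, 1, 31), lift_abs=1000, spend=5000)),
    (["search", "tv"],
     StudyData(date(2023, 2, 1), date(2023, 2, 28), lift_abs=2000, spend=8000)),
])
for key, study in channel_data.items():
    print(key, study.lift_abs)
```

Tuples preserve order, so `("search", "tv")` and `("tv", "search")` would be distinct keys in this sketch; sorting inside `normalize_key` would be one way to treat them as the same study if that is the intended behaviour.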