facebookexperimental / Robyn

Robyn is an experimental, AI/ML-powered and open sourced Marketing Mix Modeling (MMM) package from Meta Marketing Science. Our mission is to democratise modeling knowledge, inspire the industry through innovation, reduce human bias in the modeling process & build a strong open source marketing science community.
https://facebookexperimental.github.io/Robyn/
MIT License
1.15k stars 343 forks

Initial Mean response and Optimized mean response issue #477

Closed virithavanama closed 2 years ago

virithavanama commented 2 years ago

Is there a way to convert the Initial Mean response from the budget allocator to match the original conversion values?

Leonelsentana commented 2 years ago

Please check my calcs done on the last comment here: https://github.com/facebookexperimental/Robyn/issues/227

virithavanama commented 2 years ago

Hi @Leonelsentana, thanks for the response. The calculations you mentioned in that comment are for spend, but I want the calculations for the Initial mean response (conversions) to match the original conversions.

Leonelsentana commented 2 years ago

So, this would be almost the same, but using response instead of spend. Does that make sense? Taking the example from https://github.com/facebookexperimental/Robyn/issues/227, you would need to do the following using reallocated.csv and pareto_aggregated.csv:

[total_conversions (for a given channel) + (optmResponseUnit - initResponseUnit) x non-zero spend time_units (non-zero spend weeks or days in your window for that channel)], then divide by the total_conversions.

The total_conversions (for a given channel) can be calculated from pareto_aggregated.csv for the selected model: [total_conversions (for a given channel) = roi_total x total_spend]. The overall total_conversions is then the sum of [roi_total x total_spend] over all channels. The effect_share reflects the same relationship: [roi_total x total_spend] / [sum of (roi_total x total_spend) over all channels].
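
The relationship above can be sketched in Python. This is only a minimal sketch, assuming rows that mirror the roi_total and total_spend columns of pareto_aggregated.csv for one selected model; the function names and the sample values are hypothetical and not part of Robyn.

```python
def channel_conversions(rows):
    """Return {channel: roi_total * total_spend} for each row."""
    return {r["channel"]: r["roi_total"] * r["total_spend"] for r in rows}


def effect_share(conversions):
    """Return each channel's share of the summed channel conversions."""
    total = sum(conversions.values())
    return {ch: value / total for ch, value in conversions.items()}


# Hypothetical rows standing in for pareto_aggregated.csv (one selected model).
rows = [
    {"channel": "channel_a", "roi_total": 0.005, "total_spend": 100000.0},
    {"channel": "channel_b", "roi_total": 0.002, "total_spend": 50000.0},
]

conversions = channel_conversions(rows)
shares = effect_share(conversions)
for ch in conversions:
    print(ch, round(conversions[ch], 2), round(shares[ch], 4))
```

The shares sum to 1 by construction, since each one is a channel's conversions divided by the total over the same channels.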

Hope it helps!

virithavanama commented 2 years ago

From my data:

Aggregated csv data:
channel 1: Total_spend = 1461776, Roi_total = 0.0061, Total_conversions_channel1 = 1461776 x 0.0061 = 5006.78
channel 2: Total_spend = 120793.77, roi_total = 0.0034, Total_conversions_channel2 = 120793.77 x 0.0034 = 732.23
Total_conversions = 5006.78 + 732.23 = 5739.01

Reallocated csv data:
optmResponseUnit_channel1 = 45.71
initResponseUnit_channel1 = 44.63
(optmResponseUnit - initResponseUnit)_channel1 = 1.08
non-zero spend time_units (as of budget allocator) = 51 weeks

Based on the formula [total_conversions (for a given channel) + (optmResponseUnit - initResponseUnit) x non-zero spend time_units (non-zero spend weeks or days in your window for that channel)], then divide by the total_conversions => (5006.78 + (1.08*51)) / 5739.01 = 0.88

What does this value represent? What I want is the original conversions for channel 1, which is around 10042.

Leonelsentana commented 2 years ago

Hey, that is the share of conversions post optimization. If you need the total conversions, just skip the division by total_conversions. Where did you get that 10042 from?
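
For reference, the arithmetic with the figures posted earlier in this thread works out as below. This is a small sketch: the variable names are illustrative, and the inputs are copied from the numbers quoted above rather than recomputed from any file.

```python
# Figures as posted in the thread (channel 1).
total_conversions_channel = 5006.78  # roi_total x total_spend, as posted
total_conversions_all = 5739.01      # sum over both channels, as posted
optm_response_unit = 45.71
init_response_unit = 44.63
non_zero_spend_weeks = 51

# Per-period uplift scaled to the whole window.
uplift = (optm_response_unit - init_response_unit) * non_zero_spend_weeks

# Total channel conversions after optimization (no division).
post_opt_conversions = total_conversions_channel + uplift

# Share of conversions after optimization (divide by the all-channel total).
post_opt_share = post_opt_conversions / total_conversions_all

print(round(post_opt_conversions, 2))  # 5061.86
print(round(post_opt_share, 2))        # 0.88
```

The first value is the post-optimization conversions for the channel; the second is that value expressed as a share of the summed channel conversions.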


virithavanama commented 2 years ago

10042 is the total conversions for 51 weeks directly from the original data and the value without division is 5061.86 which is not matching. I think the missing conversions are because of trend/season etc?

Leonelsentana commented 2 years ago

How do you know that total conversions for a specific channel should be 10042 in your original input data, if MMM works with total, non-attributed values for conversions? Please keep in mind that your original data should not use any kind of attributed data, but rather total spend, clicks, impressions, conversions, etc.

virithavanama commented 2 years ago

Hi, the original data has the total spend and total conversions split by channel. So based on this, we know that the specific channel should have 10042 conversions.

[Screenshot: Screen Shot 2022-09-06 at 8 57 47 AM]

Leonelsentana commented 2 years ago

Hi, is your original data the data you used as input to build the model? If so, this is not correct: marketing mix models are built on total conversions. Your data should be total spend on channel A, B, C, D, etc., and then just one value for the response variable (Y), in this case conversions. These are the total conversions for your business, e.g. the ones you report in your ledgers, so there should be just one column for conversions. Please avoid using any type of attributed data as an original input for MMM models. Attribution can serve as ground truth for calibration; however, the original input data must be total conversions (Y).
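
As an illustration of the layout described above, here is a minimal Python sketch that flags columns outside the expected shape (a date, per-channel media columns, and a single response column). All column names and values are hypothetical, and the helper is not part of Robyn; context variables a real dataset might carry would also need to be listed as expected.

```python
def unexpected_columns(row, date_column, feature_columns, response_column):
    """Return columns that are not the date, a listed feature, or the single response."""
    expected = {date_column, response_column, *feature_columns}
    return sorted(set(row) - expected)


# Hypothetical weekly row in the layout described above:
# spend per channel plus ONE business-wide conversions value.
good_row = {"week": "2022-01-03", "search_spend": 1200.0,
            "display_spend": 800.0, "conversions": 210}

# Hypothetical row that carries per-channel (attributed) conversions instead.
attributed_row = {"week": "2022-01-03", "search_spend": 1200.0,
                  "display_spend": 800.0, "search_conversions": 90,
                  "display_conversions": 60}

feature_cols = ["search_spend", "display_spend"]
print(unexpected_columns(good_row, "week", feature_cols, "conversions"))
print(unexpected_columns(attributed_row, "week", feature_cols, "conversions"))
```

The first call reports nothing unexpected, while the second lists the per-channel conversion columns, which is the pattern to avoid in the model input.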

gufengzhou commented 2 years ago

Hi @virithavanama , there's a similar issue about this point, please check it here

Without looking into the algebra here, Leo is right that you should at least think twice before treating attributed conversions as the source of truth. There's a lot of content online comparing attribution and MMM that is definitely worth checking out. But if you conclude that you do trust attribution, then we're talking about the calibration feature. You can use attributed results to guide Robyn, see here

virithavanama commented 2 years ago

Hi Team, the demo mentions: dep_var_type = "revenue", # "revenue" (ROI) or "conversion" (CPA). So we have used conversions as per the demo. We get daily data from channels like Google Ads (paid search), Google DCM (paid display), etc., so we know the total conversions for each channel individually. The data is then transformed into weekly data and passed to the model. I have attached a screenshot of the sample data that is being passed to the model.

[Screenshot: Screen Shot 2022-09-07 at 4 02 31 PM]

paid_media_spends and paid_media_vars are the spend and impressions per channel, and dep_var is the total conversions.

Leonelsentana commented 2 years ago

Hi @virithavanama I presume that when you say "we know the total conversions for each channel individually" you are applying some sort of attribution logic there to indicate which conversions correspond to which channel, correct? MMM works in a different way by looking into the time series correlation of the total Impressions or total spend data points in time from each channel with the total conversions (not on a channel level, just business's total). Does that make sense? You are comparing attributed arbitrary data based on just a rule of clicks or impressions and time between those and the conversion, which does not imply the same methodology that is applied on MMM.

virithavanama commented 2 years ago

We are not applying any logic. The data comes directly from different channels, like Google Ads for paid search, so we need the total conversions. There should be logic to convert the initial mean response to business conversions, right?

Leonelsentana commented 2 years ago

The logic is the one I explained above, without dividing by total_conversions: https://github.com/facebookexperimental/Robyn/issues/477#issuecomment-1234002864. If you want the decomposition per day, you can also get it directly from pareto_alldecomp_matrix. Each column shows the contribution of one channel or other variable to the explained Y (depVarHat).
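
To illustrate reading such a decomposition, the sketch below sums each column of a small made-up table over the window. The column names (other than depVarHat) and all values are hypothetical stand-ins, not the actual contents of pareto_alldecomp_matrix, and the toy rows are constructed so that the components add up to depVarHat.

```python
import csv
import io

# Hypothetical extract in the spirit of a decomposition matrix (made-up values).
SAMPLE = """ds,intercept,trend,season,search_S,display_S,depVarHat
2022-01-03,100,10,5,40,20,175
2022-01-10,100,12,-3,45,25,179
2022-01-17,100,14,2,38,22,176
"""


def column_totals(csv_text, skip=("ds",)):
    """Sum every numeric column over all rows, ignoring the columns in `skip`."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            if name in skip:
                continue
            totals[name] = totals.get(name, 0.0) + float(value)
    return totals


totals = column_totals(SAMPLE)

# Sum of all components except the fitted total itself.
components = sum(v for k, v in totals.items() if k != "depVarHat")

print(totals["search_S"])               # contribution of one channel over the window
print(components, totals["depVarHat"])  # components vs. fitted total
```

In this toy table a channel's total is only part of depVarHat; the rest sits in the intercept, trend, and season columns, which is why a single channel's decomposed contribution need not match a platform-reported count.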

gufengzhou commented 2 years ago

Please reopen if this issue reoccurs.