Open ankitpeltoton opened 2 weeks ago
Hello! Are there any thoughts on this issue? Kindly let me know. Thanks!
Q1) Ranking is based on a combination of (1) minimum detectable effect, (2) Scaled L2 imbalance, and (3) Absolute Lift in Zero. Essentially, we want market combinations that can detect as small a lift as possible, where the synthetic control method improves on a raw simple average, and that do not detect lift when no lift is present (i.e. do not give false positives).
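To see how a combination with a high Scaled L2 imbalance can still come out on top, here is a minimal sketch in base R. It assumes the overall rank is an equal-weight average of the per-metric ranks; this thread does not state the exact weighting, and the data frame, its values, and its column names (`mde`, `scaled_l2`, `abs_lift_in_zero`) are made up for illustration.

```r
# Toy candidates; all values and column names are hypothetical.
candidates <- data.frame(
  id               = c("A", "B", "C"),
  mde              = c(0.05, 0.15, 0.10),    # minimum detectable effect (smaller is better)
  scaled_l2        = c(0.54, 0.10, 0.30),    # Scaled L2 imbalance (smaller is better)
  abs_lift_in_zero = c(0.001, 0.020, 0.010)  # lift found when none was simulated (smaller is better)
)

# Rank each metric separately (1 = best), then average the ranks.
# ASSUMPTION: equal weights; the package may combine the metrics differently.
metric_ranks <- sapply(
  candidates[, c("mde", "scaled_l2", "abs_lift_in_zero")],
  rank, ties.method = "min"
)
candidates$avg_rank <- rowMeans(metric_ranks)
candidates$overall  <- rank(candidates$avg_rank, ties.method = "min")

print(candidates[order(candidates$overall), ])
```

In this toy data, candidate A has the worst Scaled L2 imbalance (0.54) but the best values on the other two metrics, so it still ends up first overall.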
A higher Scaled L2 imbalance (close to 1) means there is little to no benefit in using synthetic controls over simply comparing a simple average of the control markets to the treatment markets. It doesn't necessarily mean that the markets are not usable. A lower Scaled L2 imbalance (close to 0), however, does mean the fit between treatment and control markets is close to perfect.
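The interpretation above can be illustrated with a small base R sketch. It assumes the scaled imbalance is the pre-treatment L2 distance between the treatment series and the weighted control series, divided by the same distance under equal weights. The series, the weights, and the helper names `l2` and `scaled_l2` are hypothetical and are not taken from the package.

```r
# Pre-treatment series; all numbers are made up for illustration.
treated  <- c(10, 12, 11, 13)
controls <- cbind(m1 = c(9, 11, 10, 12),   # tracks the treated market closely
                  m2 = c(20, 18, 22, 19))  # does not

l2 <- function(x) sqrt(sum(x^2))

# ASSUMPTION: scaled imbalance = imbalance with the given weights
#             divided by imbalance with equal weights.
scaled_l2 <- function(weights) {
  uniform <- rep(1 / ncol(controls), ncol(controls))
  l2(treated - controls %*% weights) / l2(treated - controls %*% uniform)
}

scaled_l2(c(0.5, 0.5))  # equal weights: ratio of 1, no gain over a simple average
scaled_l2(c(1, 0))      # hypothetical fitted weights: ratio well below 1
```

A ratio near 1 says the weighted control is no closer to the treatment series than a plain average would be; a ratio near 0 says the weighted control reproduces it almost exactly.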
Q2) Power is a function of the number of simulations you run, as set by the lookback window parameter. If you run just one simulation, your power (the share of simulations that detected lift when it was simulated) is either 0 or 100%. Typically we use a lookback window of 1 so that the code runs faster, and then rerun the best combinations with a larger lookback window to check that the results generalize over multiple simulations/time periods. That second check is run with the GeoLiftPower function.
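A minimal base R sketch of why a single simulation can only give 0 or 100% power. It assumes power is simply the share of simulations in which the simulated lift was detected; the outcome vectors and the helper name `power` are made up for illustration.

```r
# Each entry records whether one simulation detected the simulated lift.
# The outcomes below are hypothetical.
power <- function(detected) mean(detected)

one_sim  <- c(TRUE)                             # a single simulation
many_sim <- c(TRUE, TRUE, FALSE, TRUE, TRUE,
              TRUE, FALSE, TRUE, TRUE, TRUE)    # ten simulations

power(one_sim)   # can only be 0 or 1
power(many_sim)  # share of the ten simulations that detected the lift
```

With more simulations the estimate can take intermediate values, which is why a combination that looks perfect with one simulation is worth rechecking over a longer window.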
Q3) In addition to the three variables used to rank, you can look at correlation, the % of historical conversions in treatment/holdout, and the dispersion of locations throughout your geography (this last one would be a manual check).
Bug description
On running the GeoLift package with 80 markets, the model returns about 1000 combinations. Rank 1 shows AvgScaledL2Imbalance = 0.54, which is a very high value. I believe AvgScaledL2Imbalance is essentially a measure of model fit, and a value very close to 0 would indicate a good and unbiased fit.
Q1) I need to understand how the model produces this ranking. On what factors is the ranking based? In my case
Session information
Please paste the output after running sessionInfo() in your R session.
Reproduction steps
We have 3 columns in our data, i.e. week, DMA and traffic. The traffic data is available for multiple DMAs at a weekly level. We pass all this data to GeoDataRead to get it into the required format. Once that is done, we pass it to the GeoLift package with the below arguments:
Expected behavior
I believed that Rank 1 would have a very low AvgScaledL2Imbalance value and that those markets would be reliable to pick in order to get statistically significant results. But right now I see the opposite trend: the top ranks have a high AvgScaledL2Imbalance, and the options further down the ranking have the lowest AvgScaledL2Imbalance values. How can this happen? And how do we select the BEST markets among all options?
Additional context
No, I only need to understand how to select the best markets out of all these 1000 combinations. Suggestions would be really helpful.