facebookincubator / GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods used to measure the true incremental effect (Lift) of ad campaigns.
https://facebookincubator.github.io/GeoLift/
MIT License

Estimated lift much higher than actual lift in dataset. #132

Closed. djolear closed this issue 1 year ago.

djolear commented 1 year ago

Hi folks,

I'm wondering if you might be able to help me out with a puzzling issue that I'm running into while using the GeoLift package.

I'm currently designing a test using this package. Based on the results of GeoLiftMarketSelection, I'm creating a simulated dataset: I take the effect size provided in the function output and apply it to the test markets for the duration provided in the function output. For example, if the first market selection is for markets 1 & 2, with an effect size of 0.05 for a 15-day test, I apply a lift of 0.05 to my test markets' data to simulate this lift.
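A minimal sketch of the simulation described above, assuming the panel is held as (location, day, Y) records in long format and that the lift is applied multiplicatively to the last `duration` days of each test market. The helper `inject_lift` and the record layout are hypothetical illustrations, not part of GeoLift.

```python
from typing import List, Set, Tuple

# (location, day index, outcome Y) in long format; this layout is an assumption.
Record = Tuple[str, int, float]


def inject_lift(records: List[Record], test_markets: Set[str],
                effect_size: float, duration: int) -> List[Record]:
    """Hypothetical helper: scale Y by (1 + effect_size) for the test markets
    over the last `duration` days. Not a GeoLift function."""
    last_day = max(day for _, day, _ in records)
    start = last_day - duration + 1  # first day of the simulated test period
    out: List[Record] = []
    for loc, day, y in records:
        if loc in test_markets and day >= start:
            y = y * (1 + effect_size)  # multiplicative lift on treated rows only
        out.append((loc, day, y))
    return out


# Example: markets m1 and m2 get a 5% lift over the final 15 of 30 days.
panel = [(loc, d, 100.0) for loc in ("m1", "m2", "m3") for d in range(1, 31)]
treated = inject_lift(panel, {"m1", "m2"}, effect_size=0.05, duration=15)
```

Control markets and pre-period rows are left untouched, so any gap between the injected and the reported lift has to come from how the counterfactual is estimated, not from the injection itself.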

When I run the GeoLift function on this test data, the reported percent lift for these tests is often much higher than the lift that I simulated. For example, when I simulate a lift of 5%, I might get back a lift of 14.8%. I'm wondering if you might have any ideas as to why this is happening?
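For reference when comparing simulated and reported numbers, a percent lift is commonly computed as the total observed outcome in the test markets over the test period divided by the total counterfactual (synthetic control) prediction, minus one. The sketch below assumes that definition for illustration; GeoLift's internal computation may differ in detail. It shows that the reported lift recovers the injected lift when the counterfactual matches the untreated series.

```python
from typing import Sequence


def percent_lift(observed: Sequence[float], counterfactual: Sequence[float]) -> float:
    """Percent lift over the test period: sum(observed) / sum(counterfactual) - 1.

    Assumed definition for illustration; GeoLift's own computation may differ.
    """
    total_cf = sum(counterfactual)
    if total_cf == 0:
        raise ValueError("counterfactual total must be non-zero")
    return sum(observed) / total_cf - 1


# Test markets would have done 100/day for 15 days without treatment;
# the simulation multiplies each day by 1.05.
baseline = [100.0] * 15
observed = [y * 1.05 for y in baseline]

# If the synthetic control reproduced the untreated series exactly,
# the estimate would recover the injected lift.
print(round(percent_lift(observed, baseline), 4))  # 0.05
```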

I unfortunately can't share my data, but I did notice that when I apply this same procedure to the datasets provided in the GeoLift walkthrough, I don't run into the same problem. In other words, the lift that I simulate is more or less the one that is returned by the GeoLift function.

Here are a few things that I've noticed or that I want to note about the dataset that I'm using:

I'm wondering if you have any ideas about why I might be obtaining these results and whether there is anything you recommend doing to obtain more robust results?

Here are some things I've thought of and I'm wondering if you think they might be valid approaches (but would be curious to know if you have other ideas):

Related to the above, I'm wondering how much cause for concern there should be when

Thanks for any help here!

raphaeltamaki commented 1 year ago

@djolear Hi, would you mind sharing the code you used to simulate the lift, applied to the Walkthrough data instead of your own? While your explanation is clear, the devil is in the details with code.

raphaeltamaki commented 1 year ago

@djolear I'm afraid I can't reproduce the error you had with your dataset. As a consequence, I will close this issue, but please re-open it if you manage to create a dataset we can use to test the error.

Regarding your questions:

_Aggregate DMAs to obtain a higher volume of conversions per day._ What does DMA stand for? From the text I suppose it means 'locations', and if so, I don't think aggregating locations will help. Grouping locations is equivalent to taking a weighted average of the coefficients obtained when training with the locations separately. The only situations where I can imagine it being useful are:

  1. if you have a few locations with a lot of conversions/users and many with very few users, so that the regularization in the Augmented Synthetic Control basically zeroes the coefficients for all locations except the few large ones;
  2. if you have many locations where the number of conversions can be 0 on some dates. Grouping locations then ensures that the group never has a date with 0 users. The difference from the previous point is that a location with 0 events on a date ends up with a missing date, which causes it to be dropped when we use GeoDataRead.
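A minimal sketch of point 2, assuming daily counts are stored per location as a `{day: count}` dict and that a zero-conversion day is simply absent from the raw data, as described above. The location names, data and helpers are hypothetical. Each location on its own has a missing date, while their sum covers every date.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical daily conversions; a missing day means 0 events were logged.
daily: Dict[str, Dict[int, int]] = {
    "loc_a": {1: 3, 2: 1, 4: 2},  # no row for day 3
    "loc_b": {1: 1, 3: 2, 4: 1},  # no row for day 2
}
all_days = range(1, 5)


def missing_days(series: Dict[int, int], days: Iterable[int]) -> List[int]:
    """Days with no row in this series (the gaps that get a location dropped)."""
    return [d for d in days if d not in series]


def group_sum(locations: List[str], data: Dict[str, Dict[int, int]]) -> Dict[int, int]:
    """Sum daily counts across locations into one aggregated series."""
    total: Dict[int, int] = defaultdict(int)
    for loc in locations:
        for day, y in data[loc].items():
            total[day] += y
    return dict(total)


grouped = group_sum(["loc_a", "loc_b"], daily)
print(missing_days(daily["loc_a"], all_days))  # [3]
print(missing_days(daily["loc_b"], all_days))  # [2]
print(missing_days(grouped, all_days))         # []
```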

_Aggregating dates._ Same answer as above: this will only help in the two situations just described.
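A sketch of date aggregation, assuming day indices start at 1 and using 7-day buckets. The helper name and data are hypothetical. Sparse days are absorbed into weekly totals, which only matters in the zero-count situation above. One cost worth keeping in mind is that it also shrinks the number of time points available to fit the synthetic control.

```python
from collections import defaultdict
from typing import Dict


def to_weekly(series: Dict[int, int], period: int = 7) -> Dict[int, int]:
    """Sum a {day_index: count} series into {week_index: count}; day 1 falls in week 0."""
    weekly: Dict[int, int] = defaultdict(int)
    for day, y in series.items():
        weekly[(day - 1) // period] += y
    return dict(weekly)


# Hypothetical sparse location: events on only some days.
sparse = {1: 2, 5: 1, 9: 3, 14: 1}
print(to_weekly(sparse))  # {0: 3, 1: 4}
```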

_The EffectSize and the AverageMDE diverge substantially in the GeoLiftMarketSelection._ Yes, that shouldn't happen. If it does, the algorithm is not performing well on your data, which likely means that some of the assumptions needed to estimate your lift correctly aren't satisfied.
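A hypothetical screening helper, assuming you have exported the market-selection results as rows holding each candidate's effect size and average MDE. Check the actual column names in your GeoLift output, because the keys here are illustrative only. The helper flags candidates where the two values differ by more than a chosen tolerance. The 0.02 default is an arbitrary example, not a GeoLift recommendation.

```python
from typing import Dict, List


def flag_divergent(rows: List[Dict[str, float]], tol: float = 0.02) -> List[float]:
    """Return the ids of candidates whose |effect_size - average_mde| exceeds tol."""
    return [r["id"] for r in rows if abs(r["effect_size"] - r["average_mde"]) > tol]


# Illustrative rows; the keys are hypothetical stand-ins for the exported columns.
candidates = [
    {"id": 1, "effect_size": 0.05, "average_mde": 0.051},
    {"id": 2, "effect_size": 0.05, "average_mde": 0.12},
]
print(flag_divergent(candidates))  # [2]
```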

_When the `abs_lift_in_zero` is significantly above 0._ If this happens, your data again fails to satisfy at least one of the assumptions necessary for using Augmented Synthetic Control. It means that even when we force a lift of 0, we still detect a lift noticeably above 0, which can lead to a false positive: concluding that the treatment had an impact when there was none.
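A rough illustration of why a non-zero lift at zero matters for the original question. It relies on a simplifying assumption, not GeoLift's formula: both the treatment and the counterfactual bias act multiplicatively. If the model reports a lift of `b0` when the true lift is 0, a true lift `L` would then be reported as `(1 + L)(1 + b0) - 1`. The sketch works backwards from the numbers in the issue (5% simulated, 14.8% reported) to the zero-lift bias that would explain them. Here `lift_in_zero` is signed, whereas `abs_lift_in_zero` is its absolute value. Both helper functions are hypothetical.

```python
def reported_lift(true_lift: float, lift_in_zero: float) -> float:
    """Reported lift under a multiplicative counterfactual bias (simplifying assumption)."""
    return (1 + true_lift) * (1 + lift_in_zero) - 1


def implied_lift_in_zero(true_lift: float, reported: float) -> float:
    """Invert reported_lift: the zero-lift bias implied by a (true, reported) pair."""
    return (1 + reported) / (1 + true_lift) - 1


b0 = implied_lift_in_zero(0.05, 0.148)
print(round(b0, 4))                       # 0.0933
print(round(reported_lift(0.05, b0), 3))  # 0.148
```

Under this assumption, a lift of roughly 9% detected with no simulated effect would be enough to turn a 5% injected lift into a reported 14.8%.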