Estimated lift much higher than actual lift in dataset.

djolear commented 1 year ago

Hi folks,

I'm wondering if you might be able to help me out with a puzzling issue that I'm running into while using the GeoLift package.

I'm currently designing a test using this package. Based on the results of GeoLiftMarketSelection, I'm creating a simulated dataset in which I take the effect size from the function output and apply it to the test markets for the duration given in that output. For example, if the first market selection is markets 1 and 2, with an effect size of 0.05 for a 15-day test, I apply a 5% lift to the test markets' data over those 15 days to simulate this lift.
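A minimal sketch of the simulation step described above, using only the Python standard library. It assumes the data is a long table of (location, date, Y) rows; the function name `apply_simulated_lift` and the row layout are hypothetical illustrations, not part of GeoLift (which is an R package).

```python
from datetime import date, timedelta


def apply_simulated_lift(rows, test_locations, effect_size, test_start, test_end):
    """Return a copy of `rows` with Y scaled by (1 + effect_size) for the
    test locations on dates in [test_start, test_end] (both inclusive).

    Assumes `rows` is a list of dicts with keys "location", "date"
    (datetime.date) and "Y": a hypothetical long format, one row per
    location per date.
    """
    lifted = []
    for row in rows:
        new_row = dict(row)  # copy so the original data stays untouched
        in_window = test_start <= row["date"] <= test_end
        if row["location"] in test_locations and in_window:
            new_row["Y"] = row["Y"] * (1 + effect_size)
        lifted.append(new_row)
    return lifted


# Usage: 3 markets x 30 days of flat data, 5% lift in markets 1 and 2
# over the last 15 days.
start = date(2023, 1, 1)
rows = [
    {"location": loc, "date": start + timedelta(days=d), "Y": 100.0}
    for loc in ("market_1", "market_2", "market_3")
    for d in range(30)
]
lifted = apply_simulated_lift(
    rows, {"market_1", "market_2"}, 0.05,
    test_start=start + timedelta(days=15),
    test_end=start + timedelta(days=29),
)
```

Scaling counts multiplicatively produces fractional values; if the outcome must stay integer, rounding after the lift is one option, at the cost of slightly changing the injected effect at low volumes.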

When I run the GeoLift function on this simulated data, the reported percent lift is often much higher than the lift I simulated. For example, when I simulate a lift of 5%, I might get back a lift of 14.8%. Do you have any ideas as to why this is happening?
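A back-of-the-envelope way to read the gap between a 5% simulated lift and a 14.8% reported lift, assuming percent lift is the summed observed outcome divided by the summed counterfactual over the test window, minus one (a simplification; GeoLift's estimator may differ in detail). Under that assumption, the reported number implies how far the model's counterfactual sits below the true no-treatment baseline. Both function names are hypothetical.

```python
def percent_lift(observed, counterfactual):
    """Relative lift of the summed observed outcome over the summed
    counterfactual for the test window (a simplified definition)."""
    return sum(observed) / sum(counterfactual) - 1


def implied_counterfactual_scale(true_lift, estimated_lift):
    """Solve (1 + true_lift) / c - 1 == estimated_lift for c, the ratio of
    the model's counterfactual to the real no-treatment baseline."""
    return (1 + true_lift) / (1 + estimated_lift)


# Numbers from this thread: 5% simulated, 14.8% reported.
c = implied_counterfactual_scale(0.05, 0.148)  # about 0.915

# A flat baseline of 100/day for 15 days, lifted by 5%, measured against
# a counterfactual that sits about 8.5% too low.
observed = [105.0] * 15
counterfactual = [100.0 * c] * 15
estimated = percent_lift(observed, counterfactual)
```

In other words, a counterfactual that underestimates the baseline by roughly 8.5% is enough to turn a true 5% lift into a reported 14.8%, which points at a biased counterfactual rather than at the simulation itself.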

I unfortunately can't share my data, but I did notice that when I apply this same procedure to the datasets provided in the GeoLift walkthrough, I don't run into the same problem. In other words, the lift that I simulate is more or less the one that is returned by the GeoLift function.

Here are a few things that I've noticed or that I want to note about the dataset that I'm using:

In the output of the GeoLiftMarketSelection function, the Average_MDE is often higher than the EffectSize. For example, the EffectSize might be 0.04 while the Average_MDE is close to 0.08.
The abs_lift_in_zero is often higher than 0, ranging somewhere between 0.02 and 0.04.
In the dataset that I'm using, we have a relatively low number of conversions (especially compared to the sample data provided in the walkthrough). For example, the highest number of conversions per day in any location is less than 200.
We've already moved all the way up the funnel for this test, so we likely won't be able to use an outcome metric that has a higher volume of data.
The dataset that I'm using contains one year's worth of data, and the test durations that I'm simulating range from 2 to 8 weeks, so I think we have ample historical data and a fairly long test window.
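To gauge how much pure counting noise is present at these volumes, one rough check assumes each location-day count is an independent Poisson draw; real conversion data are usually noisier, so this gives a floor rather than an estimate. The function name `poisson_relative_noise` is hypothetical.

```python
import math


def poisson_relative_noise(daily_mean, days=1, locations=1):
    """Coefficient of variation (SD / mean) of a summed count.

    Assumes every location-day count is an independent Poisson draw with
    mean `daily_mean`. The sum over `days * locations` cells is then Poisson
    with mean daily_mean * days * locations, so SD / mean = 1 / sqrt(mean).
    Real conversion data are usually overdispersed, so treat this as a floor.
    """
    total_mean = daily_mean * days * locations
    return 1 / math.sqrt(total_mean)


for mean, days in [(200, 1), (200, 15), (20, 15)]:
    print(f"{mean}/day over {days} day(s): {poisson_relative_noise(mean, days):.1%}")
```

This prints roughly 7.1%, 1.8% and 5.8%: for a market averaging 20 conversions a day, counting noise alone over a 15-day window is already about the size of a 5% lift.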

I'm wondering if you have any ideas about why I might be obtaining these results and whether there is anything you recommend doing to obtain more robust results?

Here are some things I've thought of and I'm wondering if you think they might be valid approaches (but would be curious to know if you have other ideas):

Aggregate DMAs to obtain a higher volume of conversions per day.
Aggregate days into weeks to obtain a higher volume of conversions per time point.
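Both aggregation ideas amount to summing Y over larger buckets before fitting. A minimal sketch, assuming the same hypothetical (location, date, Y) row layout; the function name `aggregate` and its parameters are illustrative, not a GeoLift API.

```python
from collections import defaultdict
from datetime import date, timedelta


def aggregate(rows, location_map=None, period_days=7, origin=None):
    """Sum Y over fixed-length periods and, optionally, over location groups.

    rows: list of dicts with "location", "date" (datetime.date) and "Y".
    location_map: optional dict mapping a location to a group name;
        unmapped locations keep their own name.
    period_days: bucket length; 7 groups days into weeks counted from `origin`.
    Returns {(group, period_index): summed Y}.
    """
    location_map = location_map or {}
    if origin is None:
        origin = min(r["date"] for r in rows)
    totals = defaultdict(float)
    for r in rows:
        group = location_map.get(r["location"], r["location"])
        period = (r["date"] - origin).days // period_days
        totals[(group, period)] += r["Y"]
    return dict(totals)


# Usage: 3 hypothetical DMAs x 14 days, merging dma_a and dma_b into one group.
start = date(2023, 1, 2)
rows = [
    {"location": loc, "date": start + timedelta(days=d), "Y": 1.0}
    for loc in ("dma_a", "dma_b", "dma_c")
    for d in range(14)
]
weekly = aggregate(rows, location_map={"dma_a": "group_ab", "dma_b": "group_ab"})
```

A trailing partial week would show up as an artificially low total, so it is safer to drop incomplete periods before fitting.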

Related to the above, I'm wondering how much cause for concern there should be when:

the EffectSize and the Average_MDE diverge substantially in the GeoLiftMarketSelection output;
the abs_lift_in_zero is significantly above 0.

And, as with the questions above, what steps would you recommend for addressing these issues?
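One way to act on these two warning signs is to screen each candidate row of the market-selection output before choosing markets. A minimal sketch, assuming each row is available as a dict keyed by the column names used in this thread; the function name and both thresholds are hypothetical choices for illustration, not GeoLift recommendations.

```python
def flag_selection(row, mde_ratio_limit=1.5, zero_lift_limit=0.01):
    """Return warning strings for one row of a market-selection table.

    `row` is a dict with keys "EffectSize", "Average_MDE" and
    "abs_lift_in_zero". Thresholds are hypothetical defaults.
    """
    warnings = []
    effect = abs(row["EffectSize"])
    mde = abs(row["Average_MDE"])
    # MDE much larger than the effect size the row claims to detect.
    if effect > 0 and mde / effect > mde_ratio_limit:
        warnings.append("Average_MDE diverges from EffectSize")
    # A noticeable lift is estimated even when none was applied.
    if row["abs_lift_in_zero"] > zero_lift_limit:
        warnings.append("non-zero lift detected when no lift was applied")
    return warnings


# The values described earlier in this thread trigger both warnings.
example = {"EffectSize": 0.04, "Average_MDE": 0.08, "abs_lift_in_zero": 0.03}
print(flag_selection(example))
```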

Thanks for any help here!

raphaeltamaki commented 1 year ago

@djolear Hi, would you mind sharing the code that you used to simulate the problem you had, but using the Walkthrough data? While your explanation is clear, the devil is in the details with code.

raphaeltamaki commented 1 year ago

@djolear I'm afraid I can't reproduce the error that you had with your dataset. As a consequence, I will close this issue, but please re-open it if you manage to create a dataset we can use to test the error.

Regarding your questions:

Aggregate DMAs to obtain a higher volume of conversions per day.
What does DMA stand for? From the text I suppose it means 'locations', and if so I don't think aggregating locations will help. Grouping locations is equivalent to taking a weighted average of the coefficients obtained when training with the locations separated. The only situations where I can imagine it being useful are:

if you have a few locations with a lot of conversions/users and many with very few users, so that the regularization in the Augmented Synthetic Control basically zeroes the coefficients for all locations except the few large ones;
if you have many locations where the number of conversions can be 0 on some dates, in which case grouping locations ensures that the group never has a date with 0 users. The difference from the previous point is that a location with 0 events on a date ends up with a missing date, which causes it to be dropped when we use GeoDataRead.
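For the second situation, a quick check is to list the dates on which a location, or a candidate group of locations, has no events at all. A minimal sketch, assuming the hypothetical (location, date, Y) row layout; dates can be `datetime.date` values or plain day indices, and the function name is illustrative.

```python
def dates_with_no_events(rows, locations, all_dates):
    """Return the sorted dates on which the combined Y of `locations` is
    zero or has no row at all (i.e. the date would be missing)."""
    totals = {d: 0 for d in all_dates}
    for r in rows:
        if r["location"] in locations and r["date"] in totals:
            totals[r["date"]] += r["Y"]
    return sorted(d for d, y in totals.items() if y == 0)


# Two sparse locations: each alone has gaps, together they cover every day.
rows = [
    {"location": "A", "date": 0, "Y": 3},
    {"location": "A", "date": 2, "Y": 1},
    {"location": "B", "date": 1, "Y": 2},
    {"location": "B", "date": 3, "Y": 4},
]
print(dates_with_no_events(rows, {"A"}, range(4)))
print(dates_with_no_events(rows, {"A", "B"}, range(4)))
```

Here location A alone is missing days 1 and 3, while the group of A and B has no missing day, which is the case where grouping keeps data that would otherwise be dropped.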

Aggregate days into weeks to obtain a higher volume of conversions per time point.
Same answer as above: aggregating dates will only help in the same two situations explained above.

The EffectSize and the Average_MDE diverge substantially in the GeoLiftMarketSelection
Yes, that shouldn't happen. If it does, then the algorithm is not performing well on your data, which likely means that some of the assumptions needed to estimate your lift correctly aren't satisfied.

When the abs_lift_in_zero is significantly above 0
If this happens, then again your data does not satisfy at least one of the assumptions necessary for Augmented Synthetic Control. It means that even when we force a lift of 0, we still detect a lift that is noticeably above 0, which can lead to a false positive: stating that the treatment had an impact when there was none.
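The "force a lift of 0" idea can be sketched with a deliberately simple counterfactual: scale the control markets by the pre-period ratio of test to control, then measure the lift in a window where nothing was injected. The ratio model and the function name are hypothetical stand-ins for Augmented Synthetic Control, used only to show how a drift between test and control markets turns into a spurious lift.

```python
def zero_lift_estimate(treated, control, pre_days):
    """Estimated lift in a window where no lift was injected.

    treated, control: equal-length daily outcome lists (e.g. summed over
    the test markets and over the control markets). The first `pre_days`
    entries are the pre-period; the rest form the placebo test window.
    Counterfactual = control scaled by the pre-period treated/control ratio.
    """
    ratio = sum(treated[:pre_days]) / sum(control[:pre_days])
    counterfactual = [ratio * c for c in control[pre_days:]]
    return sum(treated[pre_days:]) / sum(counterfactual) - 1


# Test markets that track control exactly give no spurious lift ...
control = [10, 12, 11, 13, 9, 10, 12, 11]
tracking = [2 * c for c in control]
# ... while test markets drifting upward relative to control produce one.
flat_control = [100] * 30
drifting = [100] * 20 + [110] * 10
```

With the drifting series the placebo estimate is about 10% although nothing was applied. Repeating this over many placebo windows and averaging the absolute estimates gives a number similar in spirit to abs_lift_in_zero.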

facebookincubator/GeoLift, issue #132