facebookincubator / GeoLift

GeoLift is an end-to-end geo-experimental methodology based on Synthetic Control Methods, used to measure the true incremental effect (Lift) of ad campaigns.
https://facebookincubator.github.io/GeoLift/
MIT License
175 stars, 54 forks

How to get predicted values by test market? #150

Closed: Snowcatcat closed this issue 1 year ago

Snowcatcat commented 1 year ago

In the Walkthrough example, there seem to be two ways to get the predicted values:

  1. predicted <- as.data.frame(GeoTest$y_hat)
  2. test <- as.data.frame(plot(GeoTest)$data$t_obs)

The first only returns values for the first market, and the second only returns values for the test group as a whole.

Is there a way to get predicted values by test market?

raphaeltamaki commented 1 year ago

Hi,

I don't think there is a way to get separate predicted values for each test market. From what I can see in the Augmented Synthetic Control paper (on which the library is based) and in the code, all treated locations are bundled together and modelled as a single unit. As a result, the model does not predict values per market; it predicts either the average across the treated locations (y_hat) or their total (data$t_obs = y_hat * n_locations).
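To see why a pooled model cannot hand back per-market values, here is a small base-R illustration. The numbers and the names split_a and split_b are made up for this sketch and are not GeoLift output: two different per-market splits can share exactly the same average, so the average alone does not identify either market, while the group total is simply the average times the number of treated locations.

```r
# Two hypothetical ways the same treated group could split across markets
# (toy numbers over three periods, not real GeoLift data)
split_a <- data.frame(chicago = c(10, 12, 14), portland = c(20, 22, 24))
split_b <- data.frame(chicago = c(15, 17, 19), portland = c(15, 17, 19))

# A pooled model works with the per-period average across treated markets
avg_a <- rowMeans(split_a)
avg_b <- rowMeans(split_b)

# Both splits give the same average, so the average cannot tell them apart
print(identical(avg_a, avg_b))

# The group total is the average times the number of treated locations,
# mirroring the relationship data$t_obs = y_hat * n_locations
n_locations <- ncol(split_a)
total_a <- avg_a * n_locations
print(all.equal(total_a, rowSums(split_a)))
```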

To get predicted values by market, I would run the function once per market, remembering to remove the other treated markets from the dataset each time. Below is an example based on the Walkthrough:


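library(GeoLift)
library(dplyr) # provides %>% and filter()

data(GeoLift_Test)
treated_locations = c("chicago", "portland")
output = NULL
for (treated_location in treated_locations) {

  # Keep this treated market plus the untreated (control) markets only,
  # dropping the other treated market(s) from the dataset
  filtered_GeoLift_Test = GeoLift_Test %>%
    filter((location == treated_location) | !(location %in% treated_locations))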

  GeoTestData_Test <- GeoDataRead(data = filtered_GeoLift_Test,
                                  date_id = "date",
                                  location_id = "location",
                                  Y_id = "Y",
                                  X = c(), #empty list as we have no covariates
                                  format = "yyyy-mm-dd",
                                  summary = TRUE)
  GeoTest <- GeoLift(Y_id = "Y",
                     data = GeoTestData_Test,
                     locations = c(treated_location),
                     treatment_start_time = 91,
                     treatment_end_time = 105)

  # Append this market's predictions (y_hat) as a new column
  output = cbind(output, GeoTest$y_hat)

}
output = as.data.frame(output)
names(output) <- treated_locations
# Sum of the per-market predictions in each period
output$total = rowSums(output[treated_locations])

# For comparison, fit a single model with both markets treated together
GeoTestData_Test <- GeoDataRead(data = GeoLift_Test,
                                date_id = "date",
                                location_id = "location",
                                Y_id = "Y",
                                X = c(), #empty list as we have no covariates
                                format = "yyyy-mm-dd",
                                summary = TRUE)
GeoTest <- GeoLift(Y_id = "Y",
                   data = GeoTestData_Test,
                   locations = treated_locations,
                   treatment_start_time = 91,
                   treatment_end_time = 105)

output$grouped_result = GeoTest$y_hat * length(treated_locations) # y_hat is the average over the treated locations
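The key data-preparation step in the loop above (one treated market plus all controls per run) can be checked on its own, without fitting any model. The following is a minimal base-R sketch on a hypothetical toy panel; split_by_treated() is a made-up helper name, not a GeoLift function, and the locations boston and denver are placeholders for control markets.

```r
# Hypothetical helper (not part of GeoLift): for each treated location,
# return the panel restricted to that location plus the untreated controls.
split_by_treated <- function(panel, treated) {
  datasets <- lapply(treated, function(loc) {
    keep <- panel$location == loc | !(panel$location %in% treated)
    panel[keep, , drop = FALSE]
  })
  names(datasets) <- treated
  datasets
}

# Toy panel: 4 locations x 2 dates with made-up outcome values
panel <- expand.grid(location = c("chicago", "portland", "boston", "denver"),
                     date = c("2021-01-01", "2021-01-02"),
                     stringsAsFactors = FALSE)
panel$Y <- seq_len(nrow(panel))

per_market <- split_by_treated(panel, c("chicago", "portland"))

# The chicago dataset contains chicago and the controls, but not portland
print(sort(unique(per_market$chicago$location)))
```

Each element of per_market is what one iteration of the loop would pass to GeoDataRead(), so the other treated market never appears in the donor pool of the synthetic control.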