Difference in Differences (DiD) methods using control groups for better savings estimates

AJLutz commented 5 years ago

Issue: Difference in Differences (DiD) methods using control groups can yield more accurate savings estimates than normalized pre- and post-intervention comparisons of participant groups only.

Prerequisites: This issue is in the GRID working group. Issues previously opened in the CalTrack working group are only tangential to the formulation of control groups [exogenous effects (#1050), accuracy (#73), zip code mapping (#65, #26) and non-routine events (#84)].

Description: The proposed update would require consideration of appropriate control groups and their use when valid control groups can be developed.

Using before-and-after consumption normalized for weather (and possibly other variables) provides estimates of total savings for the participant group. This total includes gross impacts, net impacts and ‘exogenous’ impacts due to the economy, societal messaging, customer behavior and other events. These exogenous effects can be captured using a control group with DiD methods, which can help isolate net (free ridership) impacts and provide more reliable net-to-gross (NTG) impacts for specific interventions. NTG data gathering can then concentrate on more definitive and discrete events, namely whether a control group member implemented a specific intervention outside of the program, replacing more ambiguous and costly analysis methods. Note that if the program intervention was implemented by a control group member during the pre or post period, net impacts could be slightly underestimated because the pool of potential implementing non-participants is smaller (exogenous effects would not be affected). Methods for matching participants can be standardized based on available information.
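To make the arithmetic concrete, here is a minimal sketch of a DiD savings calculation next to the participant-only comparison. The function name and the kWh figures are hypothetical, and the inputs are assumed to be already weather-normalized annual consumption per site.

```python
from statistics import mean

def did_savings(part_pre, part_post, ctrl_pre, ctrl_post):
    """Difference-in-differences savings estimate (positive = savings).

    Inputs are assumed to be weather-normalized consumption per site.
    """
    participant_change = mean(part_post) - mean(part_pre)
    control_change = mean(ctrl_post) - mean(ctrl_pre)
    # Savings = participant reduction beyond what the control group
    # experienced from exogenous effects alone.
    return control_change - participant_change

# Hypothetical annual kWh per site, for illustration only.
part_pre = [10000, 12000, 11000]
part_post = [9000, 10500, 10200]
ctrl_pre = [10100, 11900, 11000]
ctrl_post = [9900, 11600, 10900]

# Participant-only comparison: includes exogenous effects.
naive_savings = mean(part_pre) - mean(part_post)

# DiD comparison: nets out the control group's change.
adjusted_savings = did_savings(part_pre, part_post, ctrl_pre, ctrl_post)

print(naive_savings, adjusted_savings)
```

In this made-up example the control group's consumption also fell, so the DiD estimate is lower than the participant-only estimate.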

Matching criteria can include zip codes, groups of zip codes, county, climate zones, population density (large urban/small urban/suburban, exurban, rural designation), household size, income levels, building square footage, building type (SF/MF/DMo), ownership (own/rent), condo flag, change in service ID during pre or post analysis period, and daily/monthly energy use profiles (as a proxy for some of the aforementioned criteria). Control group members would be dropped upon change in Service ID, major other EE/RE/electrification measures, or participation in other correlated EE programs during the pre or post period.
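As one possible illustration of the matching step, the sketch below matches exactly on categorical criteria (zip code and building type) and then picks the nearest non-participant by Euclidean distance in the monthly usage profile. The record layout, field names and three-month profiles are assumptions made for the example, and greedy matching without replacement is only one of many possible approaches.

```python
from math import dist

def match_controls(participants, candidates, exact_keys=("zip", "building_type")):
    """Greedy 1:1 matching without replacement (illustrative only)."""
    matches = {}
    used = set()
    for p in participants:
        # Restrict to unused candidates that agree on every exact-match key.
        pool = [
            c for c in candidates
            if c["id"] not in used and all(c[k] == p[k] for k in exact_keys)
        ]
        if not pool:
            continue  # no valid control for this participant
        # Nearest neighbor on the monthly usage profile.
        best = min(pool, key=lambda c: dist(p["monthly_kwh"], c["monthly_kwh"]))
        matches[p["id"]] = best["id"]
        used.add(best["id"])
    return matches

# Hypothetical records; profiles shortened to three months for brevity.
participants = [
    {"id": "P1", "zip": "94110", "building_type": "SF", "monthly_kwh": [900, 700, 800]},
    {"id": "P2", "zip": "94110", "building_type": "MF", "monthly_kwh": [400, 350, 380]},
]
candidates = [
    {"id": "C1", "zip": "94110", "building_type": "SF", "monthly_kwh": [880, 710, 790]},
    {"id": "C2", "zip": "94110", "building_type": "SF", "monthly_kwh": [1200, 900, 1000]},
    {"id": "C3", "zip": "94110", "building_type": "MF", "monthly_kwh": [500, 300, 420]},
    {"id": "C4", "zip": "94607", "building_type": "SF", "monthly_kwh": [900, 700, 800]},
]

matches = match_controls(participants, candidates)
print(matches)
```

Here C4 has an identical usage profile to P1 but is excluded because its zip code differs, which shows how the exact-match keys take priority over the usage distance.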

Proposed Test Methodology:
1) Develop a minimum population of recent program participants (2016/2017, residential, similar characteristics from the description above). Smaller sample sizes (minimum 100) are allowed for the test phase. For testing purposes, the sample can be small and restricted to one zip code and building type, for instance.
2) Find a matching sample from non-participating customers. Verify no change in ownership or major energy-using equipment (including adoption of incented technologies).
3) Use the same weather adjustments for the control group as applied to the participant group (or other appropriate identical weather adjustments for both groups).
4) Test that regression models for the participant group function for the control group (+/-10%; include R-squared and CVRMSE at a minimum).
5) Check the proportion of savings from participants and control groups.
6) Make adjustments based on findings until successful for small test populations. Expand to large populations (segmented as needed). Test regression models.
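The model-fit check in step 4 could be sketched as follows. This is only one reading of the "+/-10%" criterion (the control group's metric falls within 10% of the participant group's), and the CVRMSE here uses an n - p denominator, which is an assumption since definitions vary. All function names and numbers are hypothetical.

```python
from math import sqrt

def r_squared(observed, predicted):
    """Coefficient of determination for a fitted model."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def cvrmse(observed, predicted, n_params=1):
    """Coefficient of variation of RMSE, as a fraction of mean usage.

    Assumes an (n - n_params) denominator; other definitions use n.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(ss_res / (n - n_params)) / mean_obs

def within_tolerance(control_value, participant_value, tol=0.10):
    """True if the control metric is within tol of the participant metric."""
    return abs(control_value - participant_value) <= tol * abs(participant_value)

# Hypothetical observed vs. model-predicted usage for one group.
observed = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

fit_r2 = r_squared(observed, predicted)
fit_cvrmse = cvrmse(observed, predicted, n_params=2)
print(fit_r2, fit_cvrmse)
```

The same two metrics would be computed for the participant and control groups and compared with `within_tolerance`.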

Acceptance Criteria: If regression model results are similar for the larger set of participants, accept the use of control groups to account for both net impacts (free ridership) and exogenous effects. Gross impacts for the participants can be adjusted upwards or downwards based on the results from the control group. Findings should be presented with program reporting to the CPUC for more accurate and refined net-to-gross analysis.

mcgeeyoung commented 5 years ago

OpenEE would be supportive of including standardization of Comparison Group selection methods as part of the scope of the GRID working group. We've done some preliminary work on this front in partnership with the Energy Trust of Oregon (https://www.energytrust.org/wp-content/uploads/2018/11/OpenEE-Technical-Report-Comparison-group-identification-methods-FINAL-wSR.pdf) and would be interested in further exploration.

mgwyman commented 5 years ago

Energy Trust supports further investigation. This comment is a vote for this issue.

AndrewYRoyal commented 5 years ago

This looks like a good topic to pursue. I like OpenEE/Recurve's out-of-sample testing method.

The current proposal could set out to define test-statistic thresholds that indicate a good match. Whether a test statistic indicates a "good match" would be determined by the similarity of out-of-sample consumption patterns of the matched sites (as in Recurve's technical paper).

The test statistics could include standardized distance in covariates (e.g. Euclidean or Mahalanobis distance in monthly consumption), or the extent of shared support in distributions of covariates and propensity scores. Focusing on test statistics means that the guidelines could be method-agnostic: the evaluator could use any of the dozens of matching methods available as long as the match satisfies the test-statistic threshold.
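For illustration, a method-agnostic balance check might look like the sketch below. It uses the standardized mean difference per covariate, which is just one candidate statistic (a Mahalanobis distance or overlap measure could be substituted). The 0.1 threshold is a placeholder assumption, not a proposed value, and the covariate names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def standardized_mean_difference(treated, control):
    """Difference in means scaled by the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd

def is_balanced(treated_covs, control_covs, threshold=0.1):
    """True if every covariate's |SMD| is at or below the threshold.

    The threshold is a placeholder; the guidelines would need to set it.
    """
    return all(
        abs(standardized_mean_difference(treated_covs[name], control_covs[name]))
        <= threshold
        for name in treated_covs
    )

# Hypothetical covariates for matched participant and comparison sites.
treated_covs = {
    "jan_kwh": [900, 700, 800, 850],
    "sqft": [1500, 1800, 1600, 1700],
}
control_covs = {
    "jan_kwh": [880, 720, 810, 840],
    "sqft": [1550, 1750, 1600, 1720],
}

print(is_balanced(treated_covs, control_covs))
```

Because the check only looks at the resulting matched samples, it applies regardless of which matching method produced them.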
