EnergyInnovation / eps-us

Energy Policy Simulator - United States
GNU General Public License v3.0

Add modeling structure to support sub-region detail within a single region model #166

Closed robbieorvis closed 3 years ago

robbieorvis commented 3 years ago

This is more of a thought exercise, but I've been thinking about what structure and data would be required to downscale the EPS modeling to a sub-regional level within a single region model, e.g. having a breakdown by the 50 US states within the national model. One big concern would be model run time: adding potentially 50 additional subscript elements to many of the variables would very likely lengthen the runtime significantly. The question is whether it might be worth it if the runtime stays within reason and the change adds much-desired clarity. For background, I've now heard in multiple meetings with policymakers and national-scale modelers that this type of detail is desperately needed to grow the importance of models in decision making and to deliver the insights that are needed. It might also remove the need to independently build new subregional models (e.g. instead of building 50 US state models we would build one great national model; same for China, and the EU model could be converted to this format). Perhaps on this thread we can brainstorm about what would be required and revisit at some point down the road if we still think it's of interest.

In terms of needs:

1) We would need to add a geographical subscript, something like Subregion1, Subregion2...SubregionN. These would be defined (in terms of what regions they map onto) in the input data files and also in the WebAppData doc. Regions with fewer than the maximum number of subregions (i.e. fewer than 50, for example) would just have zeroes for the extra subregions. I can imagine a new Excel file in InputData that contains the region mapping.
2) We would need data to downscale energy and service demand requirements. I think this is relatively straightforward using the great work RMI has done. Time-series files might need a new structure (like BCDTRtSY) to allow for adding a second dimension to the input data.
3) We would ideally need a multi-region electricity variable to account for the fact that there is significant inter-regional trading/dependency in the electricity sector. This could simply be a mapping of Subregions to ElectricityRegions.
4) I can imagine that many variables would need regional multipliers (e.g. technology shareweights). Allocation mechanisms would need to account for regional availability of certain technologies.
5) The IO model piece might not be so bad, except that determining the domestic content share would be a little challenging. However, our work with RMI means we can pretty easily obtain these values.
6) One of the biggest questions is how to handle inter-state trade and flows. This ranges from energy production and flows to the IO model and domestic content share and flows. This might be the single greatest challenge.
7) What do we do about policies? Do we allow subregion policies to be defined? Probably so, in which case this adds a lot to the policy implementation schedule files, as well as subscripting to every single policy lever. That's a lot, but Todd's new subscript setting feature would allow national-level values to be set easily.
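To make items 1 and 3 a bit more concrete, here is a rough Python sketch of the bookkeeping, not the Vensim structure itself. Everything in it is an assumption for illustration: the slot count, the placeholder names ("StateA", "GridX"), the share values, and the function names are all hypothetical and do not come from any existing EPS input file.

```python
from typing import Dict, List, Optional

def build_subregion_slots(names: List[str], max_slots: int) -> Dict[str, Optional[str]]:
    """Assign real subregion names to generic Subregion1..SubregionN slots.

    Slots beyond the number of real subregions map to None (unused).
    """
    if len(names) > max_slots:
        raise ValueError("more subregions than available subscript slots")
    return {
        f"Subregion{i + 1}": (names[i] if i < len(names) else None)
        for i in range(max_slots)
    }

def downscale(national_value: float,
              shares: Dict[str, float],
              slots: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Split a national value across slots in proportion to `shares`.

    Unused slots receive zero, matching the "zeroes for the extra
    subregions" idea in item 1.
    """
    total = sum(shares.values())
    return {
        slot: (national_value * shares[name] / total if name is not None else 0.0)
        for slot, name in slots.items()
    }

def aggregate_to_electricity_regions(values: Dict[str, float],
                                     slot_to_grid: Dict[str, str]) -> Dict[str, float]:
    """Sum slot-level values into electricity regions (item 3).

    Slots missing from `slot_to_grid` (e.g. unused ones) are skipped.
    """
    totals: Dict[str, float] = {}
    for slot, value in values.items():
        grid = slot_to_grid.get(slot)
        if grid is not None:
            totals[grid] = totals.get(grid, 0.0) + value
    return totals

# Hypothetical example: two real subregions in a model with four slots.
slots = build_subregion_slots(["StateA", "StateB"], max_slots=4)
demand = downscale(100.0, {"StateA": 3.0, "StateB": 1.0}, slots)
grid_demand = aggregate_to_electricity_regions(
    demand, {"Subregion1": "GridX", "Subregion2": "GridX"}
)
```

In the real model the slot mapping would presumably live in the InputData spreadsheet mentioned in item 1, and the shares would come from the downscaling data in item 2.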

I am almost certainly missing a lot, so feel free to add more here, but thought it would be fun to think through this a little bit, even if we don't end up pursuing it, which is likely.

robbieorvis commented 3 years ago

One other comment: PERAC and BPEiC would need to be downscaled, and it sounds like EPA is releasing state-level data for the US at some point this fall, so we would have that.

jrissman commented 3 years ago

It's hard to anticipate all the things this would entail until we did it, and ran into issues. It would be such a large lift.

Here are a couple that come to mind:

Another big problem with this approach is we often want different partner organizations for different state models, and we benefit from each state model having its own branding and identity in terms of communications and outreach. We would lose the ability to separately brand each state model with its own image, its own partner logos, its own URL, its own everything. That could severely damage our ability to make the case to policymakers in a specific state that we've built a tool just for them and that reflects their state's uniqueness. It could be a huge setback to our ability to be trusted by state-level policymakers.

I guess I don't see the benefit here. What we have right now is similar to the end goal (models for the main region and sub-regions), but with the benefits that we aren't forced to build all sub-regions at once (though we can if we want to), the model user isn't forced to run all the sub-regions (which has severe impacts on our technical infrastructure as described above), and we have flexibility of branding and promotion that helps with outreach on a state-by-state basis. Essentially, we've already solved the problems of too much input data or too much bandwidth by segmenting the sub-regions into different EPS models. Having multiple EPS models is likely the best possible solution. Even discounting the effort involved, merging them into a single model might be a step backward, rather than a benefit.

So I guess there is no value add unless the sub-regions interact in some way. And whatever interaction you're envisioning between the sub-regions would have to be of staggering value to be worth the overwhelming downsides. I honestly don't see it.

Here's a different way to approach the question. We're going to have EPS models for most or all U.S. states. If you're trying to do a cross-state analysis, what if you just programmed a Python script that runs all 50 state models (or any user-selected subset) and outputs the data you need to a single spreadsheet file? The existing Data Logging Script could be extended to provide a place for you to specify a list of EPS repository names (all of which you'd already need to have cloned to your computer, which means the public can't access work-in-progress regions that haven't yet been released). The script can run the same scenario settings across all models. It would then compile the results across all sub-regions into a single spreadsheet for you, with a new column to specify which model the results came from. (You could then continue the analysis in Python or Excel as you prefer.) This wouldn't even be that hard - maybe a day or two of work on enhancing the Data Logging script. This type of approach - using simple tools like Python scripts to perform the same task across multiple existing models - is probably the right way to go forward, insofar as you need to do things that draw from multiple different EPS regions.
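A minimal sketch of the "compile across models" step, to show how little is involved. This is not the existing Data Logging Script: the function names, the repository names, the column names, and the numbers are all hypothetical placeholders. The step that actually runs a model is passed in as a callable, because how that happens depends on the local Vensim setup.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

def compile_results(repo_names: Iterable[str],
                    scenario: str,
                    run_model: Callable[[str, str], List[Row]]) -> List[Row]:
    """Run the same scenario in each model and stack the output rows.

    Each row gets a new "Model" column naming the repository it came from.
    """
    combined: List[Row] = []
    for repo in repo_names:
        for row in run_model(repo, scenario):
            combined.append({"Model": repo, **row})
    return combined

def to_csv(rows: List[Row]) -> str:
    """Serialize the combined rows as CSV text (header taken from the first row)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def fake_run_model(repo: str, scenario: str) -> List[Row]:
    """Stand-in for running one EPS model; the values are made-up placeholders."""
    return [
        {"Year": 2030, "Variable": "Total CO2e Emissions", "Value": 1.0},
        {"Year": 2050, "Variable": "Total CO2e Emissions", "Value": 0.5},
    ]

# Hypothetical repository names; each would need to be cloned locally.
rows = compile_results(["eps-state-a", "eps-state-b"], "BAU", fake_run_model)
csv_text = to_csv(rows)
```

Swapping `fake_run_model` for a function that runs the model in each cloned repository and reads back its output file would give the single combined table described above, ready for further analysis in Python or Excel.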