Closed robbieorvis closed 3 years ago
One other comment: PERAC and BPEiC would need to be downscaled, and it sounds like EPA is releasing state-level data for the US at some point this fall, so we would have that.
It's hard to anticipate everything this would entail until we actually did it and ran into issues. It would be such a large lift.
Here are a couple that come to mind:
We likely would have to stop taking in input data from CSV files and move to a better data management system. Already we have hundreds of CSV files, and some variables are already loaded from many CSV files (e.g. a time-series variable with two subscripts needs to be split across multiple files today). Multiplying this by 50x would be kind of crazy. It's not what CSV files and folders are meant to manage. We would need to migrate to a technology properly intended and designed for this type of data management and querying - i.e. a database. The web app already stores and pulls everything from a SQL database, only accessing the CSV files once during the build process. We'd need to do something similar on desktop, but it would be harder, as we'd need to develop ways to get the data into the database in the first place. Maybe we'd be looking at replacing the Excel files with Python or something. I don't even know.
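To make the database idea concrete, here's a minimal sketch of the kind of migration I have in mind, using Python's built-in `sqlite3`. The table schema and column names (`subscript`, `year`, `value`) are hypothetical placeholders, not the EPS's actual data layout:

```python
# Sketch: consolidating scattered input-data CSV files into one SQLite
# database, so a variable can be queried instead of reassembled from many
# files. The schema and column names here are hypothetical.
import csv
import io
import sqlite3

def load_csv_into_db(conn, variable_name, csv_text):
    """Store one variable's CSV rows as (variable, subscript, year, value)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(variable_name, r["subscript"], int(r["year"]), float(r["value"]))
            for r in reader]
    conn.executemany("INSERT INTO input_data VALUES (?, ?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE input_data
                (variable TEXT, subscript TEXT, year INTEGER, value REAL)""")

# A real script would walk the InputData folder; one inline example here.
example = "subscript,year,value\ncoal,2020,1.5\ncoal,2021,1.4\n"
load_csv_into_db(conn, "Fuel Cost", example)

# One query replaces manually splitting a variable across multiple files.
rows = conn.execute(
    "SELECT year, value FROM input_data WHERE variable = ? ORDER BY year",
    ("Fuel Cost",)).fetchall()
print(rows)  # [(2020, 1.5), (2021, 1.4)]
```

The point is just that "50x more data" stops being a folder-management problem and becomes a routine query.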
We'd need to multiply the current number of output graphs by 51. Right now, each graph is individually defined in GraphDefinitions and in the OutputGraphs tab of WebAppData, but there would be too many for this to be practical. We'd need a programmatic way to extend the graphs to subregions. It would be possible to create a Python script that writes a Vensim GraphDefinitions file - I could manage that. I don't know if Vensim would choke at some point trying to load so many graph definitions. And we'd need many additional pages in Vensim to show the graphs - that's probably impractical. We probably would instead decide to not show any sub-region graphs within Vensim, and just rely on the web tool for it. This not only requires a third-tier graph selection menu, but a new data querying paradigm, as noted in the following bullet:
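The graph-generation script could be as simple as the following sketch. To be clear, the output format below is a placeholder, not real Vensim GraphDefinitions syntax, and the graph and region names are illustrative; a real script would emit whatever format Vensim actually expects:

```python
# Sketch: generating one graph definition per subregion programmatically,
# rather than hand-writing each in GraphDefinitions. The names and the
# selector syntax are illustrative placeholders, not Vensim syntax.
SUBREGIONS = ["Alabama", "Alaska", "Arizona"]  # ...through all 50 states + DC
BASE_GRAPHS = ["Total CO2e Emissions", "Electricity Generation by Source"]

def graph_definitions(base_graphs, subregions):
    """Yield (graph_title, variable_selector) pairs for every combination."""
    for graph in base_graphs:
        for region in subregions:
            yield (f"{graph} - {region}", f"{graph}[{region}]")

defs = list(graph_definitions(BASE_GRAPHS, SUBREGIONS))
print(len(defs))  # 6 here (2 graphs x 3 subregions); ~51x our current
                  # graph count in the real case
```

Writing the script is the easy part; whether Vensim and the web tool can swallow the resulting definition count is the real question.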
The web tool loads results for all output graphs into the browser when a run is complete. Multiplying the amount of data by 51 would probably make the data too big for this approach (i.e. either exceeding the limits the browser puts on how much memory any single tab can consume, or making the data transfer too slow and bandwidth-intensive, or both). Todd would need to program a new method that just queries the server for the data for the graph the user asks to display. This would add latency to every operation that shows a graph, which would hurt the responsiveness and fast "feel" of the web tool. It also adds server demands (and makes the idea of a WASM conversion impossible, as that relies on running the model locally in the browser).
On the model side, I think if we're essentially running 51 independent models at once, there should not be any theoretical challenges that are greater than the challenge of building 51 separate models. That is, on your point (6) ("how to handle inter-state trade and flows"), we decide not to handle them at all. Each state simply continues to consider all other states to be out-of-region and we don't worry or care about which state receives what from which other state. This applies to electricity too - we would not be implementing any "electricity regions" like you suggested in (3). Note that the sum of the results of the 50 states would not equal the national results precisely for any given variable, as it's just 51 independent models (that happen to share certain input data) being run simultaneously.
On the other hand, if you mean for the regions to interact in some way, you'd need to describe the sort of interaction you envision in considerably more detail for me to have any sense of the work involved. Interaction between sub-regions might have implications for data needs, because it may require reasonable values to be present for all regions. So in China, we would have to have data in the model for all provinces, not just a few. (On the other hand, if the sub-region models are essentially running independently, then it's okay to omit as many sub-regions as you like.) I'm not sure if you would want to commit to finding and implementing data for every sub-region within the larger region. And I don't even understand what benefits you'd be seeking by having the sub-regions interact in some way. Realistically, I think we'd only really be considering 51 non-interacting models running at once.
Another big problem with this approach is that we often want different partner organizations for different state models, and we benefit from each state model having its own branding and identity in terms of communications and outreach. We would lose the ability to separately brand each state model with its own image, its own partner logos, its own URL, its own everything. That could severely damage our ability to make the case to policymakers in a specific state that we've built a tool just for them and that reflects their state's uniqueness. It could be a huge setback to our ability to be trusted by state-level policymakers.
I guess I don't see the benefit here. What we have right now is similar to the end goal (models for the main region and sub-regions), but with the benefits that we aren't forced to build all sub-regions at once (though we can if we want to), the model user isn't forced to run all the sub-regions (which has severe impacts on our technical infrastructure as described above), and we have flexibility of branding and promotion that helps with outreach on a state-by-state basis. Essentially, we've already solved the problems of too much input data or too much bandwidth by segmenting the sub-regions into different EPS models. Having multiple EPS models is likely the best possible solution. Even discounting the effort involved, merging them into a single model might be a step backward, rather than a benefit.
So I guess there is no value add unless the sub-regions interact in some way. And whatever interaction you're envisioning between the sub-regions would have to be of staggering value to be worth the overwhelming downsides. I honestly don't see it.
Here's a different way to approach the question. We're going to have EPS models for most or all U.S. states. If you're trying to do a cross-state analysis, what if you just programmed a Python script that runs all 50 state models (or any user-selected subset) and outputs the data you need to a single spreadsheet file? The existing Data Logging Script could be extended to provide a place for you to specify a list of EPS repository names (all of which you'd already need to have cloned to your computer, which means the public can't access work-in-progress regions that haven't yet been released). The script can run the same scenario settings across all models. It would then compile the results across all sub-regions into a single spreadsheet for you, with a new column to specify which model the results came from. (You could then continue the analysis in Python or Excel as you prefer.) This wouldn't even be that hard - maybe a day or two of work enhancing the Data Logging Script. This type of approach - using simple tools like Python scripts to perform the same task across multiple existing models - is probably the right way to go forward, insofar as you need to do things that draw from multiple different EPS regions.
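The shape of that Data Logging Script extension would be roughly this. Everything here is a sketch: the repo names are made up, and `run_model()` is a hypothetical stub standing in for however the existing script actually invokes Vensim and reads the output:

```python
# Sketch: run the same scenario across a user-specified list of cloned
# state-model repositories and compile results into one spreadsheet, with
# a column naming the source model. run_model() is a hypothetical stub.
import csv

REPO_NAMES = ["eps-colorado", "eps-minnesota", "eps-virginia"]  # user-specified

def run_model(repo_name, scenario):
    """Hypothetical: run one cloned EPS repo with the given scenario
    settings and return rows of (variable, year, value). Stubbed here."""
    return [("Total CO2e Emissions", 2030, 42.0)]

def compile_results(repo_names, scenario, out_path):
    """Write one combined CSV with a 'model' column identifying the source."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["model", "variable", "year", "value"])
        for repo in repo_names:
            for variable, year, value in run_model(repo, scenario):
                writer.writerow([repo, variable, year, value])

compile_results(REPO_NAMES, scenario={"Carbon Tax": 50},
                out_path="all_states.csv")
```

From there, pivoting or summing across states is ordinary spreadsheet (or pandas) work.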
This is more of a thought exercise, but I've been thinking about what structural and data changes would be required to downscale the EPS modeling to a sub-regional level with a single region model, e.g. having a breakdown by 50 US states within the national model. One big concern with doing this would be model run time... adding potentially 50 additional subscripts to many of the variables would very likely lengthen the model runtime significantly, but there's a question of whether it might be worth it if the runtime is within reason and it would add much-desired clarity. For background, I've now heard in multiple meetings with policymakers and national-scale modelers that this type of detail is desperately needed to grow the importance of models in decision making and deliver the insights that are needed. It might also preclude the need to independently build new subregional models (e.g. instead of building 50 US state models we would build one great national model; same for China, and the EU model could be converted to this format). Perhaps on this thread we can brainstorm about what would be required and revisit at some point down the road if we still think it's of interest.
In terms of needs:
1) We would need to add a geographical subscript, something like Subregion1, Subregion2...SubregionN. These would be defined (in terms of what regions they map onto) in the input data files and also in the WebAppData doc. Regions with fewer than the maximum number of subregions (fewer than 50, for example) would just have zeroes for the extra regions. I can imagine a new Excel file in InputData that contains the region mapping.
2) We would need data to downscale energy and service demand requirements. I think this is relatively straightforward using the great work RMI has done. Time-series files might need a new structure (like BCDTRtSY) to allow for adding a 2nd dimension to the input data.
3) We would ideally need a multi-region electricity variable to account for the fact that there is significant inter-regional trading/dependency in the electricity sector. This could simply be a mapping of Subregions to ElectricityRegions.
4) I can imagine that many variables would need regional multipliers (e.g. technology shareweights). Allocation mechanisms would need to account for regional availability of certain technologies.
5) The IO model piece might not be so bad, except that determining the domestic content share would be a little challenging. However, our work with RMI means we can pretty easily obtain these values.
6) One of the biggest questions is how to handle inter-state trade and flows. This ranges from energy production and flows to the IO model and domestic content share and flows. This might be the single greatest challenge.
7) What do we do about policies? Do we allow subregion policies to be defined? Probably so, in which case this adds a lot to the policy implementation schedule files, as well as subscripting to every single policy lever. That's a lot, but Todd's new subscript setting feature would allow national-level values to be set easily.
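To illustrate the fixed-width subscript idea in point (1), here's a small sketch of how a region with fewer sub-regions than the maximum would zero-pad the extra slots. The sub-region names, values, and slot count are all made up for illustration:

```python
# Sketch of point (1): every region's sub-regions map onto generic
# Subregion1...SubregionN slots, and regions with fewer sub-regions than
# the maximum pad the remaining slots with zeroes. Illustrative data only.
MAX_SUBREGIONS = 5  # would be 51 in the real US case

def pad_subregion_data(mapping, values):
    """mapping: real sub-region names, in slot order.
    values: dict of name -> value. Returns a fixed-length, zero-padded list."""
    padded = [values.get(name, 0.0) for name in mapping]
    padded += [0.0] * (MAX_SUBREGIONS - len(mapping))
    return padded

# A region with only three sub-regions still fills all N slots.
mapping = ["North", "Central", "South"]
print(pad_subregion_data(mapping, {"North": 1.2, "Central": 0.8, "South": 2.1}))
# [1.2, 0.8, 2.1, 0.0, 0.0]
```

The region-mapping Excel file in InputData would essentially supply the `mapping` list for each region.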
I am almost certainly missing a lot, so feel free to add more here, but I thought it would be fun to think through this a little bit, even if we don't end up pursuing it, which is likely.