Use more computed values in IO model

EnergyInnovation / eps-us

Energy Policy Simulator - United States

GNU General Public License v3.0


Use more computed values in IO model #236

Closed robbieorvis closed 1 year ago

robbieorvis commented 2 years ago

In working on the CA EPS, we've noted several instances where the model could/should use calculated data in place of input data, in particular where the data being read is duplicative. This includes:

1) BGDP BAU Gross Domestic Product: This data is (in theory) equivalent to the sum of all value added calculated in the model. We should use a sum of total value added to calculate the GDP in the model instead of reading it in. This has two benefits. First, it reduces input data requirements. Second, it serves as a check on our value added data to make sure it is reasonable, and it ensures that GDP-related calculations and estimates are aligned with other data affecting value added as calculated in the model.

2) BPCiObIC BAU Percent Change in Output by ISIC Code, for fuel production industries: Another challenge that arose is ensuring that projected output by ISIC code for fuel industries is aligned with the domestic fuel production of those industries. But we should be able to calculate this in the model, since output by industry should be the sum of products (e.g. natural gas and petroleum liquids) and the value of those products, and we have both domestic fuel prices by industry as well as international market prices by fuel type, either one of which could be used (and it's important to note here that we may need to introduce a wholesale energy price and then have adders for sector specific prices, since delivered energy costs have non-energy production related costs, such as transmission lines or utility programs, included in prices). Regardless, we should better link the projected output for fuel industries to the calculated fuel production in the model.

3) BEbIC BAU Employment by ISIC Code, for fuel production industries: As with output by ISIC code, future employment for fuel production industries should be tied to the energy output of those industries, not the dollar output (both of which are/could be computed in the model as described above). The reason for this is that international market prices can change a lot (consider the current oil and gas price situation) without a meaningful impact on energy production and, in turn, on employment. As such, it makes more sense to tie changes in employment for these industries to physical output, instead of $ output. An easy way to handle this in Vensim is to use a modified change in output for these industries that recalculates the change in output using a fixed energy price (which essentially just means it becomes a multiple of production).

All of these would represent improvements on the existing IO model structure and bring improved accuracy to employment calculations.
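
A minimal sketch of items 2 and 3, using invented numbers and hypothetical names (none of these are EPS variables): revenue is computed as price times production, and the employment driver recomputes the change in output at a fixed start-year price, so it reduces to the change in physical production.

```python
# Hypothetical illustration; names and numbers are invented, not EPS data.
production_btu = [100.0, 100.0, 50.0]  # physical fuel production by year
price_per_btu = [2.0, 4.0, 4.0]        # market price by year

# Item 2: output (revenue) computed from production and price.
revenue = [q * p for q, p in zip(production_btu, price_per_btu)]

# Item 3: change in output recomputed at a fixed (start-year) price,
# which is just a multiple of the change in physical production.
fixed_price = price_per_btu[0]
fixed_price_output = [q * fixed_price for q in production_btu]

pct_change_dollar = [r / revenue[0] - 1 for r in revenue]
pct_change_fixed = [o / fixed_price_output[0] - 1 for o in fixed_price_output]

print(pct_change_dollar)  # moves with price swings
print(pct_change_fixed)   # tracks physical production only
```

In this toy series the dollar output doubles in the second year purely because of price, while the fixed-price measure stays flat until production actually falls.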

jrissman commented 1 year ago

Here are thoughts on each of these three items, based on Megan and my review.

1) We do not agree with this one. Our BAU Value Added input data is not time series. We only obtain a time series of value added data by scaling it by our time series output by ISIC code. This is rough and less accurate than our input GDP projections. Also, the value added numbers are later scaled to match the GDP projection, so it doesn't really hurt much if the value added numbers do not add perfectly to the GDP projection. Given the GDP projection is more accurate, we prefer to stick with the current approach.
2) We agree a change is needed here. It sounds like three things are being taken as input data: BAU fuel production/sales, BAU fuel prices, and BAU output (revenue) by fuel industry ISIC codes. It often can be impossible to align the data sources when all of them are input data, so it generally is best to take in all but one of them as input data and calculate the last one. Which one to calculate should be based on which one has the least reliable input data and/or where that data source is not used for related purposes that could become misaligned in some other way. We agree that revenue is likely less reliable than price or production/sales projections (particularly because revenue is so sensitive to the other two), so we probably can go ahead and calculate that one, as you suggest.
3) We agree and will scale the BAU employment in fuel production industries based on the energy output of the fuel industries rather than the economic ($) output.

mkmahajan commented 1 year ago

Robbie and I chatted about #1 above briefly before Jeff rejoined our group call. His concern is that BAU Value Added is not scaled by GDP before it flows through the IO model and is used to calculate the change in Value Added. Ideally, we'd align BGDP and BAU Value Added upstream of that so that we increase accuracy.

jrissman commented 1 year ago

I've been carefully reviewing the EPS and working on Item 3 from this issue today. I've realized that we already attempt to make the requested adjustment with respect to jobs (but we don't make the adjustment for employee compensation). We had forgotten that we already built this feature. Here's a screenshot of the place where we calculate the change in fuel prices caused by the policy package, which includes both domestically-used fuel and exported fuel. We then map it onto ISIC codes for use in the I/O model, accounting for the fact that some ISIC codes produce fuels and non-fuel products.

[Screenshot: FuelPriceChanges]

Then in the I/O model, we adjust the variable Within Industry Jobs per Unit Output to counteract the influence of fuel price changes by changing the jobs required per unit revenue to match the fuel price fluctuations:

[Screenshot: WithinIndustryJobsPerUnitOutput]

Subsequently, Within Industry Jobs per Unit Output goes on to affect the "Requirements" variables used for indirect and induced jobs, so the adjustment here covers all three types of modeled job gains/losses.
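
A minimal sketch of the kind of adjustment described above, assuming the correction takes the form of dividing jobs per unit revenue by (1 + percent price change). The thread does not show the actual Vensim equation, so the function name, the formula, and the numbers here are illustrative assumptions.

```python
# Hypothetical sketch; names and numbers are illustrative, not EPS variables.
def adjust_jobs_per_unit_output(jobs_per_unit_output, pct_price_change):
    """Scale jobs per unit revenue so a pure price change leaves jobs unchanged."""
    if pct_price_change <= -1.0:
        raise ValueError("price change must be greater than -100%")
    return jobs_per_unit_output / (1.0 + pct_price_change)

bau_revenue = 1000.0     # BAU revenue of a fuel industry
jobs_per_dollar = 0.01   # BAU jobs per unit revenue
pct_price_change = 0.25  # policy raises the fuel price by 25%, production unchanged
policy_revenue = bau_revenue * (1.0 + pct_price_change)

# Without the adjustment, higher revenue looks like higher employment.
unadjusted_jobs = policy_revenue * jobs_per_dollar
# With the adjustment, the price-driven revenue change is cancelled out.
adjusted_jobs = policy_revenue * adjust_jobs_per_unit_output(
    jobs_per_dollar, pct_price_change
)
print(unadjusted_jobs, adjusted_jobs)
```

Because production is unchanged in this example, the adjusted figure stays at the BAU job count while the unadjusted figure rises with the price.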

We don't currently adjust EE compensation in this way, but perhaps we should, since (like employment), salaries don't fluctuate wildly from year to year based on commodity prices. But aside from potentially applying this adjustment to EE compensation, I'm not seeing anything to do here for item 3 that we haven't already done.

Given that this feature is already part of the model, this issue is actually a bug report, not a feature request. If you think the feature is not working right, would it be possible for you to specify scenario settings that let me reproduce the issue you encountered on my machine and indicate which output variable you believe is giving incorrect results and what you expect its results to look like? You could include a graph if that would make it clearer. Then I can attempt to debug it and see if anything is wrong. For instance, maybe the fuel industries are getting revenue changes from something other than fuel prices that ought not to be linked to employment levels. I ought to be able to tell once I can recreate the issue on my machine.

jrissman commented 1 year ago

Also, in the course of investigating item 3 today, I did build this alternate approach for estimating job and EE compensation changes for fuel industries. But I'm not sure if it's more accurate than what we have (I'd need to have a scenario to test with expected outcomes). A potential issue is that, because it calculates the changes in employment and EE compensation directly from changes in production, there is no way to divide the changes up into "direct," "indirect," and "induced" bins, which we show on a number of output graphs. Those divisions are just an artifact of I/O modeling, but we consider them important enough to break out in output graphs, so it's better if we can adjust the "Requirements" variables or the revenue figures (rather than avoid using revenue and "Requirements" variables entirely) so we can still flow everything through the direct, indirect, and induced buckets.

[Screenshot: AlternateApproach]

Here is a .mdl file containing this approach (zipped because GitHub won't let me upload a file with a .mdl extension):

[Attachment: EPS.zip]

jrissman commented 1 year ago

Okay, I implemented requested item 2 in commit 4fecba4. This pertains to using endogenously calculated BAU fuel supplier revenues rather than exogenous BAU revenue input data from BObIC. I'm being careful to adjust for the share of the outputs of each ISIC code that are fuels, and I'm maintaining the relationship between BObIC and related input variables like BEbIC and BVAbIC (e.g., labor intensity, value added intensity, etc.) We also needed a new section to total up the BAU energy supplier revenue by fuel, since we previously had no place where we needed that.

@robbieorvis, please test this carefully to make sure the approach resolves the issues you were seeing associated with item 2 and doesn't introduce unexpected problems. We should satisfy ourselves that this feature is done and working as intended before moving on to item 3 (or another issue). The new sections look like this:

[Screenshot: Totals]

[Screenshot: Output]

Soon, we should try to stop adding stuff to the 3.4.x series because each time I have to manually re-create the changes in the work-in-progress version of 3.5, and manually recreating everything is tedious and could introduce unintended differences.

robbieorvis commented 1 year ago

Thanks, @jrissman. I will circle up with @oashmoore to test this and also to see about the issue that prompted the updated email anyway. We were seeing some issues in the Rhode Island EPS and the Wyoming (I think) EPS that were causing this, where the change in jobs far exceeded the total employment for that sector.

While we are reviewing this, I had another related question. Currently, where we have the fuel mapping to ISIC codes, we put the natural gas sector into the energy pipelines and processing ISIC code, instead of the oil and gas extraction ISIC code. I wonder if that might create issues, because we are likely estimating output/revenue endogenously for the oil and gas extraction ISIC code but not the energy pipelines and gas processing ISIC code. If that's the case, the mismatch could lead to issues in the IO model (and indeed some of the most extreme issues we saw were with the energy pipelines ISIC code).

The DLIM value for that value from the oil and gas extraction sector is high compared to other sectors (around 0.18) but low overall given that we basically expect a very high percentage of consumer spending on natural gas to be for the gas itself.

@oashmoore and I will look into the first two things, but I wanted to ask you about that issue as well while we are on the topic.

jrissman commented 1 year ago

As you will see when you review the new code, "energy pipelines and gas processing" uses endogenous revenue data. Each ISIC code can use exogenous data, endogenous data, or even a mixture of the two, with this balance able to be adjusted via data-only updates to FoPTaFbIC Fractions of Products That are Fuels by ISIC Code. It will be more important than before to get the data in this variable right. It should provide a useful, data-driven way to calibrate BAU revenues for fuel suppliers.
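
One way to picture the exogenous/endogenous mixture is sketched below. It assumes the non-fuel share of an ISIC code's exogenous output is kept and the fuel share is replaced by endogenously computed fuel revenue. The thread does not show the actual equation, so the function, its arguments, and the blending formula are assumptions for illustration only.

```python
# Hypothetical sketch; this is not the EPS equation.
def blended_bau_output(exogenous_output, endogenous_fuel_revenue, fuel_fraction):
    """Keep the non-fuel part of exogenous output, add endogenous fuel revenue.

    fuel_fraction plays the role of a "fraction of products that are fuels"
    for one ISIC code: 0 means fully exogenous, 1 means fully endogenous.
    """
    if not 0.0 <= fuel_fraction <= 1.0:
        raise ValueError("fuel_fraction must be between 0 and 1")
    return (1.0 - fuel_fraction) * exogenous_output + endogenous_fuel_revenue

# An industry that sells no fuels keeps its exogenous output.
print(blended_bau_output(100.0, 0.0, 0.0))
# An industry that sells only fuels uses the endogenous revenue.
print(blended_bau_output(100.0, 80.0, 1.0))
# A mixed industry combines the two.
print(blended_bau_output(100.0, 60.0, 0.5))
```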

The work completed on item 2 is not intended to address the jobs issue you describe. The goal of item 2 is to align fuel prices, fuel sales quantities, and fuel supplier revenue, rather than taking all three in as input data, so that we do not run into issues where fuel price * fuel sales quantity doesn't equal revenue.

Since we already have a mechanism that adjusts Within Industry Jobs per Unit Output to account for fluctuations in energy prices, I'll need to be able to recreate the jobs issue you describe in order to debug why this code isn't always working. It is worthwhile to see whether the fix for item 2 has any effect on the jobs issue (since it may have some impact, even though it is not intended to address that issue). You can simply drop in the latest ".mdl" file to replace a 3.4.2 model file because it doesn't involve any input data changes. If possible, I would like you to review and confirm the fix for issue 2 (revenue mismatch issue) before we move on to tackle issue 3 (jobs issue), because jobs may depend on revenue, but revenue doesn't depend on jobs, so we're fixing things from upstream to downstream in calculation order.

jrissman commented 1 year ago

One other comment for @robbieorvis and @oashmoore : FtPICM Fuel to Producing ISIC Code Map maps each fuel type to a single ISIC code right now, but there is no requirement the input data be that way. We probably assigned all the revenue for oil and gas to the "energy pipelines and gas processing" industry because they are the ones who actually sell the gas, and they must buy it from the gas extraction companies, so we assumed DLIM would pass through the indirect impacts to the gas extraction firms correctly. But if you find this method understates the impacts on the gas extraction industry and overstates the impacts on the "energy pipelines and gas processing" industry, you can assign fractional shares of natural gas to the gas extraction and to the "energy pipelines and gas processing" industries in FtPICM Fuel to Producing ISIC Code Map (as long as they sum to 1). That would be a data-only update and sounds like it might be relevant for the issue you are describing.
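
The fractional mapping described above can be sketched as follows. The shares, the revenue figure, and the function name are invented for illustration; only the sum-to-1 requirement comes from the comment.

```python
# Hypothetical sketch of splitting one fuel's revenue across ISIC codes.
def allocate_fuel_revenue(fuel_revenue, isic_shares, tol=1e-9):
    """Split a fuel's revenue across producing ISIC codes by fractional share."""
    total = sum(isic_shares.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"shares must sum to 1, got {total}")
    return {isic: fuel_revenue * share for isic, share in isic_shares.items()}

# Illustrative shares only; real values would come from input data.
natural_gas_shares = {"ISIC 06": 0.25, "ISIC 352T353": 0.75}
allocation = allocate_fuel_revenue(200.0, natural_gas_shares)
print(allocation)
```

Validating the sum up front catches the most likely data-entry mistake in a data-only update of this kind.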

robbieorvis commented 1 year ago

I see the issue with the current structure of Percent Change in Weighted Average Pretax Prices of All Products by ISIC Code and why it isn't working correctly.

This variable is calculated as a change between the policy and the BAU scenario, but what we are really talking about here is a change between the initial year value and subsequent year values, because that is what's driving the problem.

Consider a scenario where the fuel price for natural gas is the same in every sector, but it doubles across all sectors by 2050, even though production/demand is constant (which can happen because of global energy markets). In this case, the model would calculate a value of zero in Percent Change in Weighted Average Pretax Prices of All Products by ISIC Code for the natural gas suppliers and it wouldn't adjust the labor intensity accordingly. That means that if we were to eliminate demand for gas by 2050, the model would double count the job losses because the change in output would reflect a doubling of the fuel price even though the amount of production (btu) per job remained constant.

To fix this, we need to modify the calculation structure for Percent Change in Weighted Average Pretax Prices of All Products by ISIC Code so that it accounts for differences between the start year and future years. I think we may also need to calculate percentage differences using an INITIAL here and track only the policy values. We can't multiply this by the current values unless we change the calculation to be (1 + percent change) or it would nullify the new calculation.
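
The doubling scenario above can be worked through numerically. This is a sketch with invented numbers and hypothetical names, assuming labor intensity is fixed at its start-year value and corrected by dividing by (1 + percent price change).

```python
# Hypothetical worked example; names and numbers are invented.
start_price, price_2050 = 3.0, 6.0  # price doubles in BAU and policy alike
production = 100.0                  # BAU production is constant
start_jobs = 50.0
jobs_per_dollar = start_jobs / (production * start_price)  # start-year intensity

# Policy eliminates gas demand by 2050, so policy output is zero.
change_in_output = 0.0 * price_2050 - production * price_2050

def job_change(output_change, intensity, pct_price_change):
    """Jobs change with intensity corrected by (1 + percent price change)."""
    return output_change * intensity / (1.0 + pct_price_change)

# Policy vs. BAU: prices are identical, so no correction is applied.
pct_change_vs_bau = price_2050 / price_2050 - 1.0
# Future year vs. start year: captures the doubling.
pct_change_vs_start = price_2050 / start_price - 1.0

jobs_lost_vs_bau = job_change(change_in_output, jobs_per_dollar, pct_change_vs_bau)
jobs_lost_vs_start = job_change(change_in_output, jobs_per_dollar, pct_change_vs_start)
print(jobs_lost_vs_bau, jobs_lost_vs_start)
```

With the policy-versus-BAU comparison, the computed loss is twice the jobs that exist in the start year; with the start-year comparison it matches them.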

@jrissman we can chat more about this during our check-in today. I want to hold off reviewing the changes to the output variables you updated in case this necessitates further updates.

jrissman commented 1 year ago

The commits today listed above complete the items we discussed today and should resolve item 3 in this issue. As always, there end up being some subtleties when programming it that we didn't think of when discussing beforehand, but I think they are resolved.

Note that there are probably some issues in Oregon data - I noticed that the mismatch between endogenously calculated revenues from coal sales and the values from the I/O input data are pretty large. That's why, since we've moved to endogenous data here, it has scaled up some other things related to coal to a great degree, like the amount households and government spend on coal. You might need to reduce coal use in some of the coal-demanding sectors or change the I/O data to no longer specify that households buy coal if you want this to be more realistic. It's not a structural issue at this point.

I updated FtPICM to allocate natural gas revenues between producers and NG transport, based on U.S. national figures. In the U.S. national model, this better aligns the relative differences in ISIC 06 and ISIC 352T353, so they are both about 36-38% higher than BObIC in the start year. This is to be expected because BObIC's year is a number of years ago and there has been some economic growth since then.

This FtPICM update is undesirable in the Oregon model because there is almost no natural gas extraction in the Oregon input data, so assigning revenue to that industry in FtPICM makes the revenue over 4,000 times higher than BObIC's value in the start year. Therefore, I have left the previous version of FtPICM in place for Oregon.

If you can test this update in the U.S. and Oregon models to see if it addresses your concerns, I'd appreciate it. Then, it would be good to close this issue out. It will take some time and careful attention to copy these changes into 3.5, and I would like to do it sooner rather than later (but not before the fixes have been verified and confirmed by you and Olivia).

I think we should stop making changes to the 3.4.x series after this, because it is too time-consuming and error-prone to have to manually copy changes repeatedly from one model version to another.

oashmoore commented 1 year ago

Hi Jeff,

I tested your changes in KY and MS and they fixed the issue of reducing jobs by greater than 100%! I think we’re totally set.

I can look into the issues in OR and can update that to the most recent version of the state model data. Let me know if there are any other states it would be helpful to test.

Thank you so much! Olivia

robbieorvis commented 1 year ago

Just a quick follow-up that there is indeed a data issue (at least one) in OR. The BAU Fuel Production Imports and Exports has a calculation error for lignite. The Start Year Data tab is pulling in data for short tons of coal imports for lignite, which is a mistake. We need to correct this and also make sure this bug isn't in the main data set for the other state models. @oashmoore please check those - let me know if you want to review the exact issue. There's a similar issue with exports for lignite.

However, there is also a methodological issue that this highlights. In Oregon's case, there is no coal produced but there is coal imported. The current approach for calculating BAU Fuel Industry Revenue by Fuel looks at spending on fuels in end-use sectors, but this means that fuels that are heavily (or wholly) imported have fuel industry revenue associated with them, which is part of what's happening here. While it shouldn't affect employment changes, which are tied to changes in fuel production, it would affect Output and Value added (and GDP).

You could consider a state that only uses coal and imports 100% of the coal. That state would appear to have a very large coal industry in the IO model, even though it would produce 0 coal. I think we need to fix this by weighting the fuel industry revenues for each fuel by the ratio of production/(production+imports-exports), which will make sure we only capture the revenue that is coming from in-region suppliers. In the Oregon case, this would solve the problem by resulting in a 0 for the fuel industry revenue in the IO model calculations.
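
The proposed weighting can be sketched as below. The function name and the sample quantities are hypothetical, and returning 0 when in-region supply is not positive is an assumption made here to avoid dividing by zero.

```python
# Hypothetical sketch of the proposed weighting; not EPS code.
def in_region_share(production, imports, exports):
    """production / (production + imports - exports), as proposed above."""
    supply = production + imports - exports
    if supply <= 0.0:
        return 0.0  # assumption: treat non-positive supply as no in-region share
    return production / supply

# A state that imports all of its coal (the Oregon-like case).
importer_share = in_region_share(production=0.0, imports=10.0, exports=0.0)
# A state that produces half of what it uses.
half_share = in_region_share(production=50.0, imports=50.0, exports=0.0)

spending_on_coal = 400.0  # invented end-use spending
print(spending_on_coal * importer_share)
print(spending_on_coal * half_share)
```

Note that the ratio as written can exceed 1 for a net exporter; the thread does not say how that case should be handled.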

jrissman commented 1 year ago

When I was building this feature, I noticed the effect you mention and tried something equivalent to the fix you suggest - multiplying by the domestic content share. Look at the commit history where I made and then reverted the latest change. My memory is that it caused a problem with the energy pipelines and gas processing industry, which would show zero output for a state that imports its gas, when the reason you wanted to base jobs on energy sales rather than production was to properly handle that industry in states that import their gas. But more fundamentally, the I/O model approach is to take in total changes in output due to in-region demand and partition it using the domestic content share later, so reducing the change in in-region demand here by the domestic content share breaks a core assumption of the I/O model and produces wrong results for every industry.

Is the domestic content share for coal set to zero in the OR input data? If not, that's a problem we should fix first.

If it is already set to zero, I can look into it further and see if I can devise a fix that works with our I/O structure. You can check out and test the commit prior to the latest one, containing the reverted change, if you want to see what happens when we multiply by the domestic content share up front.

jrissman commented 1 year ago

Also, for fuels, we could potentially calculate the domestic content share from production, imports, and exports, instead of taking it in as input data, to avoid mismatches, I suppose.

robbieorvis commented 1 year ago

I see. Tricky issues.

FWIW, yes, the value is set to zero for Oregon.

I wonder if we ought to just do the correction for fuel producers. That would omit the natural gas distribution companies.

robbieorvis commented 1 year ago

There's some other strangeness too. Using the method from BLS to develop domestic content shares, we get a value of ~0.3, which is kind of nonsensical. I think this is because the approach uses something called a location quotient, which is the ratio of an area's wages per industry as a share of its total wages to the same value for the US as a whole. In other words, this is saying that Oregon has only about 1/3 the "specialization" in energy pipelines and processing that the US has. But for this sector in the model, I don't think this is an accurate representation of the domestic content share. It could be that, for example, the pipeline system is already built and high quality, and so less is spent on wages for the system than in other states, comparatively.
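
The location quotient as defined in the comment above can be written out directly. The wage figures below are invented to reproduce a value near 1/3, and the function name is hypothetical.

```python
# Hypothetical sketch of the location quotient described above.
def location_quotient(region_industry_wages, region_total_wages,
                      national_industry_wages, national_total_wages):
    """Region's industry wage share divided by the national industry wage share."""
    region_share = region_industry_wages / region_total_wages
    national_share = national_industry_wages / national_total_wages
    return region_share / national_share

# Invented wages: the industry is 0.1% of regional wages and 0.3% nationally.
lq = location_quotient(1.0, 1000.0, 3.0, 1000.0)
print(round(lq, 3))
```

A value below 1 says the industry is a smaller part of the region's wage bill than of the nation's, which is a statement about specialization rather than about how much demand is met by in-region suppliers.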

This finding is just for the pipeline industry. @oashmoore I think perhaps we should override the value we calculated in the state models to use a value of 1, and can state in the notes we are assuming 100% of the demand for this service is from in-state suppliers. Otherwise, it doesn't really make sense.

For fuel producers, ideally the domestic content share calculated in input data is close to what would be calculated in the model. It would be pretty messy to calculate this endogenously, because we would have to recalculate DLIM too.

I think that fixing a few things here will solve the issues:

1) Data: for the domestic content shares for ISIC 352T353 for states, we should assign a value of 1.

2) Model updates: Reinstitute the same or a similar modification you had made above for fuel producers (i.e. excluding gas processing/transmission). For energy industries that aren't fuel producers (i.e. gas pipelines and processing), we don't use the weighting that we do for fuel producers.

Thoughts?

jrissman commented 1 year ago

This sounds like the right direction. But on your item 2 above, I need to check that this doesn't double-count the discount from the domestic content share. For instance, suppose a region imports 50% of its coal and produces 50% domestically. The impacts of changes in coal demand are already being reduced by 50% due to the domestic content share. Whatever we do upstream has to not double that, so the impact is not reduced to 25%. This is not relevant for a place like Oregon with zero percent domestic content share but can come up in other geographies.
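
The double-counting concern in the 50% example is simple arithmetic, sketched here with an invented change in demand and hypothetical names.

```python
# Hypothetical arithmetic check; names and numbers are invented.
domestic_content_share = 0.5
change_in_coal_demand = -100.0  # change in in-region demand

# Applied once (the intended behavior): half the impact falls in-region.
applied_once = change_in_coal_demand * domestic_content_share
# Applied upstream and again downstream: the impact is wrongly halved twice.
applied_twice = applied_once * domestic_content_share

print(applied_once, applied_twice)
```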

robbieorvis commented 1 year ago

Ah, yes – good point!

jrissman commented 1 year ago

Okay, I looked at this again today and believe I've fixed the issue in commit ed59386. It took a lot of looking and thinking for what ended up being a small code change, in line with Robbie's suggestion. That is, we now limit our endogenously-calculated energy supplier revenue to domestic revenue when calculating BAU output.

This is what I had been getting at in my reverted commit 71698fe, but in that commit, I put the domestic content share in the wrong place. I was thinking along these lines:

There is no theoretical justification for treating fuel producers differently from producers of other sorts of goods when deciding where and how to apply the domestic content share. If a region imports 100% of its steel (for instance, because it doesn't have any blast furnaces or electric arc furnaces), then it shouldn't have value added by the non-existent domestic steel industry. So I should develop a generalized fix rather than only address energy supplier industries.

What I didn't remember is that the input data in variables such as BObIC, BEbIC, BVAbIC, etc. already include only domestic output, domestic employment, domestic value added, etc., so I don't need to touch any of those to limit them to in-region suppliers. Our move to use endogenously calculated spending on fuel (i.e., revenue for the fuel industry) when calculating BAU output was not limited to domestic fuel suppliers prior to commit ed59386. So, the domestic content share needed to be applied only to those fuel industries, not to all industries. This is exactly as Robbie had suggested above, so I want to give credit to Robbie for identifying and suggesting the correct fix.

jrissman commented 1 year ago

There are still data-only issues you might want to address in the Oregon model, such as the lignite data source issue Robbie identified, and possibly setting the domestic content share for the energy pipelines industry (ISIC 352T353) to 1.

Maybe someone could do these data updates, commit them to the oregon repo, and then someone can test to see if the latest structural commit plus these data fixes resolve outstanding issues in issue 236, so we can close this issue (and I'll manually port the changes into 3.5).

jrissman commented 1 year ago

Also, for completeness, regarding this comment from back on July 22, higher up in this thread:

Robbie and I chatted about (1) above briefly before Jeff rejoined our group call. His concern is that BAU Value Added is not scaled by GDP before it flows through the IO model and is used to calculate the change in Value Added. Ideally, we'd align BGDP and BAU Value Added upstream of that so that we increase accuracy.

The BAU Value Added needs to be at the same scale (meaning, using the same underlying data source and assumptions) as BAU Output, BAU Employment, and BAU Employee Compensation because it is used to form intensity ratios (jobs per unit output, value added per unit output, etc.). We don't want to rescale value added to match some external GDP projection prior to forming these intensity ratios, because we want the intensity ratios to remain as they are in our data source. It is these intensity ratios, not the absolute magnitude of BAU Value Added, that are used to calculate the Change in Value Added due to the policy package. So we don't need an accurate BAU Value Added figure until after we've already established the intensity ratios and found the change in value added due to the policy package. That's how we do it today, rescaling later in the I/O model, rather than upfront. Therefore, I don't think we need to make any changes regarding where in the calculation flow BAU Value Added is rescaled to match our external GDP projection.
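
A minimal sketch of the ordering argument above, using two invented industries and hypothetical names: intensity ratios are formed from the unscaled data, the change in value added is computed from them, and only afterwards is BAU value added rescaled to an external GDP figure.

```python
# Hypothetical sketch; industries, names, and numbers are invented.
bau_output = {"A": 200.0, "B": 100.0}
bau_value_added = {"A": 80.0, "B": 20.0}
change_in_output = {"A": -20.0, "B": 0.0}
external_gdp = 125.0  # external projection that BAU value added should sum to

# Step 1: intensity ratios from internally consistent, unscaled data.
va_per_output = {k: bau_value_added[k] / bau_output[k] for k in bau_output}

# Step 2: change in value added depends on the ratios, not on the BAU level.
change_in_va = {k: change_in_output[k] * va_per_output[k] for k in bau_output}

# Step 3: rescale BAU value added afterwards to match the GDP projection.
scale = external_gdp / sum(bau_value_added.values())
scaled_bau_va = {k: v * scale for k, v in bau_value_added.items()}

# Rescaling before step 1 (without rescaling output) would change the ratio.
distorted_ratio_a = scaled_bau_va["A"] / bau_output["A"]
print(va_per_output["A"], distorted_ratio_a, change_in_va["A"])
```

In this toy case the ratio for industry A would move from 0.4 to 0.5 if value added were rescaled first, which is the distortion the comment is warning against.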

jrissman commented 1 year ago

Closing this following testing and confirmation with Robbie by email