EnergyInnovation / eps-us

Energy Policy Simulator - United States
GNU General Public License v3.0

Discussion of Wedge Diagram / Cost Curve Edge Case Situation #228

Closed jrissman closed 2 years ago

jrissman commented 2 years ago

The procedure for calculating the contribution of policies to total abatement (or other metrics) was overhauled and improved as described in issue #119. As noted in that issue, there remains a situation where the results are not very good: policies within the same sector that have heavily overlapping effects. In short, the recommended approach for dealing with this is to assign such policies to the same policy group in WebAppData, so they are enabled and disabled together when constructing wedge diagrams and cost curves.

We are not, at this time, planning on changing the way the wedge diagram or cost curve calculations are done. But I've been asked to log the following conversation (which happened over email) here as a GitHub issue so we can refer back to it the next time a partner runs into this situation.

jrissman commented 2 years ago

On Feb 10, 2022, Ashna from RMI wrote:

This is Ashna from the Analysis team at RMI. We've been working with Michigan advocates, who put together a scenario and were wondering why CCS was such a major factor, and not the CES. See figure below, where CCS is set to 1% for petroleum and natural gas peakers:

[Screenshot 1]

We tried removing the CCS lever to see if the attribution would return to the CES - but instead there is a total drop-off, and no policy gets credit for emissions reductions:

[Screenshot 2]

We created a new scenario and added the Clean Electricity Standard and CCS individually and together (without additional policies) and were not able to reproduce this error. We think this might be a bug.

Please let us know if any additional information would be helpful here.

Thanks, -Ashna

jrissman commented 2 years ago

On Feb 12, 2022, Todd wrote:

I’m still not sure what to make of this. Perhaps you can offer some thoughts about what’s going on. The function that calculates the contribution of each policy group shows that all contributions are zero after 2041. That’s where the blank space starts in the wedge chart. The largest contribution on the chart is from the Clean Electricity Standard policy group. In this listing, “pg” is the emissions from the policy group, and “cs” is the emissions from the current scenario (“Our Plan”). As you can see, the difference dwindles to zero after 2041 — that is, there is no contribution from the policy group to the emissions abatement. This is also true for every other policy group after 2041.

The policy group emissions are the emissions from a policy group scenario that includes all policies except the ones in the policy group. So we are seeing that there is no difference between the “Our Plan” emissions and the emissions without the Clean Electricity Standard policy.

===== Clean Electricity Standard =====
2020: pg = 42.94, cs = 42.94, contribution = 0
2021: pg = 45.24, cs = 45.24, contribution = 0
2022: pg = 42.06, cs = 36.57, contribution = 5.489
2023: pg = 43.69, cs = 34.38, contribution = 9.301
2024: pg = 44.61, cs = 33.07, contribution = 11.54
2025: pg = 37.19, cs = 29.89, contribution = 7.303
2026: pg = 34.09, cs = 27.03, contribution = 7.065
2027: pg = 32.69, cs = 24.58, contribution = 8.11
2028: pg = 32.53, cs = 22.51, contribution = 10.01
2029: pg = 29.17, cs = 19.45, contribution = 9.717
2030: pg = 17.16, cs = 12.94, contribution = 4.223
2031: pg = 16.23, cs = 10.97, contribution = 5.262
2032: pg = 15.68, cs = 9.05, contribution = 6.631
2033: pg = 15.73, cs = 7.785, contribution = 7.944
2034: pg = 15.00, cs = 6.069, contribution = 8.935
2035: pg = 14.03, cs = 6.068, contribution = 7.965
2036: pg = 12.51, cs = 6.068, contribution = 6.44
2037: pg = 11.22, cs = 6.068, contribution = 5.148
2038: pg = 9.865, cs = 6.068, contribution = 3.797
2039: pg = 8.339, cs = 6.068, contribution = 2.271
2040: pg = 6.596, cs = 6.068, contribution = 0.5274
2041: pg = 6.068, cs = 6.068, contribution = 0.00026
2042: pg = 6.068, cs = 6.068, contribution = 0
2043: pg = 6.068, cs = 6.068, contribution = 0
2044: pg = 6.068, cs = 6.068, contribution = 0
2045: pg = 6.068, cs = 6.068, contribution = 0
2046: pg = 6.068, cs = 6.068, contribution = 0
2047: pg = 6.068, cs = 6.068, contribution = 0
2048: pg = 6.068, cs = 6.068, contribution = 0
2049: pg = 6.068, cs = 6.068, contribution = 0
2050: pg = 6.068, cs = 6.068, contribution = 0
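The per-year contribution in the listing is just the difference between the policy-group emissions and the current-scenario emissions. A minimal sketch (function name is hypothetical, not from the EPS codebase):

```python
def contribution(pg, cs):
    """Abatement attributed to a policy group in one year.

    pg: emissions with every policy enabled EXCEPT those in the group
    cs: emissions in the current scenario with all policies enabled
    """
    return pg - cs

# 2022 row from the listing; prints roughly 5.49 (the listing shows
# 5.489 because its inputs are rounded for display).
print(round(contribution(42.06, 36.57), 3))

# 2042 row: no difference, so no contribution.
print(contribution(6.068, 6.068))  # 0.0
```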

The chart seems to be correct, but we’ve never seen a case where there’s a gap like this.

jrissman commented 2 years ago

On Feb 14, 2022, Robbie wrote:

Thanks for continuing to look into this! I’m pulling Jeff in too.

I think I can see what is causing the issue, but I'm not sure how to debug it.

The issue I think is that after 2041, if you turn off the clean electricity standard, the combination of power plant bans + retirements results in exactly the same amount of emissions in 2041 and beyond that you get with the CES enabled. This is because you are effectively achieving the same outcome in the model by forced retirement of power plants in either case and by banning new power plants that aren’t covered by the CES. The inverse is also true – when you turn off the power plant retirements in particular, the difference is 100% covered by the CES.

If I turn off the CES and re-run the wedge diagram, all looks normal with 100% of the emissions reductions in 2041 and beyond being attributed to the power plant retirements.

If I had to venture a guess, I think the issue is that the contribution of either policy (when turned off) is zero in 2041 and beyond: when calculating the change from a policy being disabled, the result is zero because the other policies 100% make up for the loss of it. If you have multiple policies where that happens, it could cause them all to be zero.

Where this is the case, i.e. policy groups are 100% overlapping, we might just have to make a 50/50 split or come up with a workaround.

Jeff, would love to get your thoughts here!

jrissman commented 2 years ago

On Feb 14, 2022, I wrote:

Since policies' effects are tested in the context of the entire policy package (by disabling each policy and seeing how much emissions rise), if two policies are completely overlapping in their effects, then there will be no increase in emissions when either one of them is disabled individually. There is no "true answer" for which of these policies is ultimately "responsible" for the emissions reductions, as each of them is sufficient to cause 100% of the observed reductions.
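A toy numeric illustration of that failure mode (the numbers are invented, not from the model). Two policies each independently cut emissions from 100 to 60, so disabling either one alone shows no emissions increase:

```python
def emissions(policies_enabled):
    """Invented toy model: either policy alone is sufficient to cut
    emissions from 100 to 60, so their effects fully overlap."""
    return 60.0 if policies_enabled else 100.0

all_on = emissions({"A", "B"})     # 60.0
without_a = emissions({"B"})       # still 60.0 -- B covers everything
without_b = emissions({"A"})       # still 60.0 -- A covers everything

# Each policy's measured contribution (emissions rise when disabled)
# is zero, even though together the policies abate 40 units.
print(without_a - all_on, without_b - all_on)  # 0.0 0.0
```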

We used to run into cases like this more often. We improved the procedure for calculating wedge diagrams and cost curves to effectively test each sector separately, then ensure each sector’s total emissions abatement matches the abatement from that sector’s metric. (We used to just test all policies across all sectors without worrying about which sector they are in.) Even at the time we implemented the fix, we knew it would not cover 100% of all cases, but we felt it fixed probably 90% of cases in practice where it comes up, because now the diagrams can handle overlapping policies that are in different sectors. It only fails when the 100% overlapping policies are both within the same sector. That is the case here (they are both electricity sector policies), which is why you’re seeing the issue.

The simplest and quickest way to resolve the issue is to group the policies in WebAppData. You do this on the “Policy Characteristics” tab in column D, which has the header “Policy Group.” This will cause the web app to test those two policies together (turn them off/on together), so you should probably get correct results. There is no “real” answer to how the emissions should be attributed between the policies in this case anyway, so in a sense, grouping them is the most accurate way to convey this concept.

These policies would then be grouped for all users and all scenarios, so if there are other instances when you want to use both of these policies and they are not 100% overlapping, you won't get any attribution between them. But when they are heavily overlapping and within the same sector, the attribution between them isn't very good even when the overlap is short of 100%, so it honestly might be better to group them even from the perspective of these other cases. That is, suppose the overlap is 90% rather than 100% in a given scenario. The procedure will tend to attribute all or almost all the reductions to just one of the policies, when the other policy is actually strong but mostly overlapping. I think that's probably what is going on in years prior to 2041 in the screenshots above, where the attribution of everything going to CES and none to CCS looks pretty questionable.

You can choose any “Policy Group” name that you like, as long as the same name is used for both policies in the “Policy Characteristics” tab of WebAppData. Something like “CES + Elec Sector CCS” would be my choice for a group name for these two policies.

That’s the quick and easy solution. It might be the best for this case, and for most cases where this still crops up after the earlier formula improvements.

Once you do that, check the wedge diagram for “Total” (rather than for “Electricity Sector”) and make sure the electricity sector wedges no longer go to zero thickness in 2042, as they do in this screenshot:

[Screenshot 3]

A fix that involves redesigning the wedge thickness calculation formula is tricky, because we are essentially balancing two competing priorities: understanding what each policy does in the context of the other policies (which requires the others to be enabled) and figuring out what each policy does on its own when there is policy effect overlap, which is essentially pushing in the direction of testing each policy individually with others disabled. We try to strike an elegant balance using the current procedure, but it still involves trade-offs.

The formula is currently not looking at the actual results and altering the procedure based on those results. I think in order to improve the formula further, it would be necessary to have the app do runs that test for policy overlap, then handle the display of policies differently depending on whether overlap is detected or not.

This would be a significant departure from the methods we use on all graphs today, where every graph is defined by a specific formula (whether specified in Vensim or in the web app) and the way the metric is calculated does not vary depending on what the specific results are. The method would need to smoothly handle cases of overlap from 0% to 100%, not be a special case for 100% overlap.

It is not immediately obvious to me how the web app, which only has access to the final output for specific variables used in graphs, would distinguish between the case where a policy is weak (does not increase emissions much when turned off) because the policy is always/usually a weak one, or because the policy is strong but has a 90% overlap with another policy. If we could detect the degree of overlap, which should be reported as a value between 0% and 100%, we could do individual runs for each overlapping policy (i.e. with only that policy enabled) and then divide the overlapping portion into segments based on the size of abatement caused by each policy in the individual policy runs. This is an improvement on Robbie’s idea to flatly assign 50% abatement to each of the two overlapping policies (in a case of 100% overlap), since instead of always choosing 50%, we’d be using shares based on the performance of these two policies individually. So we might get, say, 60%/40% or the like, which correctly indicates which of the two policies is “stronger” when used on its own.
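The proportional split described above can be sketched in a few lines. The function name, signature, and the fallback to an even split are my assumptions, not an existing EPS API:

```python
def split_overlap(overlap_abatement, individual_abatements):
    """Divide an overlapping abatement amount among policies in
    proportion to each policy's abatement when run on its own.

    individual_abatements maps policy name -> abatement from a run
    with only that policy enabled. (Hypothetical helper, not from
    the EPS codebase.)
    """
    total = sum(individual_abatements.values())
    if total == 0:
        # No basis for weighting; fall back to an even split,
        # like the 50/50 idea mentioned earlier.
        n = len(individual_abatements)
        return {p: overlap_abatement / n for p in individual_abatements}
    return {p: overlap_abatement * a / total
            for p, a in individual_abatements.items()}

# A 60/40 split like the one described above:
print(split_overlap(10.0, {"CES": 30.0, "CCS": 20.0}))
# {'CES': 6.0, 'CCS': 4.0}
```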

Calculating the degree of overlap is the tricky part. But one nice thing is that it only matters within a single sector now, so we don’t have to worry about cross-sector overlap. If we only want to detect when two policies overlap, we could have the web app silently group every two policies and test each such combination, and when it finds that the abatement of a group of two policies is X and the abatement of each one of the policies within the group is Y and Z (tested under the current procedure that disables each policy in turn and checks for emissions increases), then 1 – ((Y + Z) / X) is the degree of overlap. We would get negative overlap values for mutually reinforcing policies, which comes up frequently, and which the web app already handles well today. I guess we ignore the negative values and only use the special procedure for positive overlap values.
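The pairwise overlap formula above, 1 − ((Y + Z) / X), is simple to express directly (function and parameter names are mine, for illustration):

```python
def pairwise_overlap(group_abatement, abatement_a, abatement_b):
    """Degree of overlap between two policies, per the formula above.

    group_abatement: abatement X when both policies are disabled together
    abatement_a, abatement_b: abatements Y and Z when each policy is
    disabled individually (the current per-policy test procedure).
    Negative results indicate mutually reinforcing policies and would
    simply be ignored.
    """
    return 1.0 - (abatement_a + abatement_b) / group_abatement

# Fully overlapping: disabling either policy alone changes nothing.
print(pairwise_overlap(40.0, 0.0, 0.0))    # 1.0
# No overlap: the individual abatements sum to the group abatement.
print(pairwise_overlap(40.0, 25.0, 15.0))  # 0.0
# Mutually reinforcing: individual abatements exceed the group's,
# producing a negative value that would be ignored.
print(pairwise_overlap(40.0, 30.0, 20.0))  # -0.25
```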

This doesn’t really address the case where some third policy partially overlaps with each of the two we just tested above, with a different degree of overlap between the third policy and each of the other two. To take the simplest case, say we calculate the pair-wise overlaps for every combination of two policies in a set of three policies. We end up with three distinct overlap values, one for each pair. That seems like a mess.

If you test all three together, you get something like 1 – ((Y + Z + Q) / X) as the degree of overlap. That would fail in most circumstances when the three policies are not all heavily overlapping with each other. But maybe all those failures would produce negative values, and thus be ignored? And if we do get a positive overlap value, the key question is what we do with the pair-wise overlap values when we also have a different overlap value for the set of three. And then how do sets of four interact with the sets of two and sets of three? We're at the point where we need to have a running version and start performing many tests as we tweak the algorithm if we want to work out such details.

One thing to keep in mind is that this will have a big impact on graph generation time. Currently, we do one run for every policy within a package to generate a wedge diagram or cost curve. Under the new approach, you’d need one run for every combination of policies (per sector, thankfully) rather than one per policy. Even with the per-sector limitation, this can get out of hand quickly. For example, in a sector with eight policies, today we handle this with eight runs. But if you wanted to test every combination, that’s 2^8 or 256 runs. In the current web app, each run might take about 1 second, so let’s say 8 seconds for the current approach and 256 seconds (4.3 minutes) for the more detailed approach. Nobody is going to sit there for 4.3 minutes while the web app calculates a wedge diagram. Any user would be convinced the web app froze.

Our new web app framework under development will accelerate runs by 10x on typical devices, but even 25.6 seconds (instead of 256 seconds) is probably beyond users’ patience. (And users might enable more than 8 policies in a particular sector sometimes. For example, the electricity sector in the Michigan EPS contains 14 policy levers, which could generate 2^14 runs, or 16,384 runs. Even at 0.1 seconds per run, a 10x speed boost relative to the current app, that’s over 27 minutes.)
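The run-count arithmetic in the two paragraphs above can be checked with a few lines (the per-run times are the rough estimates from the text, not measurements):

```python
def total_time(n_policies, seconds_per_run):
    """Back-of-envelope cost of testing every policy combination
    in one sector: one model run per subset of the policies."""
    runs = 2 ** n_policies
    return runs, runs * seconds_per_run

# Eight policies at ~1 s/run in the current app: 256 runs, ~4.3 minutes.
runs, secs = total_time(8, 1.0)
print(runs, round(secs / 60, 1))  # 256 4.3

# Fourteen levers (Michigan EPS electricity sector) at ~0.1 s/run
# with a 10x speedup: 16,384 runs, over 27 minutes.
runs, secs = total_time(14, 0.1)
print(runs, round(secs / 60, 1))
```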

Any revision of the formula will create edge cases, so whatever we do would need to be tested very carefully and iterated to develop a final design. The final approach would be significantly more complicated than the current approach and would significantly increase the time required to display a wedge diagram or cost curve. It’s not something we should consider at least until we have the new app framework finalized that gives us the 10x model runtime speed boost. Even then, it may not be worth (1) the slowdown in display of wedge diagrams and cost curves, which would be a significant negative impact on many users in many situations, and (2) the large amount of engineering development time it would demand, given that there are many high-value capabilities that might be of more value to more users in more circumstances. We must make certain optimizations and simplifications in order to have a model that runs so fast that it can be used interactively, and this might well be in that category.

But in case we do decide to do something on this, the thoughts above are the start of a roadmap. I can memorialize it as a GitHub issue in WebAppData in case we want to return to this question. Of course, there may be better solutions that I’m not thinking of.

In the short term, grouping those policies in WebAppData should fix the issue. This might be the recommended long-term fix as well. But I’m open to hearing others’ thoughts on that.

jrissman commented 2 years ago

On Feb 14, 2022, I wrote:

One other wrinkle to remember is that most policies affect more than one sector. This is handled by splitting each policy's effects into its effects on each sector, then integrating later, per steps 9-11 here. This might be incompatible with the ideas detailed above regarding multiple model runs, because essentially every policy has a calculated impact on every sector (even if that impact is zero in some cases). So we can't limit the testing of interacting policies to those that happen to be classified as being within a given "sector" in the policy tree menu of the web application, and everything might need to be tested in terms of "policy fragments," where each fragment is that policy's effect on a given sector.

I don’t think it would be easy to devise a generalized procedure better than the one we have, which is the result of considerable refinement and experience.