Replace Excel-based FoPITY system with Python-based solution

jrissman commented 2 years ago

Currently, it is annoying to add new policies (or new subscripts for an existing policy) to the policy schedules (FoPITY) for two reasons:

Within each FoPITY Excel file, after adding a policy to the "Set Schedules Here" tab, careful manipulation is required to update the blue formula tabs to reflect changes in the green "Set Schedules Here" tab.
There are multiple files that need to be updated with the new policy. It should only be necessary to add the policy in one place, then to customize the schedule data in up to nine different places (for schedules 1 through 9).

Issue 1 can be solved by getting rid of the blue output tabs. Instead, export the green "Set Schedules Here" tab to .csv, and use a Python script to read the exported "Set Schedules Here" data and build the corresponding .csv output files. The script could trivially copy the files for the first (default) schedule to fill in any unused schedule slots (i.e. up through FoPITY-9). Note that the format of the "Set Schedules Here" tab would change to make it easier for the Python script to read/process. In fact, it would probably be replaced with a text-based input file, entirely removing Excel from the process.

Issue 2 can be dealt with by having one input file set all 9 policy schedules. The easiest way to do this is probably to change the input format so that each policy gets a set of ordered pairs for each policy schedule, and there is a delimiter character between schedules. For example, here is a row that would set two different schedules for a single policy element:

Policy Name|Subscript1|Subscript2|(2020,0)|(2030,1)|(2050,1)/(2020,0)|(2030,0.5)|(2040,1)|(2050,1)

In this example, I'm using the pipe character (|) as the item delimiter, to distinguish it from the comma delimiter used within the ordered pairs. The character / is the delimiter between policy schedules. There could be up to 9 policy schedules listed per row, though only one schedule is required. The Python script would simply re-use the first defined schedule for any schedules that are undefined, up through schedule 9.

After this update, adding a new policy would simply involve opening one text file, adding one row for each subscript element of the new policy, and setting the schedule data all on that same row. It would be much more compact (rather than a spreadsheet with many columns) because schedule data would be entered as ordered pairs. There would be no more dealing with multiple files, no more copying and pasting anything, no repetitious definitions of policy and subscript names, and no more updates to formulas on blue output tabs. In fact, the Python script itself could be the place where the policy schedule data is set. And since it's all text-based, it would work better with Git and change tracking.

This issue added based on feedback from @robbieorvis on 11/17/2021.

jrissman commented 2 years ago

@ToddFincannonEI will be handling this item. Thank you, Todd.

A few points I've thought about but didn't write above, just in case they are useful.

If the policy schedule rows are entered as Python data (rather than each row being one long string that Python parses), then text editors will color the elements differently, which might help users entering schedule data to avoid typos, like a missing comma. This would involve using different delimiters than in my example above. One possible example (using # as the delimiter between schedules):

ColoredTextExample

The .csv output files read by Vensim, such as FoPITY-1.csv1, should always go to 2100, irrespective of the end date of the model run. This is necessary to keep the end date of the model run fully data-driven. The value for the latest specified year (typically 2050) should simply be repeated for every year after that up through 2100. Note that this only applies to the Vensim version of the .csv output file. The output .csv file read by the web app, such as FoPITY-1-WebApp.csv, contains ordered pairs and should not specify any dates later than the last date the user entered in their desired policy schedules (which should be the same as the model run end date).
The output file FoPITY-policy-elements.csv is the same for all 9 policy schedules and therefore only has to be generated once.

jrissman commented 2 years ago

I went ahead and completed this item today, as I wanted to have the new policy scheduling system in place before making any more changes to policies. I've tested the new Python-based system, and I think it's an improvement. Here are some key differences:

All reference scenario schedules (up to nine) are now handled in the same script.
There is only one place to add policies or to update policies' subscripts, so it will be much faster and less error-prone to do model development steps that involve adding or changing policies
Policy schedules are set in text, avoiding the complicated and slow-running Excel formulas that were so easy to mess up (for example, if users used cut-and-paste when setting schedules). Users are less likely to introduce bugs now.
All .csv schedule files are generated in under one second by double-clicking on the Python script. It is much faster than exporting blue tabs to .csv files, especially given the long loading time for the large Excel files.

Rather than using a delimiter between policy schedules, I set up the file so that each schedule appears on its own line. Beneath each policy name, the first schedule is schedule 1 ("Default" for the U.S.), the second one is schedule 2 ("NDC" for the U.S.), etc. This will help avoid mistakes where users might accidentally edit a schedule they didn't intend to edit, if they had all been on the same line. Here's a screenshot (from Notepad++, with Python text coloring):

NewPolicyScheduleFormat

You probably all already have one, but I recommend using a programming-oriented text editor (like Notepad++ or Visual Studio Code) when editing Python files, so you get the benefit of the text coloring. It can really help avoid bugs/syntax errors.

Note that I've imported the existing schedules from Excel exactly as they were, making no changes. I've verified the schedules are the same using a diff tool.

There are some oddities in the NDC schedule (schedule 2). For instance, in the very first policy in the screenshot above, the point (2024,0) is doing nothing because it is surrounded by points that also have the value zero: (2020,0) and (2025,0).

I removed the three large Excel files, as they no longer have any purpose. However, I made an Excel file called Supporting Calculations for Policy Schedules.xlsx that retains the parts of the old Excel files where we did calculations to determine what implementation percentages to set in the old "Set Schedules Here" tab. There were only two instances of this: calculating COVID recession impacts and calculating the S-curve for implementation of the R&D policies. If you wish to do so, you can add new tabs to this file to house any supporting calculations you make to determine other schedule percentages.

robbieorvis commented 2 years ago

Thanks, Jeff. Can you walk us through how to do this on this week’s EPS team call?

Robbie Orvis Senior Director of Energy Policy Design +1 415-799-2171 1225 Eye Street, N.W. Suite 904 Washington, D.C. 20005 www.energyinnovation.orghttp://www.energyinnovation.org/ @.***

From: Jeff Rissman @.> Sent: Tuesday, December 14, 2021 1:34 AM To: Energy-Innovation/eps-us @.> Cc: Robbie Orvis @.>; Mention @.> Subject: Re: [Energy-Innovation/eps-us] Replace Excel-based FoPITY system with Python-based solution (Issue #204)

I went ahead and completed this item today, as I wanted to have the new policy scheduling system in place before making any more changes to policies. I've tested the new Python-based system, and I think it's an improvement. Here are some key differences:

All reference scenario schedules (up to nine) are now handled in the same script.
There is only one place to add policies or to update policies' subscripts, so it will be much faster and less error-prone to do model development steps that involve adding or changing policies
Policy schedules are set in text, avoiding the complicated and slow-running Excel formulas that were so easy to mess up (for example, if users used cut-and-paste when setting schedules). Users are less likely to introduce bugs now.
All .csv schedule files are generated in under one second by double-clicking on the Python script. It is much faster than exporting blue tabs to .csv files, especially given the long loading time for the large Excel files.

Rather than using a delimiter between policy schedules, I set up the file so that each schedule appears on its own line. Beneath each policy name, the first schedule is schedule 1 ("Default" for the U.S.), the second one is schedule 2 ("NDC" for the U.S.), etc. This will help avoid mistakes where users might accidentally edit a schedule they didn't intend to edit, if they had all been on the same line. Here's a screenshot (from Notepad++, with Python text coloring):

[NewPolicyScheduleFormat]https://user-images.githubusercontent.com/7120106/145944159-7949d80f-67c6-431a-95e9-529932541efa.png

You probably all already have one, but I recommend using a programming-oriented text editor (like Notepad++ or Visual Studio Code) when editing Python files, so you get the benefit of the text coloring. It can really help avoid bugs/syntax errors.

Note that I've imported the existing schedules from Excel exactly as they were, making no changes. I've verified the schedules are the same using a diff tool.

There are some oddities in the NDC schedule (schedule 2). For instance, in the very first policy in the screenshot above, the point (2024,0) is doing nothing because it is surrounded by points that also have the value zero: (2020,0) and (2025,0).

I removed the three large Excel files, as they no longer have any purpose. However, I made an Excel file called Supporting Calculations for Policy Schedules.xlsx that retains the parts of the old Excel files where we did calculations to determine what implementation percentages to set in the old "Set Schedules Here" tab. There were only two instances of this: calculating COVID recession impacts and calculating the S-curve for implementation of the R&D policies. If you wish to do so, you can add new tabs to this file to house any supporting calculations you make to determine other schedule percentages.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/Energy-Innovation/eps-us/issues/204#issuecomment-993204083, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AK5N6SNNHRQA6ZUC4WOMLATUQ3QMTANCNFSM5IIDGZXQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

jrissman commented 2 years ago

Of course, I would be happy to.

EnergyInnovation / eps-us

Replace Excel-based FoPITY system with Python-based solution #204