Closed rouille closed 3 years ago
@YifanLi86 how are we specifying the transmission expansion candidates? I see that we're specifying the transmission_lines.csv file; does this specify the existing transmission capacity, the transmission expansion candidates, or both? This will relate to #49 for sure, maybe #50 as well.
It specifies both the existing transmission capacity and the transmission expansion candidates.
So the set of candidates is limited to the topology of the existing branches? Do we place any limits on how much these branches can be expanded?
Currently we only model existing routes as candidates. We could extend this to new routes, but since it is only a transport model it does not make much sense to do that. There is no limit on how much can be expanded.
If we wanted to have a different set of new routes, distinct from the existing routes, and potentially with limited expansion, do you know if there is another file or column which is used to pass that information to Switch? If not, it seems like we could enable that by adding them into the transmission_lines.csv file with zero existing capacity, which would identify them as candidates, but wouldn't constrain them it seems.
"If not, it seems like we could enable that by adding them into the transmission_lines.csv file with zero existing capacity" — yes, that is the only way to do it. There are no other files or columns.
The documentation says:
Switch is allowed to expand available transfer capability (capacity) along each corridor, subject to limits set by the user
but Table 9 below this paragraph does not list any parameters that could be used to limit the transmission capacity of a given line. https://ars.els-cdn.com/content/image/1-s2.0-S2352711018301547-mmc1.pdf, Section 4.3.10, numbered page 22
Maybe this is something to ask about via their google group: https://groups.google.com/g/switch-model
When we call Switch with the --suffixes dual flag (the default since #84), we get a non-empty dictionary from results.solution._list[0]["Constraint"]. Using re, we can see the pattern of keys in this dictionary:
>>> import pickle
>>> import re
>>> with open("results.pickle", "rb") as f:
...     results = pickle.load(f)
>>> prog = re.compile(r"([A-Za-z_]+)\[")
>>> sorted({prog.match(i).group(1) for i in results.solution._list[0]["Constraint"].keys()})
['Distributed_Energy_Balance', 'Enforce_Dispatch_Baseload_Flat', 'Enforce_Dispatch_Upper_Limit', 'Enforce_Local_TD_Capacity_Limit', 'GenFuelUseRate_Calculate', 'Max_Build_Potential', 'Maximum_DispatchTx', 'Zone_Energy_Balance']
Maximum_DispatchTx is probably the shadow price on whether the branch is binding in either direction. Full keys follow the pattern "Maximum_DispatchTx[BUS_ID_1,BUS_ID_2,TIMEPOINT]". Looking at the values for these constraints (e.g. results.solution._list[0]["Constraint"]["Maximum_DispatchTx[2053305,2060801,4]"]["Dual"]), all values appear to be non-positive (subject to very small floating point cruft, probably from the Barrier method), so we will probably need to determine direction either by the order of the bus IDs (which do not appear to be sorted) or by the sign of the DispatchTx values.
Zone_Energy_Balance is probably the LMPs for each zone, where currently each of our original buses is its own zone. Full keys follow the pattern "Zone_Energy_Balance[BUS_ID,TIMEPOINT]". Values are non-negative, which I think makes sense given the underlying transport model.
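As a sketch of working with these keys (parse_constraint_key is a hypothetical helper, not part of SwitchWrapper), the constraint name and index tuple can be split apart with a regex much like the one in the snippet above:

```python
# Sketch (assumed key format from the exploration above): split a Pyomo
# constraint key like "Maximum_DispatchTx[2053305,2060801,4]" into its
# constraint name and integer index tuple.
import re

KEY_PATTERN = re.compile(r"([A-Za-z_]+)\[([^\]]+)\]")

def parse_constraint_key(key):
    """Return (name, (int indices...)) for a Switch/Pyomo constraint key."""
    name, index = KEY_PATTERN.match(key).groups()
    return name, tuple(int(i) for i in index.split(","))

name, (from_bus, to_bus, timepoint) = parse_constraint_key(
    "Maximum_DispatchTx[2053305,2060801,4]"
)
```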
@YifanLi86 I'm actively working on extracting duals from Switch results, i.e. lmp, congu, congl (as we did for PCM results). Thanks to @danielolsen's exploration, I've extracted "Zone_Energy_Balance" and hope to get time series lmps from it. Here is the data frame. As you can see, the values are very large:
Comparing with the PCM LMPs from Scenario 599:
Discussed with Daniel and there are two possibilities:
Just want to check with you @YifanLi86 to see whether you could confirm this before I google it.
@BainanXia I think the unit in the load file is MW, and all gen costs are modeled in $ (heat rate multiplied by fuel cost, which is also in $), so it should already be $/MW. Regarding why the numbers are so big: these are the original numbers from the SWITCH solution after temporal reduction, and you duplicate the values based on the slicing recovery file, right? If so, each time point's value should be divided by the weight of that time point, i.e., the total hours in that time point defined in the slicing, so as to recover the LMP $/MW value by getting rid of the total number of hours of "energy" there.
Here are the weights of the 24 time points (time point: hours):
1: 876, 2: 876, 3: 876, 4: 876, 5: 546, 6: 546, 7: 546, 8: 546,
9: 180, 10: 180, 11: 180, 12: 180, 13: 42, 14: 42, 15: 42, 16: 42,
17: 510, 18: 510, 19: 510, 20: 510, 21: 42, 22: 42, 23: 42, 24: 42
From your data example, hours 4 & 5 have significantly smaller values; hours 4 & 5 map to time points 21 & 22 with weight 42, while hours 1, 2, 3 map to time point 1, which weighs 876. Dividing by these weights, you get around 36 and 40 respectively. It might be worth double-checking an example column of values to confirm all 8784 values are in good shape.
@YifanLi86 I think you're right.
For the timestamp mapping I've been using to generate test data (I think these are the same as those that @BainanXia is using), the 2030_all timeseries, which applies to all the timestamps in @BainanXia's screenshot except for the 3am and 4am hours, has ts_duration_of_tp = 6 and ts_scale_to_period = 146 in the generated timeseries.csv file. If we divide the Zone_Energy_Balance value by 6 * 146, we get an LMP of about $37/MWh. Similarly, the 3am and 4am hours apply to the 2030_winpk timeseries, which has ts_duration_of_tp = 6 and ts_scale_to_period = 7. Dividing these Zone_Energy_Balance values by 6 * 7 gives LMPs around $41/MWh.
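The arithmetic above can be sketched as a small helper (illustrative only; recover_lmp is not a SwitchWrapper function): the Zone_Energy_Balance dual is divided by the total hours the timepoint represents.

```python
# Sketch of the weight recovery described above: a Zone_Energy_Balance dual
# covers ts_duration_of_tp * ts_scale_to_period hours, so dividing by that
# weight recovers a per-hour ($/MWh) LMP.
def recover_lmp(dual_value, ts_duration_of_tp, ts_scale_to_period):
    """Convert a Zone_Energy_Balance dual into a $/MWh LMP."""
    weight = ts_duration_of_tp * ts_scale_to_period  # hours represented
    return dual_value / weight

# The 2030_all timeseries weights each timepoint by 6 * 146 = 876 hours;
# the 2030_winpk timeseries by 6 * 7 = 42 hours.
```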
@YifanLi86 Good point, and it seems reasonable as @danielolsen explained. However, I have a more general question: currently we recover the time series outputs simply by duplicating the values of a time point among its timestamps according to the slicing recovery file (except for PF, where we also split power between parallel lines, given that Switch results are based on bus tuples). I'm wondering: do all dual variables (congu, congl, lmp) need to be recovered the way you proposed, i.e. values divided by the weight of the corresponding time points, or only lmp? How about primal variables: pg, pf, storage_pg, storage_e?
Good point. Yes, I believe all duals need to be divided, but not pg, pf, storage_pg, storage_e, since those are pure MW values limited by constraints defined in MW. Anything related to cost or energy should be divided, but anything that is just a MW dispatch for a snapshot should not.
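A minimal sketch of this recovery rule (the function and its name are hypothetical, not SwitchWrapper API): primal MW values are copied to each timestamp unchanged, while duals are additionally divided by the timepoint weight in hours.

```python
# Sketch of the rule above: duals (lmp, congu, congl) are divided by the
# timepoint weight; primals (pg, pf, storage_pg, storage_e) are not.
def recover_timeseries_value(value, weight, is_dual):
    """Map one timepoint value onto each of its member timestamps."""
    return value / weight if is_dual else value

# A 3000 MW dispatch stays 3000 MW at every timestamp, but a dual of
# 876 $/MW over an 876-hour timepoint becomes a 1 $/MWh price.
```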
Per @danielolsen's comment, I'm implementing get_congu and get_congl in the SwitchExtract class.
Our plan based on previous discussion:
Maximum_DispatchTx gives the shadow prices in terms of bus tuples. There will be parallel lines in only one direction between a bus tuple, i.e. all lines between bus_A and bus_B should have the same from bus and to bus. Hence, suppose we have two lines with (from_A, to_B), with dual values D1 for bus tuple (A, B) and D2 for (B, A) respectively. Then D1 would be congu and D2 would be congl (at most one of D1 and D2 should be non-zero and positive for a given time point). Finally, we apply such dual values to all parallel lines equally (parallel lines should bind simultaneously and their shadow prices are the same), i.e. both lines get congu = D1 and congl = D2 for the corresponding timepoint.
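The plan above can be sketched as follows (an illustration under the stated assumptions; split_congestion_duals is a hypothetical helper, not SwitchWrapper code):

```python
# Sketch: the Maximum_DispatchTx duals are keyed by directed bus tuples, so
# for a line built as (from_A, to_B), the (A, B) dual is congu and the
# (B, A) dual is congl; all parallel lines on the corridor share the values.
def split_congestion_duals(duals, from_bus, to_bus):
    """Return (congu, congl) for a line from its bus-tuple duals."""
    congu = duals.get((from_bus, to_bus), 0.0)
    congl = duals.get((to_bus, from_bus), 0.0)
    return congu, congl

duals = {("A", "B"): 100.0, ("B", "A"): 0.0}  # hypothetical timepoint values
congu, congl = split_congestion_duals(duals, "A", "B")
```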
Here is my observation during implementation:
md is the data frame returned from parse_timepoints with appropriate parameters passed in; columns are bus tuples, indexed by timestamps. md_mirror is a copy of md with all bus tuples reversed, (A, B) -> (B, A). We get an all-zero data frame by element-wise multiplication of md and md_mirror, which verifies the assumption that "at most one of congu and congl is non-zero for a given time point". However, I was also expecting all of the values in md to be non-negative, which is not the case: it turns out that almost all values are negative, with some very small positive ones. Any thoughts? @danielolsen @YifanLi86
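The element-wise check described above can be reproduced with a toy data frame (hypothetical dual values; string column labels stand in for the bus tuples):

```python
# Sketch of the md * md_mirror check: reverse each bus-tuple column label
# and multiply element-wise; the product is all zeros iff at most one of
# the two directions has a non-zero dual at each timepoint.
import pandas as pd

md = pd.DataFrame(
    {"A,B": [-5.0, 0.0], "B,A": [0.0, -3.0]},  # hypothetical duals
    index=[1, 2],  # timepoints
)
mirror = {c: ",".join(reversed(c.split(","))) for c in md.columns}
md_mirror = md.rename(columns=mirror)

product = md * md_mirror  # pandas aligns on the (reversed) column labels
assert (product == 0).all().all()
```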
The sign on the Maximum_DispatchTx duals is consistent with my earlier investigation: https://github.com/Breakthrough-Energy/SwitchWrapper/issues/47#issuecomment-853289764. I believe this behavior is an artifact of how the problem was set up and/or interpreted via Pyomo, since a negative dual for an inequality constraint doesn't make sense in classical optimization formulations, so we can probably just multiply by -1.
Suppose we have D1 = -100 for bus tuple (A, B) and D2 = 0 for bus tuple (B, A), and we have lines defined by (from_A, to_B). Does that mean congu = 0 and congl = 100 for this time point, i.e., should we interpret this as D1 = 0, D2 = 100 instead of D1 = 100, D2 = 0 (congu = 100, congl = 0)?
Let's look at the power flow at this time period. In our terminology, congu is congestion at the upper power flow limit, so for a non-zero dual (ignoring things on the order of 1e-12) the power flow should be at its maximum for the from->to direction. If the power flow is instead at its maximum in the to->from direction (i.e. at the lower limit), then we know that this should be congl instead. I think once we figure out the sign/direction issue for one pf/dual pair, it should be internally consistent for all the others.
I think you are right:
I found the following instance: branch_id = 98577, rateA = 3096.71. Looking through the power flow, it is congested during all time points with negative flow -3096.71. However, if we look at the md values for this branch, md[from_bus, to_bus] are all zeros, i.e. congu are all zeros, and md[to_bus, from_bus] are all negative values, i.e. congl are non-zero. Hence, the conclusion is that the sign of the md values doesn't define the power flow direction, i.e. we should simply take the absolute values of these dual variables.
@BainanXia I agree with all that @danielolsen has mentioned above. Just a reminder: congu and congl need to be divided by the timepoint weight as well (same as LMP).
Another approach: since you have already extracted the LMP data, you could simply subtract the from and to buses' LMPs to get the line shadow price, if there are no losses, if I remember correctly. E.g., for buses A & B and branch AB:
If the branch is defined A->B and LMP(A) - LMP(B) > 0, then congu(AB) = LMP(A) - LMP(B) and congl(AB) = 0. Else congu(AB) = 0 and congl(AB) = LMP(A) - LMP(B).
It might be worth verifying a couple of cases to see if the dual results of LMP and shadow price actually line up. This could also help confirm Maximum_DispatchTx is indeed the shadow price.
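The suggested LMP-difference cross-check can be sketched as below (a minimal illustration assuming a lossless transport model; the function name is hypothetical, and both outputs are returned as non-negative magnitudes, consistent with taking absolute values of the duals):

```python
# Sketch of the LMP-spread heuristic: for a branch defined A->B, a positive
# spread LMP(A) - LMP(B) implies congestion in the from->to direction
# (congu); a negative spread implies congestion in the to->from direction
# (congl). Only valid with no losses and no other flow constraints.
def congestion_from_lmps(lmp_a, lmp_b):
    """Return (congu, congl) magnitudes for branch A->B from endpoint LMPs."""
    spread = lmp_a - lmp_b
    if spread > 0:
        return spread, 0.0
    return 0.0, -spread
```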
I believe this works in a simple transport model, but not in the general case where there are other constraints on line flow (e.g. DC power flow, flowgate constraints, etc.).
Thanks for the reminder. I've done the implementation and will get a PR up soon. Some immediate validations according to your suggestion:
Looking at the same branch as I posted above, branch_id = 98577: congu are all zeros as expected, and congl are all positive values as calculated (dividing by the corresponding weights); however, lmp(from_bus) - lmp(to_bus) > 0 and the values are not aligned with congl. In other words, the observation is: the branch is defined A -> B and LMP(A) - LMP(B) > 0, yet congl(AB) != LMP(A) - LMP(B).
From my non-expert point of view, power in general should flow from the low-LMP bus to the high-LMP bus (intuitively, from a cheap location to an expensive location), which is consistent with the observation above: LMP(A) is higher and power flows from B to A. Regarding the values of the shadow price and the LMP differences, I'm with @danielolsen: the reason is that we do have other constraints on the line flow besides capacity limits.
Thanks @BainanXia, that's quick action! Yes, if there are no DC flow constraints then power flow tends to go from a cheaper place to a more expensive place to minimize cost; otherwise it flows per the voltage angles.
I feel the reason the values are not lining up is not other constraints, since SWITCH is actually a transport model, so there are essentially no other constraints. The reason is losses, which are defined in SWITCH, so we do have a loss component there causing the mismatch. Actually, since we are connecting SWITCH results (with loss modeling) to PowerSimData (lossless network), I would lean toward adding the loss components back to maintain consistency, FYI.
@danielolsen Do we have a place (issue) for adding loss components in PCM? If so we should mention the conversation here.
There's no issue within REISE.jl for adding losses. The closest one, which involves different power flow models in general, is probably https://github.com/Breakthrough-Energy/REISE.jl/issues/60. I'm not aware of DCOPF formulations that incorporate losses, since generally one of the approximations that's made is under the assumption that r << x and losses are small enough to be ignored, but we can implement anything we can define equations for, assuming it's still computationally tractable. I don't think there's anything built into PowerSimData that looks at losses either.
Given that the power flow model within Switch is quite different than the power flow model within REISE.jl, what is the value in trying to post-process the results from one to try to emulate what the results would be in another?
Perhaps we should just assume lossless network in expansion model as well at this time. If so and just for visualization purpose, @BainanXia I am totally ok to ignore loss components for now and please go ahead with your current approach of implementation.
Below are the steps that we need to follow to use our analysis/plotting ecosystem with results generated by Switch:
- Grid object with information stored in the output file and potentially input data (#49)
- MockScenario and MockGrid objects (#51)