lthurner commented 5 years ago

I am opening a new issue for the discussions in #127. Things that are still to do for the create_nxgraph function:

tests for create_nxgraph
possible performance improvement
look into using pd2ppc functions to calculate the branch values (especially impedances)

WinfriedL commented 5 years ago

The "problem" (might be a feature as well) with using build_branch is, that it needs the ppc. If the ppc is already created, then all build_branches are already finíshed and one would call them again.

So in that cases it would probably be better to use the ppc in the first place to create the graph. But the ppc has already merged buses/nodes connected by switches, so the nodes must be mapped again... Might still be a sensible approach, I will do some testing... If I have the time.

lthurner commented 5 years ago

The "problem" (might be a feature as well) with using build_branch is, that it needs the ppc. If the ppc is already created, then all build_branches are already finíshed and one would call them again.

I wasn't thinking about calling _build_branch_ppc as an entire function, but rather the subfunctions that calculate the impedances. For example if If I didn't miss anything, the only reason the ppc is needed inside of _calc_branch_values_from_trafo_df is to look up the base voltages of the transformer nodes. So if we either created a dummy ppc that has the bus table with the base voltages or just moved the lookup of the base voltages out of these functions, the ppc would not be required.

WinfriedL commented 5 years ago

I wasn't thinking about calling _build_branch_ppc as an entire function, but rather the subfunctions that calculate the impedances.

Me too...

So if we either created a dummy ppc that has the bus table with the base voltages or just moved the lookup of the base voltages out of these functions, the ppc would not be required.

I thought about that as well, but I hesitated to create dummy ppc. Moving the voltage lookup would be a solution. Of course if there is a performant way to get the voltages from the bus dataframe, it would not be needed in the first place...?

lthurner commented 5 years ago

I now implemented a version of the create_nxgraph function, see PR #317 .

The new implementation relies a lot more on numpy instead of pandas functions. That makes the code somewhat less elegant and easy to read, but it gives a huge speedup to the graph creation in smaller networks. For example, for the mv_oberrhein network from the pandapower network package:

edit: updated timing values after 6b6ca91

Parameters	Before	After	Speedup
Without impedances	19.1ms	1.2 ms	94%
With impedances	30.4ms	1.9 ms	94%

For larger networks, the speedup is however much smaller. For example, for the case6470rte network from the pandapower networks packgage:

edit: updated timing values after 6b6ca91

Parameters	Before	After	Speedup
Without impedances	61.0	35.2 ms	42%
With impedances	75.1	44.4 ms	41%

In this example, over 80% of the time in create_nxgraph is spent on calling the add_edge function from networkx. Since networkx is a pure python implementation, the edges are always added in a python loop. So I don't think there is a way to speed this up further, except switching to a library that is implemented in C++, such as graph-tool (which is however very difficult to install on windows, so probably not practical).

Besides performance, the new implementation also takes into account switches at three-winding transformers which wasn't considered in the old version. It also uses the functions from build_branch to calculate the impedances. I added tests for the graph creation and used the code to calculate the impedances to validate the new function. It matches pretty well for simple transformers, but the values start to differ for more complex transformers with open loop losses, non-nominal ratio etc. So the new function should calculate the impedances more correctly for these types of transformers and also avoid duplicate code.

@WinfriedL can you please check out the new implementation for your use case? I would be interested in timings also, but I suspect you won't get a speedup for your large grid...

WinfriedL commented 5 years ago

First of all, thanks for the work and sorry that I did not have the time to work on that. In general I expect to have little time for pandapower in the next months... But I will definitely try this out and get back to you, since I am using the graphs a lot in my current code.

Just one thought at the moment: You are using add_edges() from networkx which, as you stated, does a python loop over the elements and calls add_edge(). But if I recall correctly, it does several checks to pass the right parameters to add_edge() in each loop. It might still be faster doing the loop over the numpy arrays yourself and then call add_edge() directly with the right parameters?

lthurner commented 5 years ago

You are using add_edges() from networkx which, as you stated, does a python loop over the elements and calls add_edge(). But if I recall correctly, it does several checks to pass the right parameters to add_edge() in each loop. It might still be faster doing the loop over the numpy arrays yourself and then call add_edge() directly with the right parameters?

Are you talking about the function add_edges_from() that allows to add multiple edges at once from a list/iterable? Because if so, I am not using that function, I am already iterating with numpy and then calling add_edge() directly:

https://github.com/e2nIEE/pandapower/blob/5a8ba3d338a54e138d2f2e22ae96302f0267d0cc/pandapower/topology/create_graph.py#L217-L226

One other question: the impedances are added in ohms. Wouldn't the pu units be more helpful? If you are measuring electric distance over multiple voltage levels, I would think that per unit values are more signifcant?

edit: actually I can imagine use cases for both. If you are using the electric distance to initialize the power flow, per unit might be better suited because the power flow is also calculated in per unit. If you want to calculate the internal impedance of a network for a xward representation, the ohmic values might be more relevant. So maybe just add both? I don't expect much performance drawback from this with the new numpy based implementation.

WinfriedL commented 5 years ago

I am already iterating with numpy and then calling add_edge() directly:

Ok, sorry I just assumed add_egdes() was the networkx function, I did not look close enough...

Wouldn't the pu units be more helpful?

To be honest: I don't know. So far it works with ohm. We discussed it a few times, but without a clear result.

lthurner commented 5 years ago

Ok, sorry I just assumed add_egdes() was the networkx function, I did not look close enough...

I did however use the add_nodes_from function to add the nodes, changing this to add_node in a for loop did speed up the code a little bit.

To be honest: I don't know. So far it works with ohm. We discussed it a few times, but without a clear result.

So then its probably best to just add both, right?

edit: or even better provide a parameter that let's you chose the unit

lthurner commented 5 years ago

I now added a parameter to chose between "ohm" and "pu" as the branch impedance unit. The parameters of create_nxgraph are now:

calc_branch_impedances (boolean, False) - detrmines wether impedances are calculated and added as a weight to all branches or not. Impedances can be added in ohm or per unit (see branch_impedance unit parameter)
- branch_impedance_unit (str, "ohm") - defines the unit of the branch impedance for calc_branch_impedances=True. If it is set to "ohm", the parameters 'r_ohm', 'x_ohm' and 'z_ohm' are added to each branch. If it is set to "pu", the parameters are 'r_pu', 'x_pu' and 'z_pu'.

I also optimized the code in add_edges some more, the timings in the tables above are updated. I now get a nice speedup of over 30% even in the 6000 bus network, and over 90% speedup in the 160 bus network.

edit: I think this adresses all the points that have been raised and discusse, so I am merging the changes into develop and closing this issue. If someone has timings of the speedup, I would be interested to see how the performance changes in real use cases.

wangzhenassd commented 5 years ago

I have also tested the performance of the update: Test environment: i7-4712 MQ 2.3GHz, 16GB RAM, Anaconda Python3.7

Item	Network 1	Network 2	Network 3	Network 4
Bus	3700	2500	3000	3800
Trafo3w	60	30	20	20
Trafo	80	80	120	130
Line	210	180	270	320

Testing result in ms:

Commit	Description	Network 1	Network 2	Network 3	Network 4
bedcd5d2	Before without impedance	111	83.9	95.3	106
99ae5d3	After without impedance	67.7	46.3	54.5	66

Commit	Description	Network 1	Network 2	Network 3	Network 4
bedcd5d2	Before with impedance	142	121	129	142
99ae5d3	After with impedance	87	59.2	67	80

WinfriedL commented 5 years ago

My timings are reversed, the new implementation is slower in my case. Could there be a problem with the huge number of buses and switches? Network:

Item	Elements
bus	200000
switches	235000
line	11000
trafo	4400
trafo3w	1700
impedance	2000

Commit	Description	Time [ms]
bedcd5d	Before without impedance	1020
99ae5d3	After without impedance	1263

Commit	Description	Time [ms]
bedcd5d	Before with impedance	1278
99ae5d3	After with impedance	1517

lthurner commented 5 years ago

Oh ok, that is unexpected.

Could there be a problem with the huge number of buses and switches?

I don't see how. The overhead to select the buses and switches should definitely be smaller now, and in my tests the overwhelming amount of time goes into the mg.add_edge() function, which als had to be called in the old version.

The first thing I can think of that can be more time consuming in the new implementation is the calculation of transformer and trafo3w impedances, because they are calculated in more detail taking into account off-nominal tap ratio and tap changer etc. But you are also getting a slowdown without calculating the branch impedances, so that can't be it.

The other thing would be switches at trafo3ws. They were not considered in the old implementation, but are considered in the new implementation. I don't see how they would cause such a large slow down, assuming you even have t3 switches in the grid?

Can you time with include_lines/include_impedance/include_trafo=False to narrow down which element causes the problem? Also: can you check if the number of nodes and edges is the same in both cases? Maybe something changed undeliberately?

WinfriedL commented 5 years ago

Could there be a problem with the huge number of buses and switches?

To answer my own question: No. If I drop the switch dataframe completly, the new implentation is still slower.

I don't see how they would cause such a large slow down, assuming you even have t3 switches in the grid?

I only have bus-bus switches

Can you time with include_lines/include_impedance/include_trafo=False to narrow down which element causes the problem? Also: can you check if the number of nodes and edges is the same in both cases? Maybe something changed undeliberately?

Yes, I am on it.

ascheidl commented 5 years ago

a little offtopic: what is the purpose of adding all buses first (mg.add_nodes_from(net.bus.index)) in graph creation ?

Leaving this out results in faster graph creation. Disadvantage is that buses that have no connected line/trafo/... will not be in the graph.

lthurner commented 5 years ago

It could really be about adding the buses, because I changed the code for adding the nodes from

    mg.add_nodes_from(net.bus.index)

to

    for b in net.bus.index:
        mg.add_node(b)

in my timings that was faster, but it might be slower for some reason with a huge number of buses?

Leaving this out results in faster graph creation. Disadvantage is that buses that have no connected line/trafo/... will not be in the graph.

That is true, I didn't realize the nodes are added in add_edges if they don't exist. It might still be faster to do it this way and then just add the standalone buses later, such as

for b in set(net.bus.index) - set(mg.nodes()):
   mg.add_node(b)

?

WinfriedL commented 5 years ago

in my timings that was faster, but it might be slower for some reason with a huge number of buses?

While I don't understand why, mg.add_nodes_from(net.bus.index) is indeed faster in my case.

lthurner commented 5 years ago

While I don't understand why, mg.add_nodes_from(net.bus.index) is indeed faster in my case.

I just added a new version that adds the buses during edge creation as proposed by @ascheidl and checks for standalone buses afterwards. I also realized that the ppc was calculated even for calc_branch_impedances=False, even though it isn't needed. Can you check the timings again for 86cdcf3 ? This change gave another good speedup in my test cases (see edited values above).

WinfriedL commented 5 years ago

Current observation: the new implementation calls mg.add_egde() the came number of times as the old implementation, but takes significantly longer for the calls:

Old:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   202789    0.903    0.000    1.018    0.000 multigraph.py:361(add_edge)
        1    0.571    0.571    3.025    3.025 create_graph.py:17(create_nxgraph)

New:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   202789    1.242    0.000    1.369    0.000 multigraph.py:361(add_edge)
        7    0.568    0.081    1.937    0.277 create_graph.py:238(add_edges)

The only difference I can see between the calls:

The new Implementation has one additional weight (X), but that does not seem to be the only problem. Timing without X:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
202789    1.165    0.000    1.274    0.000 multigraph.py:361(add_edge)
    7    0.488    0.070    1.762    0.252 create_graph.py:238(add_edges)

The old implementation passed a dictionary as weights, the new passes all weights individually.

lthurner commented 5 years ago

Is the creation of the dictionary included in this timing though? Because when comparing something like:

weights = {"a: 1, "b": 2}
mg.add_edge(0, 1, **weights)

and:

mg.add_edge(0, 1, a=1, b=2)

Calling the add_edge function might take longer in the second case, but of course in the first case there needs to be a dictionary created beforehand which has to be counted as well.

I compared using a dictionary and passing the parameters directly, and passing the weights directly was faster for me in all cases, that is why I switched it.

WinfriedL commented 5 years ago

There does not seem to be a significant difference in the profiling result using a dictionary or not.

Could it be that the dtypes are different and that makes a difference for networkx? Quick tests were not conclusive though. At the moment I am out of ideas and out of time ;-)

BTW: Which networkx version do you use, could there be a difference? My tests are with networkx 2.2

lthurner commented 5 years ago

At the moment I am out of ideas

Me too. I created a test network with a lot of buses like this:

import pandapower as pp
import pandapower.networks as nw
net = nw.case9241pegase()
for i in range(4):
    net = pp.merge_nets(net, net)

which gives you a grid with ~150k buses. create_nxgraph(net) times at 1.5s @ bedcd5d and 1.2s @ 68d2e7b. I have no idea what happens in your grid that it is slower.

My tests are with networkx 2.2

I also use networkx 2.2

lthurner commented 5 years ago

I am going to close this as I am not able to reproduce the issue.

@WinfriedL If you look further into it and identify the problem, feel free to reopen the issue. Or if you can provide an anonymized version of your network (e.g. removing all names and setting all line length to 1 or something similar) I can look into it and try to identify the problem.

WinfriedL commented 5 years ago

If you look further into it and identify the problem, feel free to reopen the issue.

Yes, thanks. It does not make sense to keep this open while it seems to be a very specific and not reproducible problem. Especially since I have very little time at the moment to look into it.

e2nIEE / pandapower

create_nxgraph: performance and robustness #271

edit: updated timing values after 6b6ca91

edit: updated timing values after 6b6ca91

edit: or even better provide a parameter that let's you chose the unit

edit: I think this adresses all the points that have been raised and discusse, so I am merging the changes into develop and closing this issue. If someone has timings of the speedup, I would be interested to see how the performance changes in real use cases.