Yes, there are many ways to improve performance in OpenDSSDirect.py.
Without benchmarking, I was going to guess that method1 will always be the slowest way to do it (for loops, calling Python functions to create a Python float that is then converted to a NumPy array). method2 uses fast, probably optimized, code to write the data to a CSV (using export) and then reads it using pandas' fast, optimized read_csv function. I honestly thought this would be slower than your benchmarks show (because of the file I/O, and the conversion from pandas to dict to numpy instead of pandas to numpy). You are also calling out to the numpy comparisons (np.any(voltages > _max) or np.any(voltages < _min)) once for each bus in method1, versus once (np.any(np.abs(dV) > t)) in method2. For a fairer comparison, I'd start by benchmarking just the reading of the voltages into the same format using the two different methods. That'd give a comparison of essentially file I/O vs Python function calls.
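A minimal sketch of that isolated read benchmark, assuming opendssdirect.py's Circuit.AllBusNames / Circuit.SetActiveBus / Bus.puVmagAngle calls and the OpenDSS "export voltages" command on an already-solved circuit (the CSV file name is arbitrary):
import timeit
import numpy as np
import pandas as pd
import opendssdirect as dss

def read_via_loop():
    # Per-bus Python loop; puVmagAngle alternates magnitude/angle, keep magnitudes.
    mags = []
    for name in dss.Circuit.AllBusNames():
        dss.Circuit.SetActiveBus(name)
        mags.extend(dss.Bus.puVmagAngle()[::2])
    return np.array(mags)

def read_via_csv():
    # File round-trip: OpenDSS writes the CSV, pandas reads it back.
    dss.run_command('export voltages volts.csv')
    return pd.read_csv('volts.csv')

print('loop:', timeit.timeit(read_via_loop, number=100))
print('csv :', timeit.timeit(read_via_csv, number=100))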
In terms of improving performance, we can read the voltages into a numpy array directly from OpenDSS. That will be the fastest way to do it. I'd have to think about how to do it, and then about how to make it a general function that I can include with this package. That might require a well overdue rewrite of a few functions in the core.
So yes, there are ways to improve performance once we figure out that reading from OpenDSS is actually the bottleneck.
OpenDSSDirect exposes a low-level function, getVpointer. In OpenDSSDirect.jl, that's used to implement a getV function that puts these values in a Julia array.
With something similar in Python, that'd be a super fast way of doing voltage checks. That's the raw voltage array in OpenDSS, so if you can have a numpy array point to that, comparisons would be quite fast.
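To illustrate the pointer idea in Python (purely a sketch, not an existing OpenDSSDirect.py API; the address and length are assumed to come from a getVpointer-style call):
import ctypes
import numpy as np

def violation_from_pointer(addr, n, Vbase):
    # addr: hypothetical raw address of OpenDSS's node voltage array
    # (interleaved re/im doubles); n: number of node voltages.
    buf = (ctypes.c_double * (2 * (n + 1))).from_address(addr)
    V = np.frombuffer(buf, dtype=np.complex128)   # zero-copy view into OpenDSS memory
    vpu = np.abs(V[1:]) / Vbase   # skip slot 0, matching V[i+1] in the Julia code below
    return bool(np.any((vpu < 0.94) | (vpu > 1.05)))
Because the view aliases live OpenDSS memory, nothing is copied per check, which is where the speed would come from.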
Thanks @tshort for expanding on that. @NicolasGensollen, that function is exactly what I meant when I said we could "read the voltages into a numpy array directly from OpenDSS". @tshort has kindly pointed out the exact function we need as well.
@kdheepak, @tshort thanks for your super fast and useful replies! I'd be really interested to see the performance gain we'd get with this approach.
FYI, voltage checking using getV on that circuit takes about 0.2 msecs on my ten-year-old Linux workstation (with no violations):
julia> using OpenDSSDirect

DSS> redirect Master_noPV.dss

julia> const Vbase = abs.(DSS.Circuit.AllBusVolts()) ./ DSS.Circuit.AllBusMagPu()

julia> function isViolation()
           V = OpenDSSDirect.DSSCore.getV()
           for i in 1:length(Vbase)
               v = abs(V[i+1]) / Vbase[i]
               if v < 0.94 || v > 1.05
                   return true
               end
           end
           return false
       end
isViolation (generic function with 1 method)

julia> @time isViolation()
  0.000209 seconds (6 allocations: 256 bytes)
false
Indeed, much faster than what I'm currently doing!
That getV seems blazingly fast. It would be really nice to have an implementation that points a numpy array at the OpenDSS voltage array, similar to getV.
Just for comparison, here is DSS.Circuit.AllBusMagPu -> Python list -> NumPy -> isViolation versus DSS.Circuit.AllBusMagPu -> Python list -> isViolation.
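In sketch form (the limits are assumed to match the earlier examples):
import numpy as np
import opendssdirect as dss

def is_violation_numpy():
    # AllBusMagPu -> Python list -> NumPy -> one vectorized check
    v = np.array(dss.Circuit.AllBusMagPu())
    return bool(np.any((v < 0.94) | (v > 1.05)))

def is_violation_list():
    # AllBusMagPu -> Python list -> plain Python loop with early exit
    for v in dss.Circuit.AllBusMagPu():
        if v < 0.94 or v > 1.05:
            return True
    return False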
Here's the same in Julia.
Julia is going to be significantly faster even when you don't use the direct pointer. It'll be interesting to see the direct to numpy comparison.
I'm not really sure what is going on here. I'm getting faster performance with DSS.Circuit.AllBusMagPu compared to getV.
@tshort, any insights?
OpenDSS is calculating the per-unit magnitude and sending back the array in the first case, whereas in the second there are some additional calculations. I guess I was expecting them to be almost identical.
@kdheepak, the first function run is often slower. Julia compiles the function the first time. In the output I provided earlier, I clipped out the timing for the first run. For comparisons, you'll want to run both a couple of times.
Ah shoot. My bad, I forgot I defined it in the same cell. Let me try again.
Also, I forgot about AllBusMagPu (despite using it above). That's probably sufficiently fast for many uses.
Okay, wow. That is significantly faster!
Again, I was expecting them to be similar in performance. The Julia implementation is performing better than the OpenDSS loop (the OpenDSS AllBusMagPu branch is below):
9: begin  // Circuit.AllBusMagPu
     IF ActiveCircuit <> Nil THEN
       WITH ActiveCircuit DO
       Begin
         arg := VarArrayCreate([0, NumNodes-1], varDouble);
         k := 0;
         FOR i := 1 to NumBuses DO
         Begin
           If Buses^[i].kVBase > 0.0 then BaseFactor := 1000.0 * Buses^[i].kVBase Else BaseFactor := 1.0;
           For j := 1 to Buses^[i].NumNodesThisBus DO
           Begin
             VoltsD := Cabs(ActiveCircuit.Solution.NodeV^[Buses^[i].GetRef(j)]);
             arg[k] := VoltsD / BaseFactor;
             Inc(k);
           End;
         End;
       End
     ELSE arg := VarArrayCreate([0, 0], varDouble);
   end;
getV is pretty neat.
The difference might be that OpenDSS is allocating an array in AllBusMagPu, and allocation is expensive. The actual loop part should be comparable. getV doesn't have to allocate.
That would explain it!
@kdheepak, @tshort thanks a lot for this cool discussion!
The AllBusMagPu-to-NumPy method proposed by @kdheepak is already much faster than my initial implementations, and probably fast enough for what I want to do.
Most likely we won't be able to beat the Julia implementation, but if you implement a similar getV function in Python, @kdheepak, let me know, I'll definitely be interested! :)
Hi, all,
I've been working with @kdheepak on bringing dss_python and OpenDSSDirect.py together, using a shared custom API to OpenDSS in place of the official Direct DLL, which has limitations for many use-cases. We should reach a version good enough for a new release soon (maybe next week?), as soon as we tidy up some remaining issues. The equivalent to getV, for example, is already exposed there. I wrote some comments about performance below.
I will try to provide more details and sample code in the future: in dss_python we have some utility functions that helped a lot in reducing the run-time. These are not nicely exposed yet, but we can explore that in a future release. Ideally, we could expose faster, alternative implementations that cater to common use-cases, to more languages -- there's already a simple C# module, and we'll work on supporting more languages in the near future.
About the AllBus... methods: they are slow for a few reasons: use of variant arrays, multiple memory indirections, multiple loops, and maybe bad optimization by the Pascal compilers (both Delphi and Free Pascal). Still, unless they're really the bottleneck of the implementation, I wouldn't worry too much. I recommend profiling the whole code. Usually, caching some of the structures is enough to make the actual solution (Solution.Solve()) dominate the time profile.
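For example, a quick way to see where the time goes with opendssdirect.py (a sketch; the violation check is the NumPy variant from earlier in the thread):
import cProfile
import numpy as np
import opendssdirect as dss

def check_violations():
    v = np.array(dss.Circuit.AllBusMagPu())
    return bool(np.any((v < 0.94) | (v > 1.05)))

def run_study(n_steps=100):
    for _ in range(n_steps):
        dss.Solution.Solve()   # if this dominates cumtime, the checks are cheap enough
        check_violations()

cProfile.run('run_study()', sort='cumtime')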
Another example of something slow is parts of the Monitors API. In the Pascal code for the Channel() method, here's how it's done:
Result := VarArrayCreate([0, pMon.SampleCount-1], varDouble);
ReadMonitorHeader(Header, FALSE);   // FALSE = leave at beginning of data
AuxParser.CmdString := string(Header.StrBuffer);
AuxParser.AutoIncrement := TRUE;
FirstCol := AuxParser.StrValue;     // Get rid of first two columns
AuxParser.AutoIncrement := FALSE;
AllocSize := Sizeof(SngBuffer^[1]) * Header.RecordSize;
SngBuffer := Allocmem(AllocSize);
k := 0;
for i := 1 to pMon.SampleCount do Begin
    With pMon.MonitorStream Do
    Begin
        Read( hr, SizeOf(hr) );
        Read( s, SizeOf(s) );
        Read( sngBuffer^[1], AllocSize);   // read rest of record
    End;
    Result[k] := sngBuffer^[index];
    inc(k);
End;
That's reasonable, but for repeated/naive operations it can quickly become a bottleneck. It turns out the following Python implementation is quite fast in comparison (that is, read the whole stream and process it in Python):
def Channel(self, index):
    bs = self.ByteStream
    # first 16 bytes: four int32 header fields, including the record size
    _, _, record_size, mode = np.frombuffer(bytearray(bs[:16]), dtype=np.int32)
    # data starts at byte 272; each record is (record_size + 2) float32 values
    data = np.frombuffer(bytearray(bs[272:]), dtype=np.float32)
    data = data.reshape((len(data) // (record_size + 2), record_size + 2))
    # skip the two leading columns (hour, sec) and select the requested channel
    return data[:, index - 1 + 2].astype(np.float64)
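Wired to opendssdirect.py's Monitors API instead of a class attribute, usage might look like this (a sketch; 'm1' is a hypothetical monitor name):
import numpy as np
import opendssdirect as dss

def monitor_channel(index):
    # Same parsing as above, reading the stream via Monitors.ByteStream.
    bs = dss.Monitors.ByteStream()
    _, _, record_size, mode = np.frombuffer(bytearray(bs[:16]), dtype=np.int32)
    data = np.frombuffer(bytearray(bs[272:]), dtype=np.float32)
    data = data.reshape((len(data) // (record_size + 2), record_size + 2))
    return data[:, index - 1 + 2].astype(np.float64)

dss.Monitors.Name('m1')        # select the monitor to read
values = monitor_channel(1)    # channel 1 as a float64 NumPy array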
Funnily enough, I found some of these things while trying workarounds for bugs in the official Direct DLL in 2016. The Python sample above was never published (instead, dss_capi was born), but it does illustrate the opportunities we still have. Stay tuned!
Thanks for posting here @PMeira, these changes will definitely be much appreciated by the Python power systems community!
Since I posted that channel example before, I'll post a link to the updated version that will be used in the next version of DSS Python: dss_capi.py#L2408.
Version 0.10 can use a global result buffer, and that alone can be 10-50% faster depending on the property being read, since reallocations can be avoided. Using the result pointer directly and avoiding extra array copies gives some extra performance in this case. For Channel(), it takes 50% of the runtime of the current version (0.9.8), or 16% when compared to COM.
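The reuse idea itself is easy to picture (a toy illustration, not dss_python's actual mechanism):
import numpy as np

class ResultBuffer:
    # Grow-only scratch buffer: reallocate only when a larger result arrives.
    def __init__(self):
        self._buf = np.empty(0)

    def view(self, n):
        if n > self._buf.size:
            self._buf = np.empty(n)   # reallocation happens only on growth
        return self._buf[:n]          # zero-copy view for this call's result

scratch = ResultBuffer()
out = scratch.view(1000)   # reused across calls instead of allocating each time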
Hey @kdheepak,
I'm working on a Hosting Capacity tool in Python for LA100, which of course uses OpenDSSDirect.py. Obviously, checking for voltage violations is a key piece of the code and needs to be optimized as much as possible. I came up with 2 different methods: one that outputs all voltages to a CSV file, reads the file back in, and checks for violations with NumPy (called method2 in the script); the other loops over the buses, sets the active bus, computes the voltage, and returns if it is outside of the limits (called method1 in the script). I made the following simple script to test the performance of these 2 methods on the EPRI J1 feeder:
Script:
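A rough sketch of the two methods as described above (the limit values, CSV file name, and column selection are assumptions):
import numpy as np
import pandas as pd
import opendssdirect as dss

_min, _max = 0.95, 1.05   # assumed voltage limits

def method1():
    # Loop over buses: set each one active and check its per-unit magnitudes.
    for name in dss.Circuit.AllBusNames():
        dss.Circuit.SetActiveBus(name)
        voltages = np.array(dss.Bus.puVmagAngle()[::2])   # magnitudes only
        if np.any(voltages > _max) or np.any(voltages < _min):
            return True
    return False

def method2():
    # Export all voltages to a CSV, read it back, check everything at once.
    dss.run_command('export voltages volts.csv')
    df = pd.read_csv('volts.csv')
    pu = df[[c for c in df.columns if 'pu' in c.lower()]].to_numpy().ravel()
    pu = pu[pu > 0]   # drop unused phase slots
    return bool(np.any((pu < _min) | (pu > _max)))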
Results:
Method 2 has a constant average execution time, while method 1 depends on the presence of violations early in the for loop (which explains why it is able to beat method 2 with very aggressive limits). Based on these results, I'd say that the second method looks better, even if I prefer what method 1 is doing (no intermediate file and no weird substitutions).
I was wondering if you had other ideas to do this or suggestions to increase the speed.
Thanks!