Yes, there are many ways to improve performance in OpenDSSDirect.py.
Without benchmarking, I was going to guess that method1 will always be the slowest way to do it (for loops, calling Python functions to create a Python float that is then converted to a NumPy array). method2 uses fast, probably optimized, code to write the data to a CSV (using export) and then reads it using pandas' fast, optimized read_csv function. I honestly thought this would be slower than your benchmarks show (because of the file I/O, and the conversion from pandas to dict to numpy instead of pandas to numpy). You are also calling out to the numpy comparisons (np.any(voltages > _max) or np.any(voltages < _min)) once for each bus in method1, versus once (np.any(np.abs(dV) > t)) in method2. For a fairer comparison, I'd start by benchmarking just the reading of the voltages into the same format using the two different methods. That'd give a comparison of essentially file I/O vs Python function calls.
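A minimal sketch of that isolated read benchmark, assuming opendssdirect.py's Circuit.AllBusNames / Circuit.SetActiveBus / Bus.puVmagAngle calls and the OpenDSS "export voltages" command on an already-solved circuit (the CSV file name is arbitrary):
import timeit
import numpy as np
import pandas as pd
import opendssdirect as dss

def read_via_loop():
    # Per-bus Python loop; puVmagAngle alternates magnitude/angle, keep magnitudes.
    mags = []
    for name in dss.Circuit.AllBusNames():
        dss.Circuit.SetActiveBus(name)
        mags.extend(dss.Bus.puVmagAngle()[::2])
    return np.array(mags)

def read_via_csv():
    # File round-trip: OpenDSS writes the CSV, pandas reads it back.
    dss.run_command('export voltages volts.csv')
    return pd.read_csv('volts.csv')

print('loop:', timeit.timeit(read_via_loop, number=100))
print('csv :', timeit.timeit(read_via_csv, number=100))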
In terms of improving performance, we can read the voltages into a numpy array directly from OpenDSS. That will be the fastest way to do it. I'd have to think about how to do it, and then about how to make it a general function that I can include with this package. That might require a well overdue rewrite of a few functions in the core.
So yes, there are ways to improve performance once we figure out that reading from OpenDSS is actually the bottleneck.
OpenDSSDirect exposes a low-level function, getVpointer. In OpenDSSDirect.jl, that's used to implement a getV function that puts these values in a Julia array.
With something similar in Python, that'd be a super fast way of doing voltage checks. That's the raw voltage array in OpenDSS, so if you can have a numpy array point to that, comparisons would be quite fast.
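To illustrate the pointer idea in Python (purely a sketch, not an existing OpenDSSDirect.py API; the address and length are assumed to come from a getVpointer-style call):
import ctypes
import numpy as np

def violation_from_pointer(addr, n, Vbase):
    # addr: hypothetical raw address of OpenDSS's node voltage array
    # (interleaved re/im doubles); n: number of node voltages.
    buf = (ctypes.c_double * (2 * (n + 1))).from_address(addr)
    V = np.frombuffer(buf, dtype=np.complex128)   # zero-copy view into OpenDSS memory
    vpu = np.abs(V[1:]) / Vbase   # skip slot 0, matching V[i+1] in the Julia code below
    return bool(np.any((vpu < 0.94) | (vpu > 1.05)))
Because the view aliases live OpenDSS memory, nothing is copied per check, which is where the speed would come from.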
Thanks @tshort for expanding on that. @NicolasGensollen, that function is exactly what I meant when I said we could "read the voltages into a numpy array directly from OpenDSS". @tshort has kindly pointed out the exact function we need as well.
@kdheepak, @tshort thanks for your super fast and useful replies! I'd be really interested to see the performance gain we'd get with this approach.
FYI, voltage checking using getV on that circuit takes about 0.2 msecs on my ten-year-old Linux workstation (with no violations):
julia> using OpenDSSDirect

DSS> redirect Master_noPV.dss

julia> const Vbase = abs.(DSS.Circuit.AllBusVolts()) ./ DSS.Circuit.AllBusMagPu()

julia> function isViolation()
           V = OpenDSSDirect.DSSCore.getV()
           for i in 1:length(Vbase)
               v = abs(V[i+1]) / Vbase[i]
               if v < 0.94 || v > 1.05
                   return true
               end
           end
           return false
       end
isViolation (generic function with 1 method)

julia> @time isViolation()
  0.000209 seconds (6 allocations: 256 bytes)
false
Indeed, much faster than what I'm currently doing!
That getV seems blazingly fast. It would be really nice to have an implementation that points a numpy array at the OpenDSS voltage array, similar to getV.
Just for comparison, here is DSS.Circuit.AllBusMagPu -> Python list -> NumPy -> isViolation versus DSS.Circuit.AllBusMagPu -> Python list -> isViolation.
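In sketch form (the limits are assumed to match the earlier examples):
import numpy as np
import opendssdirect as dss

def is_violation_numpy():
    # AllBusMagPu -> Python list -> NumPy -> one vectorized check
    v = np.array(dss.Circuit.AllBusMagPu())
    return bool(np.any((v < 0.94) | (v > 1.05)))

def is_violation_list():
    # AllBusMagPu -> Python list -> plain Python loop with early exit
    for v in dss.Circuit.AllBusMagPu():
        if v < 0.94 or v > 1.05:
            return True
    return False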
Here's the same in Julia.
Julia is going to be significantly faster even when you don't use the direct pointer. It'll be interesting to see the direct to numpy comparison.
I'm not really sure what is going on here. I'm getting faster performance with DSS.Circuit.AllBusMagPu compared to getV.
@tshort, any insights?
OpenDSS is calculating the per-unit magnitude and sending back the array in the first case, whereas in the second there are some additional calculations. I guess I was expecting them to be almost identical.
@kdheepak, the first function run is often slower. Julia compiles the function the first time. In the output I provided earlier, I clipped out the timing for the first run. For comparisons, you'll want to run both a couple of times.
Ah shoot. My bad, I forgot I defined it in the same cell. Let me try again.
Also, I forgot about AllBusMagPu (despite using it above). That's probably sufficiently fast for many uses.
Okay, wow. That is significantly faster!
Again, I was expecting them to be similar in performance. The Julia implementation is performing better than the OpenDSS loop (the OpenDSS AllBusMagPu branch is below):
9: begin  // Circuit.AllBusMagPu
     IF ActiveCircuit <> Nil THEN
       WITH ActiveCircuit DO
       Begin
         arg := VarArrayCreate([0, NumNodes-1], varDouble);
         k := 0;
         FOR i := 1 to NumBuses DO
         Begin
           If Buses^[i].kVBase > 0.0 then BaseFactor := 1000.0 * Buses^[i].kVBase Else BaseFactor := 1.0;
           For j := 1 to Buses^[i].NumNodesThisBus DO
           Begin
             VoltsD := Cabs(ActiveCircuit.Solution.NodeV^[Buses^[i].GetRef(j)]);
             arg[k] := VoltsD / BaseFactor;
             Inc(k);
           End;
         End;
       End
     ELSE arg := VarArrayCreate([0, 0], varDouble);
   end;
getV is pretty neat.
The difference might be that OpenDSS is allocating an array in AllBusMagPu, and allocation is expensive. The actual loop part should be comparable. getV doesn't have to allocate.
That would explain it!
@kdheepak, @tshort thanks a lot for this cool discussion!
The AllBusMagPu-to-NumPy method proposed by @kdheepak is already much faster than my initial implementations, and probably fast enough for what I want to do.
Most likely we won't be able to beat the Julia implementation, but if you implement a similar getV function in Python, @kdheepak, let me know, I'll definitely be interested! :)
Hi, all,
I've been working with @kdheepak on bringing dss_python and OpenDSSDirect.py together, using a shared custom API to OpenDSS in place of the official Direct DLL, which has limitations for many use-cases. We should reach a version good enough for a new release soon (maybe next week?), as soon as we tidy up some remaining issues. The equivalent to getV, for example, is already exposed there. I wrote some comments about performance below.
I will try to provide more details and sample code in the future: in dss_python we have some utility functions that helped a lot in reducing the run-time. These are not nicely exposed yet, but we can explore that in a future release. Ideally, we could expose faster, alternative implementations that cater to common use-cases, to more languages -- there's already a simple C# module, and we'll work on supporting more languages in the near future.
About the AllBus... methods: they are slow for a few reasons: use of variant arrays, multiple memory indirections, multiple loops, and maybe bad optimization by the Pascal compilers (both Delphi and Free Pascal). Still, unless they're really the bottleneck of the implementation, I wouldn't worry too much. I recommend profiling the whole code. Usually, caching some of the structures is enough to make the actual solution (Solution.Solve()) dominate the time profile.
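For example, a quick way to see where the time goes with opendssdirect.py (a sketch; the violation check is the NumPy variant from earlier in the thread):
import cProfile
import numpy as np
import opendssdirect as dss

def check_violations():
    v = np.array(dss.Circuit.AllBusMagPu())
    return bool(np.any((v < 0.94) | (v > 1.05)))

def run_study(n_steps=100):
    for _ in range(n_steps):
        dss.Solution.Solve()   # if this dominates cumtime, the checks are cheap enough
        check_violations()

cProfile.run('run_study()', sort='cumtime')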
Another example of something slow is parts of the Monitors API. In the Pascal code for the Channel() method, here's how it's done:
Result := VarArrayCreate([0, pMon.SampleCount-1], varDouble);
ReadMonitorHeader(Header, FALSE);   // FALSE = leave at beginning of data
AuxParser.CmdString := string(Header.StrBuffer);
AuxParser.AutoIncrement := TRUE;
FirstCol := AuxParser.StrValue;     // Get rid of first two columns
AuxParser.AutoIncrement := FALSE;
AllocSize := Sizeof(SngBuffer^[1]) * Header.RecordSize;
SngBuffer := Allocmem(AllocSize);
k := 0;
for i := 1 to pMon.SampleCount do Begin
    With pMon.MonitorStream Do
    Begin
        Read( hr, SizeOf(hr) );
        Read( s, SizeOf(s) );
        Read( sngBuffer^[1], AllocSize);   // read rest of record
    End;
    Result[k] := sngBuffer^[index];
    inc(k);
End;
That's reasonable, but for repeated/naive operations it can quickly become a bottleneck. It turns out the following Python implementation is quite fast in comparison (that is, read the whole stream and process it in Python):
def Channel(self, index):
    bs = self.ByteStream
    # first 16 bytes: four int32 header fields, including the record size
    _, _, record_size, mode = np.frombuffer(bytearray(bs[:16]), dtype=np.int32)
    # data starts at byte 272; each record is (record_size + 2) float32 values
    data = np.frombuffer(bytearray(bs[272:]), dtype=np.float32)
    data = data.reshape((len(data) // (record_size + 2), record_size + 2))
    # skip the two leading columns (hour, sec) and select the requested channel
    return data[:, index - 1 + 2].astype(np.float64)
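Wired to opendssdirect.py's Monitors API instead of a class attribute, usage might look like this (a sketch; 'm1' is a hypothetical monitor name):
import numpy as np
import opendssdirect as dss

def monitor_channel(index):
    # Same parsing as above, reading the stream via Monitors.ByteStream.
    bs = dss.Monitors.ByteStream()
    _, _, record_size, mode = np.frombuffer(bytearray(bs[:16]), dtype=np.int32)
    data = np.frombuffer(bytearray(bs[272:]), dtype=np.float32)
    data = data.reshape((len(data) // (record_size + 2), record_size + 2))
    return data[:, index - 1 + 2].astype(np.float64)

dss.Monitors.Name('m1')        # select the monitor to read
values = monitor_channel(1)    # channel 1 as a float64 NumPy array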
Funnily enough, I found some of these things while trying workarounds for bugs in the official Direct DLL in 2016. The Python sample above was never published (instead, dss_capi was born), but it does illustrate the opportunities we still have. Stay tuned!
Thanks for posting here @PMeira, these changes will definitely be much appreciated by the Python power systems community!
Since I posted that channel example before, I'll post a link to the updated version that will be used in the next version of DSS Python: dss_capi.py#L2408.
Version 0.10 can use a global result buffer, and that alone can be 10-50% faster depending on the property being read, since reallocations can be avoided. Using the result pointer directly and avoiding extra array copies gives some extra performance in this case. For Channel(), it takes 50% of the runtime of the current version (0.9.8), or 16% when compared to COM.
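The reuse idea itself is easy to picture (a toy illustration, not dss_python's actual mechanism):
import numpy as np

class ResultBuffer:
    # Grow-only scratch buffer: reallocate only when a larger result arrives.
    def __init__(self):
        self._buf = np.empty(0)

    def view(self, n):
        if n > self._buf.size:
            self._buf = np.empty(n)   # reallocation happens only on growth
        return self._buf[:n]          # zero-copy view for this call's result

scratch = ResultBuffer()
out = scratch.view(1000)   # reused across calls instead of allocating each time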
Hey @kdheepak,
I'm working on a Hosting Capacity tool in Python for LA100, which of course uses OpenDSSDirect.py. Obviously, checking for voltage violations is a key piece of the code and needs to be optimized as much as possible. I came up with 2 different methods: one that outputs all voltages to a CSV file, reads the file back in, and checks for violations with NumPy (called method2 in the script); the other loops over the buses, sets the active bus, computes the voltage, and returns if it is outside of the limits (called method1 in the script). I made the following simple script to test the performance of these 2 methods on the EPRI J1 feeder:
Script:
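A rough sketch of the two methods as described above (the limit values, CSV file name, and column selection are assumptions):
import numpy as np
import pandas as pd
import opendssdirect as dss

_min, _max = 0.95, 1.05   # assumed voltage limits

def method1():
    # Loop over buses: set each one active and check its per-unit magnitudes.
    for name in dss.Circuit.AllBusNames():
        dss.Circuit.SetActiveBus(name)
        voltages = np.array(dss.Bus.puVmagAngle()[::2])   # magnitudes only
        if np.any(voltages > _max) or np.any(voltages < _min):
            return True
    return False

def method2():
    # Export all voltages to a CSV, read it back, check everything at once.
    dss.run_command('export voltages volts.csv')
    df = pd.read_csv('volts.csv')
    pu = df[[c for c in df.columns if 'pu' in c.lower()]].to_numpy().ravel()
    pu = pu[pu > 0]   # drop unused phase slots
    return bool(np.any((pu < _min) | (pu > _max)))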
Results:
Method 2 has a constant average execution time, while method 1 depends on the presence of violations early in the for loop (which explains why it is able to beat method 2 with very aggressive limits). Based on these results, I'd say that the second method looks better, even if I prefer what method 1 is doing (no intermediate file and no weird substitutions).
I was wondering if you had other ideas to do this or suggestions to increase the speed.
Thanks!