dss-extensions / dss_matlab

MATLAB interface to our alternative implementation of OpenDSS, based on the DSS C-API library aiming for full COM compatibility on Windows, Linux and MacOS.
BSD 3-Clause "New" or "Revised" License

Runs slower than actxserver? What am I doing wrong? A: profiler overhead #11

Closed Sigrsteinn closed 1 year ago

Sigrsteinn commented 4 years ago

Hello, I am sorry if this is a stupid question, but: I need to run OpenDSS with Matlab. I profiled my code in Matlab, and most of the time is spent sending commands through the OpenDSS COM interface. I read up on early binding and late binding in the OpenDSS documentation. I tried the early binding method with DCSL, as shown in the documentation, to speed up my code. It didn't work because I don't know how to handle the "variant" data type in Matlab.

That's when I found this Matlab package on GitHub. I see that the code in this package is somewhat similar to the DCSL method, particularly the "calllib(libname,command,arg)". I assumed that this package is similar in concept, and therefore faster than the ActiveX COM interface method. I read the usage guide. Since my code uses "DSSStartup.m", I replaced the object instantiation as suggested. I didn't change anything else in my code.

When I run my code, it is considerably slower than when I used the original actxserver method. What am I doing wrong here?

PMeira commented 4 years ago

Hi, @Sigrsteinn, Usually it's on par for function calls and the actual solution is a bit faster. But I guess it depends a lot on your code, since some users reported it to be faster. Someday we should have a benchmark suite here. For example, DSS_Python is typically 5x, up to 100x, faster than win32com. In .NET, DSS_Sharp is also faster since we don't need dynamic variables (which variant usually does).

For DSS_MATLAB, we do have a lot more error checking. If you're making many API calls and they're all checked on the MATLAB side, I imagine it might have a negative impact, especially since MATLAB is not exactly fast for non-numeric code. If that is indeed the case, I could add an alternative that does no checks. (Since MATLAB users include lots of OpenDSS beginners, the checking is an important feature.)

More recent versions of MATLAB have faster COM implementations too. Years ago, CallLib calls were 10x faster, but nowadays MATLAB seems to have more optimized COM calls; it's probably already using early binding behind the scenes. Most information regarding this on the forums and docs is probably out-of-date -- for example, the early-binding doc is from 2015, while some of the MATLAB changes landed maybe in 2018 (not sure myself).

It didn't work because I don't know how to handle the "variant" data type in Matlab.

Yep, using variant structures in a non-COM DLL was a bad idea IMHO. That's one of the reasons I started DSS C-API in the first place (more on https://sourceforge.net/p/electricdss/discussion/861976/thread/525c13df/).

I assumed that this package is similar in concept, and therefore faster than the ActiveX COM interface method.

At first glance, yes, it would be similar in concept to the official OpenDSSDirect library. Yet, OpenDSSDirect.py and OpenDSSDirect.jl were both migrated from the official library to DSS C-API due to several issues (many related to variant issues, general bugs, and platform-specific bugs), some of which would still be present today. DSS C-API provides a full header with plain C interface, no variants: https://github.com/dss-extensions/dss_capi/blob/master/include/dss_capi.h A lot of the functions my team uses (both API and internal OpenDSS code) have been optimized and we'll keep doing that for a couple more years, at least.

The faster alternative to CallLib is using MEX code, which is cumbersome, especially since this is a tiny open-source project. Besides writing a lot of code, we'd need to build the binaries for all platforms (Windows, Linux, macOS) and multiple MATLAB versions. Maybe it's a good idea to test it for benchmarking, to get an idea of how much faster it could get.

Since my code uses "DSSStartup.m", I replaced the object instantiation as suggested. I didn't change anything else in my code.

I'm pretty happy at least that worked without issues 😃

When I run my code, it is considerably slower than when I used the original actxserver method. What am I doing wrong here?

In the end, it's hard to say without checking the code. If you can share a small code sample (either here or privately through email), I could try to investigate it. Depending on what we find, I could update the code in DSS_MATLAB, add additional helper functions in DSS C-API, or just propose you change some aspects of your code.

Sigrsteinn commented 4 years ago

Thank you for your reply. As a noob, I've never heard of MEX. If even you as the developer of this package say that MEX is cumbersome, I'll just avoid it for now. I don't really follow Matlab's development, but I think you're right about its COM implementations getting faster. I just checked that the document about early and late binding is indeed from 2015.

This is the part of my code that uses the package.

function [Pstat,Vstat]=runDSS(obj) %this function is called 100 times
    Volt=inf(obj.allc2(2),obj.countBus*3); %Preallocation
    Loss=0;
    for i=1:obj.allc1(1) %for i=1:22
        obj.DSSText.Command=char(obj.strCmd1(i)); %2200 times
    end
    for hr=1:obj.allc2(2) %for hr=1:24
        for k=1:obj.allc2(1) %for k=1:214
            obj.DSSText.Command=char(obj.strCmd2(k,hr)); %513,600 times
        end
        VmagPU=obj.DSSCircuit.AllBusVmagPu;
        Volt(hr,1:length(VmagPU))=VmagPU;
        Loss=Loss+(obj.DSSCircuit.Losses(1)/1000);
    end
    VV=reshape(Volt,1,[]);
    VV(VV==inf)=[];
    Vmin=min(VV(VV~=0));
    Vmax=max(VV);
    Vdev=sum(sum((VV-1).^2));
    Vstat=[Vmin Vmax Vdev];
    Pstat=[Loss obj.totalPV obj.totalLD];
end

obj.strCmd1 and obj.strCmd2 are double-quoted string arrays containing the commands (in the DSS scripting language). I used double-quoted strings because I need to do some vectorized string manipulation, which causes errors with single-quoted character vectors. The following two images show the result from the profiler. The first image shows the summary, the second image shows the details about IText.set.Commands. image image

As a comparison, the following images show the profiler using the actxserver method image image

I am using Matlab R2020a for the test. I tested it with R2020b as well with similar result.

PMeira commented 4 years ago

@Sigrsteinn Interesting, thanks.

It looks like the error checking is not the main culprit. Yet, since it is still a few percent and it's a trivial function call, it's worth making it better. Since my other comment, I noticed there is a faster alternative that I forgot to implement here, I'll try that later today or tomorrow.

On the main issue, you generate a long DSS script and pass it all line-by-line, right? I think that's a bit unconventional. In those DSS commands, do you create new DSS elements or just update them?

Independent of COM or DSS_MATLAB, have you tried writing those lines to a file and then running a ...Text.Command = 'compile file.dss' instead? I imagine it should be faster (remember to whitelist the folder where the file is created in your antivirus).

I'll try to run some tests that mimic your code -- load a large circuit in memory, feed it line by line to OpenDSS, loop that. It's still possible that your DSS script itself is hitting some slow code in DSS C-API, i.e., it might not be specific to the API calls or MATLAB. Testing should help make this clearer.

On the MEX vs. CallLib topic, this comment has some interesting data: https://github.com/CoolProp/CoolProp/issues/1095#issuecomment-224543225 (it's from 2016 though, take it with a grain of salt).

As a sidenote, someday we'll probably add a DSS.Text.Commands (plural) to consume a huge string. We probably have a ticket/comment about it in one of the repositories under dss-extensions.

Sigrsteinn commented 4 years ago

The code evaluates the effects of PV installations of different sizes at different buses. The common information about the circuit is already in a single script file, while the commands that I send line-by-line are just the part that changes every iteration. I also tried your suggestion of writing the lines into a .dss file and running that file.

I changed my code so there are two separate functions for the multiline DSS scripts. The first one here is my for-loop method

        function script1(obj,allc,fileName,trgt)
            for i=1:allc
                obj.DSSText.Command=char(trgt(i));
            end
        end

The second one shown below is where the script is written into a file first before being executed

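        function script2(obj,allc,fileName,trgt)
            % Write all commands to a .dss file, then run it with a single call.
            pathScr=fullfile(pwd,fileName); % fullfile inserts the path separator
            fID=fopen(pathScr,'w');
            if fID<0, error('Could not open %s for writing.',pathScr); end
            fprintf(fID,'%s\n',trgt); % one command per line
            fclose(fID);
            obj.DSSText.Command=sprintf('compile (%s)',pathScr);
        end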

To test the new method, I ran the profiler on 4 versions of the code:

  1. "testDSS1scr1": actxserver - for loop
  2. "testDSS1scr2": actxserver - write to file
  3. "testDSS2scr1": this package - for loop
  4. "testDSS2scr2": this package - write to file

At first, the results seemed to show that writing the commands into a file made it slower, as shown below: image

After Google Drive sync is paused and the folder is whitelisted, writing the commands into a .dss script is indeed the faster method for this package: image

For the actxserver method though, sometimes the for loop method is faster. I don't know why that happens. image

PMeira commented 4 years ago

For the actxserver method though, sometimes the for loop method is faster. I don't know why that happens.

@Sigrsteinn It could be the profiler overhead or some background process.

I wrote a simple benchmark that loads (no solve or calcvoltagebases) the IEEE 8500 node circuit, here's what I got for 10 runs (it uses tic and toc to measure the total time).

times_com =

    1.3613    4.0668

times_capi =

    1.1041    5.1506    2.0569    1.0068

ratio_best =

    0.7396

My conclusion is that MATLAB has a lot of overhead for function calls, be they COM calls or plain C calls via CallLib. I'm now considering investing some time later to create a full implementation of DSS_MATLAB using MEX to reduce the overhead. It probably won't happen for a while though (at least some weeks).

My computer at the university still has MATLAB 2018a but I requested the latest installation available and will retry this simple benchmark when I'm able. I'm not sure the prebuilt MEX files would work on other versions of MATLAB, but I uploaded my test MEX files here if you want to give them a try: dss_capi_mex_test.zip. To test them, drop them inside the +DSS_MATLAB folder (which contains the dss_capi_v7.dll). Instead of DSSText.Command = cmd;, use DSS_MATLAB.dss_text_set_command(cmd). For the C loop version, use DSS_MATLAB.dss_text_set_commands(cmds). With a bit of luck, it could work, especially since very few MATLAB functions are used in the code.

PMeira commented 4 years ago

I decided to check most options to get a clearer overview of the speed of each.

On the table (running on MATLAB 2018a):

| Test | COM | DirectDLL calllib | DSS_MATLAB | DSS_MATLAB vs COM | DSS C-API calllib | DSS C-API calllib vs COM | DSS C-API MEX | DSS C-API MEX vs COM |
|---|---|---|---|---|---|---|---|---|
| Run script in a single .DSS file | 1.3636 | 1.2941 | 1.1537 | 85% | 1.0884 | 80% | - | - |
| Run script line-by-line (from MATLAB) | 4.0192 | 2.0579 | 5.1038 | 127% | 1.8869 | 47% | - | - |
| Read dssCircuit.AllBusVolts | 0.8856 | - | 0.1365 | 15% | - | - | - | - |
| Read dssLoads.Name (all loads, one by one) | 0.2259 | 0.1255 | 0.1483 | 66% | 0.1136 | 50% | - | - |
| Write dssLoads.Name | 0.3632 | 0.3026 | 0.6242 | 172% | 0.1063 | 29% | - | - |
| Iterate through loads | 0.0761 | 0.0379 | 0.0508 | 67% | 0.0373 | 49% | - | - |
| Read dssLoads.kW (all loads, one by one) | 0.1732 | 0.0824 | 0.1714 | 99% | 0.0791 | 46% | - | - |
| Set all dssLoads.kW | 0.2119 | 0.0934 | 0.5830 | 275% | 0.0864 | 41% | - | - |
| Set single dssLoads.kW | 0.0078 | 0.0045 | 0.0414 | 531% | 0.0041 | 53% | - | - |
| Set single dssLoads.kW (using MEX) | - | - | - | - | - | - | 0.0028 | 36% |
| Run script line-by-line (using MEX in a MATLAB loop) | - | - | - | - | - | - | 2.0627 | 51% |
| Run script line-by-line (using a C loop in MEX) | - | - | - | - | - | - | 0.9910 | 25% |

Some observations:

For the time being, my recommendation would be to use DSS_MATLAB, profile (like you already did, @Sigrsteinn), and try to use some calllib on the hot loops. E.g., provided you already instantiated DSS_MATLAB, you can replace

DSS.Text.Command = char(cmd);

with

calllib('dss_capi_v7', 'Text_Set_Command', char(cmd));

This will not have the extra error checking but should be the fastest option short of MEX. For simple calls (single char array, integer, or double arguments) this is feasible. For functions that return arrays/pointers, it becomes more cumbersome. Most function names are straightforward and you can check them in the matching header file for the DSS C-API version, e.g. https://github.com/dss-extensions/dss_capi/blob/0.10.6/include/v7/dss_capi.h -- I'll make sure to include this in the future releases.
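A rough sketch of what that replacement could look like inside a hot loop follows. The library alias and `Text_Set_Command` come from the snippet above; the two commands are placeholders, and the `Error_Get_Number` call is an assumption that should be checked against the header for your DSS C-API version. It is there only to restore a minimal version of the error checking that the direct call skips.

```matlab
% Placeholder commands; substitute the DSS script lines your code generates.
cmds = ["set loadmult=1.0", "solve"];

% The alias should be loaded once DSS_MATLAB has been instantiated.
if libisloaded('dss_capi_v7')
    for k = 1:numel(cmds)
        calllib('dss_capi_v7', 'Text_Set_Command', char(cmds(k)));
        % Assumed function name: verify it in dss_capi.h before relying on it.
        errNum = calllib('dss_capi_v7', 'Error_Get_Number');
        if errNum ~= 0
            error('DSS command %d failed (error %d): %s', ...
                  k, errNum, char(cmds(k)));
        end
    end
else
    warning('dss_capi_v7 is not loaded; instantiate DSS_MATLAB first.');
end
```

Checking the error number costs a second calllib per command, so for the tightest loops it may be worth checking less often or only while debugging.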

As I mentioned in my previous message, I'll redo this on the latest MATLAB whenever I get it, and in the future investigate MEX further -- the MEX C++ alternative in MATLAB 2020a looks much better than the C alternative. My plan would be to keep the current calllib implementation as it doesn't require the users to compile anything, but complement it with MEX alternatives if the users need faster code and have a compiler installed.

Sigrsteinn commented 4 years ago

I'm not sure the prebuilt MEX files would work on other versions of MATLAB, but I uploaded my test MEX files here if you want to give them a try: dss_capi_mex_test.zip.

I just tried the MEX files. The DSS_MATLAB.dss_text_set_command(cmd) method works, but as you said, it is rather slow. In my case, it's somehow slower than Matlab's actxserver. The DSS_MATLAB.dss_text_set_commands(cmds) method does not work for me. It causes an access violation that forces Matlab to close.

For the time being, my recommendation would be to use DSS_MATLAB, profile (like you already did, @Sigrsteinn), and try to use some calllib on the hot loops.

I also did this. It is faster than the line-by-line MEX some of the time, but the calllib method is still slower than actxserver on my machine.

  1. "testDSS1scr1": actxserver - for loop
  2. "testDSS1scr2": actxserver - write to file
  3. "testDSS2scr3": single-line MEX - Matlab for loop
  4. "testDSS2scr4": single-line MEX - write to file
  5. "testDSS2scr5": dss_capi calllib - for loop
  6. "testDSS2scr6": dss_capi calllib - write to file

image image image

I don't know what's wrong. Maybe there's something wrong with my PC, or maybe I need to run more iterations to get more accurate results for each method. But for now, I need to focus on something else. I will probably try this again some time in the future.

Thank you for your help.

PMeira commented 4 years ago

@Sigrsteinn If your timings always come from the profiler, try running with the profiler disabled. With profile on:

Thus, when using the profiler, the timings are skewed a lot towards COM. A simple tic + toc should not affect the timings though -- use that instead and DSS_MATLAB should be faster.
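As a minimal sketch of timing with tic and toc instead of the profiler: the workload below is a pure-MATLAB stand-in so that the snippet runs without OpenDSS, and the variable names are illustrative. In practice, `work` would wrap the real call, for example a function handle around the loop that sends the DSS commands.

```matlab
% Stand-in workload; replace with a handle to the code you want to time.
work = @() sum(sqrt(1:1e5));

nRuns = 10;                      % repeat to smooth out background noise
elapsed = zeros(1, nRuns);
for r = 1:nRuns
    t0 = tic;                    % start a timer for this run only
    work();
    elapsed(r) = toc(t0);        % elapsed wall-clock time in seconds
end

fprintf('best %.6f s, median %.6f s over %d runs\n', ...
        min(elapsed), median(elapsed), nRuns);
```

Comparing the best or median of several runs, with the profiler off, is usually less sensitive to one-off disturbances such as antivirus scans or file sync than a single measurement.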


Update (April 2023)

So, in the end, you'll probably end up with faster scripts in certain conditions, or slower ones if you hit the slower properties. It all depends on how you wrote the script in the first place. Looking at MATLAB code that uses OpenDSS, there are some very bad examples that, instead of using the dedicated API functions, go through the Text interface and then post-process the Text.Result string into numeric values. For example, there is no need to use the Text interface to get the buses of all lines since you can use DSSObj.ActiveCircuit.Lines.Bus1 (and Bus2), or more generally DSSObj.ActiveCircuit.ActiveCktElement.BusNames.

Regardless of the interface being used, the general advice for any language is still valid: try to avoid strings when possible and you'll get results faster. For example, consider that strings are generally arrays of characters of varying sizes, so if you activate lines by name in a loop, you're copying all those strings around, including potential reencoding, etc., while if you activate lines by index, a single integer is being copied.
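To illustrate the last two points, here is a sketch that collects the buses of every line through the Lines interface, without the Text interface and without activating elements by name. It assumes DSS_MATLAB is on the MATLAB path and uses a hypothetical `master.dss` file; `First` and `Next` are the usual iterators of the classic OpenDSS API, which return 0 when there is nothing left.

```matlab
% Assumption: DSS_MATLAB is installed; 'master.dss' is a hypothetical file.
DSS = DSS_MATLAB.IDSS;
DSS.Text.Command = 'compile master.dss';

Lines = DSS.ActiveCircuit.Lines;
nLines = Lines.Count;
bus1 = cell(nLines, 1);
bus2 = cell(nLines, 1);

k = 0;
idx = Lines.First;        % activates the first line; 0 if there are none
while idx > 0
    k = k + 1;
    bus1{k} = Lines.Bus1; % read from the currently active line
    bus2{k} = Lines.Bus2;
    idx = Lines.Next;     % activates the next line; 0 after the last one
end

% Drop unused slots in case some lines were skipped by the iterators.
bus1 = bus1(1:k);
bus2 = bus2(1:k);
```

Only integers cross the interface while iterating; strings are copied just for the values actually requested.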

And again, regardless of the interface, it doesn't seem like the users quite understand how the classic OpenDSS API organization works. I guess that's something we can try to help though, by adding an overview document.

Per #13, we still want to add a MEX option, but it certainly is not because this package "runs slower than actxserver".