SanPen / GridCal

GridCal is a cross-platform power systems software package written in Python, with a user interface, used in academia and industry.
https://www.advancedgridinsights.com/gridcal
GNU Lesser General Public License v3.0

Support option to intentionally disable numba #141

Closed mzy2240 closed 2 years ago

mzy2240 commented 2 years ago

Hi @SanPen, I went through the source code but did not find a way to quickly disable numba in an environment where numba is installed. I am asking because 1) I want to compare the performance gain with and without numba, and 2) I am also trying to implement a new ds_dv function using Taichi to compare with the numba version. Please advise at your convenience. Thanks!
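For reference, numba itself ships a global switch that may cover use case 1); this is standard numba configuration, not a GridCal-specific option:

import os

# Standard numba behavior: when this variable is set before numba (or any
# numba-decorated module) is imported, @jit/@njit functions run as plain Python.
os.environ['NUMBA_DISABLE_JIT'] = '1'

import numba
print(numba.config.DISABLE_JIT)  # 1 when the switch is active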

SanPen commented 2 years ago

Hi,

Whenever I want to perform such a test, I just copy the function twice: once with the numba decorator and once without, with the name slightly changed, and I call both from the parent function.
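A minimal sketch of that pattern (a generic example, not GridCal code): keep a plain copy and a compiled twin, and time both from the caller.

from numba import njit
import numpy as np
import time

def polyval_py(c, x):
    # plain Python/NumPy version (Horner's rule)
    y = np.zeros_like(x)
    for k in range(len(c)):
        y = y * x + c[k]
    return y

polyval_nb = njit(polyval_py)  # compiled twin under a slightly different name

c = np.random.rand(8)
x = np.random.rand(100_000)
polyval_nb(c, x)  # the first call compiles; exclude it from the timing
t0 = time.perf_counter(); polyval_py(c, x); t1 = time.perf_counter()
t2 = time.perf_counter(); polyval_nb(c, x); t3 = time.perf_counter()
print(f"python: {t1 - t0:.4f} s, numba: {t3 - t2:.4f} s")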

Here you'll find the derivatives used in the power flow routines. I'd say those are quite good already, matching or improving on commercial software speeds. The thing I have not done is use them in the state estimation. For that you'd need to compose the jacobian directly with numba using the CSC or CSR schemes. Here you'll find a pretty radical jacobian assembly (9 x 9 submatrices).

In my experience the overhead of creating parallel threads is not worth it, in the power flow at least. With numba you can achieve parallelism by using prange instead of range and setting parallel=True in the decorator. Also, in my experience the GPU is actually worse than the CPU unless you have an RTX 3090 or the like. The GPU really is worth it for dense matrix manipulation, such as graphics or some ML problems.
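As a generic illustration of the prange pattern (again, a sketch rather than GridCal code):

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def row_norms(A):
    # parallel=True plus prange parallelizes the outer loop across threads
    n = A.shape[0]
    out = np.empty(n)
    for i in prange(n):
        s = 0.0
        for j in range(A.shape[1]):
            s += A[i, j] * A[i, j]
        out[i] = np.sqrt(s)
    return out

print(row_norms(np.random.rand(1000, 50))[:3])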

mzy2240 commented 2 years ago

Yeah, I totally agree. I remember a paper published last year by one of the pandapower developers about a CUDA-based power flow solver. It said the communication overhead (copying back and forth between CPU and GPU) is too large to ignore.

Currently I am doing exactly what you point out: manually copying the function and comparing. But slightly differently: I am actually comparing the dSbus_dV function with dSbus_dV_numba_sparse_csr, and I just manually change the output of dSbus_dV to dSbus_dVa.data, dSbus_dVm.data. However, surprisingly, I am still not able to compare them, because the number of iterations does not match, though the final results are really close. Do you have any idea what's going on? I am using the Bus5_GSO case; in case you are interested, here is the file: Bus5_GSO.zip

mzy2240 commented 2 years ago

When using the dSbus_dV_numba_sparse_csr function, it is executed 5 times, while dSbus_dV is executed 34 times.

SanPen commented 2 years ago

Hi, Those two should be the same. You may convert both to dense and compare them.
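For a 5-bus case, a quick check along those lines might look like this (a sketch; A and B stand for the two derivative matrices being compared):

import numpy as np

def same_sparse(A, B, tol=1e-9):
    # densify both scipy sparse matrices and compare; fine for small systems
    return np.allclose(A.toarray(), B.toarray(), atol=tol)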

In general, in GridCal I try to do everything using the CSC scheme; scipy does not care, though.

If you have a piece of code I'll have a look.

When you see such variation in the iteration count, it is because of the jacobian.

Br, Santiago

mzy2240 commented 2 years ago

Yeah here is the code:

from GridCal.Engine.IO.power_world_parser import PowerWorldParser
import numba as nb  # numba is installed in this environment (not used directly here)
from GridCal.Engine import PowerFlowOptions, PowerFlowDriver, SolverType

# parse the PowerWorld case
parser = PowerWorldParser(r"C:\Users\test\PWCase\HW2_PWCases\HW2_Cases\Bus5_GSO\Bus5_GSO.EPC")
grid, logger = parser.parse_case()

# run a Newton-Raphson power flow
options = PowerFlowOptions(SolverType.NR, verbose=False)
power_flow = PowerFlowDriver(grid, options)
power_flow.run()

print('\n\n', grid.name)
print('\t|V|:', abs(power_flow.results.voltage))
print('\t|Sbranch|:', abs(power_flow.results.Sf))
print('\t|loading|:', abs(power_flow.results.loading) * 100)
print('\terr:', power_flow.results.error)
print('\tConv:', power_flow.results.converged)

The default actually uses dSbus_dV_numba_sparse_csr. To compare, simply replace line 263 with dS_dVa, dS_dVm = deriv.dSbus_dV(Ybus, V). I added a print statement in both functions and got the iteration count difference.

SanPen commented 2 years ago

I see,

There you're mixing two functions in the wrong place. dSbus_dV is not a drop-in replacement for dSbus_dV_numba_sparse_csr; observe that the return types are different. For me, that change makes the program crash.

deriv.dSbus_dV(Ybus, V) is meant to be used eventually in other processes. If you want the complete jacobian formulated with those derivatives, it is here, line 144.

You'll find that the Newton-Raphson routine has both the fast and the regular jacobians (line 321, commented out).

The numba jacobian is an order of magnitude faster than the one assembled using matrix products and later slicing.
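For context, the slower variant is the textbook assembly. A sketch of that pattern (matpower-style; it assumes dS_dVa and dS_dVm as returned by deriv.dSbus_dV, plus pvpq and pq index arrays):

import numpy as np
import scipy.sparse as sp

def jacobian_by_slicing(dS_dVa, dS_dVm, pvpq, pq):
    # build the four blocks by row/column selection, then stack them;
    # the slicing of sparse matrices is what makes this comparatively slow
    J11 = dS_dVa[pvpq, :][:, pvpq].real
    J12 = dS_dVm[pvpq, :][:, pq].real
    J21 = dS_dVa[pq, :][:, pvpq].imag
    J22 = dS_dVm[pq, :][:, pq].imag
    return sp.vstack([sp.hstack([J11, J12]),
                      sp.hstack([J21, J22])], format='csr')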

mzy2240 commented 2 years ago

Thank you for the information. I managed to make dSbus_dV work somehow, and the results are really close. Anyway, since I am working on formulating the Jacobian for SE, do you think it is worth using numba to accelerate the computation of dSbr_dV and dIbr_dV, and then changing the way the Jacobian is assembled?

SanPen commented 2 years ago

Hi,

It is definitely worth it to have a numba formulation of the branch derivatives, and then make a fast assembly of the SE jacobian.

In the derivatives.py file, there is dSf_dV_fast(Yf, V, Vc, E, F, Cf). This was an attempt at producing a fast derivative of the branch power.

If you study the numba-accelerated version of the power injections, it is the same derivatives as in matpower, but split in a smart way, making the most of possible vectorizations and minimizing memory reallocation. It is also important to exploit the existing structures: dSbus_dV has the same structure as Ybus, and dSf_dV has the same sparsity structure as Yf (I think; this should be tested), which is known.
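A sketch of what the shared pattern buys you (assuming Ybus in CSR form and new_data computed in the same order as Ybus.data):

from scipy.sparse import csr_matrix

def matrix_with_same_pattern(Y, new_data):
    # reuse Ybus's indptr/indices: only the numeric values change per iteration,
    # so no index recomputation or memory reallocation is needed
    return csr_matrix((new_data, Y.indices, Y.indptr), shape=Y.shape)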

Let me know if you need a hand with this.