cuihantao / cvxoptklu

Standalone KLU add-on for CVXOPT

A must be a square sparse matrix #1

Closed SanPen closed 5 years ago

SanPen commented 5 years ago

Hi,

I am trying to use KLU from this library in the program I develop (GridCal), and when calling klu.linsolve(A, b) it tells me:

A must be a sparse square matrix

The thing is that A is a square CSR sparse matrix from scipy. Maybe it has to be in another format.

Could you please clarify?

cuihantao commented 5 years ago

Hi SanPen,

Thanks for the interest! I have been tracking your work for a while. It is great to hear from you!

The issue is that CVXOPT does not automatically construct a cvxopt.spmatrix from a scipy.sparse.csc_matrix (or csr_matrix).

For now, you would have to do the conversion manually, like:

A = A.tocoo()
A_cvxopt = cvxopt.spmatrix(A.data, A.row, A.col, A.shape, 'd')

klu.linsolve(A_cvxopt, b)

This conversion is costly if it needs to be called at every step. It would be easier to store the A matrix in coo_matrix (or to build it incrementally with dok_matrix) and avoid the cost of building the csc_matrix in scipy.
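
For example, a rough sketch of building A in COO form and passing the triplets straight to CVXOPT (the small matrix and right-hand side are made up for illustration; klu.linsolve is assumed to take the same arguments as above and to overwrite b, as in cvxopt.umfpack):

import numpy as np
import cvxopt
from cvxoptklu import klu          # assuming the package exposes the klu module this way
from scipy.sparse import coo_matrix

# build A directly from (row, col, value) triplets, skipping CSR/CSC entirely
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 1, 0, 2])
vals = np.array([4.0, 3.0, 1.0, 5.0])
A = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# the COO arrays map one-to-one onto cvxopt.spmatrix triplets
A_cvxopt = cvxopt.spmatrix(A.data, A.row, A.col, A.shape, 'd')

b = cvxopt.matrix(np.array([1.0, 2.0, 3.0]))
klu.linsolve(A_cvxopt, b)          # b is overwritten with the solution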

I guess at this point it would be a good time to send a feature request to CVXOPT, asking for a Python API to construct an spmatrix directly from the internal storage (colptr, row, data).
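
In the meantime, the triplets can also be built straight from the scipy CSC arrays without the COO round trip; a rough sketch:

import numpy as np
import cvxopt

def csc_to_spmatrix(A_csc):
    """Sketch: build a cvxopt.spmatrix directly from scipy CSC storage
    by expanding indptr into explicit column indices."""
    n_cols = A_csc.shape[1]
    # number of nonzeros per column -> repeat each column index that many times
    col_idx = np.repeat(np.arange(n_cols), np.diff(A_csc.indptr))
    return cvxopt.spmatrix(A_csc.data, A_csc.indices, col_idx, A_csc.shape, 'd')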

SanPen commented 5 years ago

Hi,

Thanks a lot for the reply.

I made this function:

import numpy as np
import cvxopt
from cvxoptklu import klu   # assuming the package exposes the klu module this way


def klu_linsolve(A, b):
    """
    KLU wrapper function for the linear system solve A x = b
    :param A: System matrix (scipy sparse)
    :param b: right-hand side
    :return: solution
    """
    A2 = A.tocoo()
    A_cvxopt = cvxopt.spmatrix(A2.data, A2.row, A2.col, A2.shape, 'd')
    x = cvxopt.matrix(b)
    klu.linsolve(A_cvxopt, x)   # the solution overwrites x in place
    return np.array(x)[:, 0]
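
For reference, it is called like this (tiny made-up system):

import numpy as np
from scipy.sparse import csc_matrix

A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 0.0],
                         [0.0, 0.0, 2.0]]))
b = np.array([1.0, 2.0, 3.0])

x = klu_linsolve(A, b)
print(np.allclose(A @ x, b))   # True if the solve succeeded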

Then I did a bit of benchmarking: I ran a one-year time series with IEEE39 and got the following:

+---------+------------------+------------------+------------------+------------------+------------------+
|         | KLU              | BLAS/LAPACK      | ILU              | SuperLU          | Pardiso          |
+=========+==================+==================+==================+==================+==================+
| Test 1  | 82.0306398868561 | 82.1049809455872 | 81.7956540584564 | 82.8895554542542 | 93.2362771034241 |
| Test 2  | 80.2231616973877 | 80.8419146537781 | 81.7140426635742 | 81.3713464736938 | 95.2913007736206 |
| Test 3  | 79.5343339443207 | 82.3211221694946 | 82.7529213428497 | 80.9804055690765 | 92.6268711090088 |
| Test 4  | 80.0667154788971 | 82.6606991291046 | 82.1418635845184 | 80.178496837616  | 97.6009163856506 |
| Test 5  | 80.0720291137695 | 80.5129723548889 | 81.9473338127136 | 80.0363531112671 | 93.3938195705414 |
+---------+------------------+------------------+------------------+------------------+------------------+
| Average | 80.3853760242462 | 81.6883378505707 | 82.0703630924225 | 81.0912314891815 | 94.4298369884491 |
+---------+------------------+------------------+------------------+------------------+------------------+

So, despite the repackaging of the matrix and everything else, KLU did pretty well.

cuihantao commented 5 years ago

Awesome! I will include your code and update the README.

It is great to have a benchmark for sparse libraries. Some reference data are available in Table 1 of this paper: http://faraday1.ucd.ie/archive/papers/vancouver.pdf

I'm not sure how you set up the benchmark, though. Your data show pretty close average times across all the libraries - this might be due to an overwhelming overhead. You might want to look into it further.
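
For example, one way to separate the solver time from the rest of the loop (rough sketch, reusing your klu_linsolve and a made-up system):

import time
import numpy as np
from scipy.sparse import csc_matrix

A = csc_matrix(np.array([[4.0, 1.0], [1.0, 3.0]]))
b = np.array([1.0, 2.0])

solver_time = 0.0
for _ in range(10000):             # stand-in for the time-series loop
    t0 = time.perf_counter()
    x = klu_linsolve(A, b)         # time only the solve, not the bookkeeping
    solver_time += time.perf_counter() - t0

print('time spent inside the solver:', solver_time)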

cuihantao commented 5 years ago

2a2f015 adds an FAQ to the README.

SanPen commented 5 years ago

Hi,

Indeed, this benchmark is not about the raw libraries, but about the libraries used within the whole process.

The complexity of handling multiple islands, the outer-loop controls and the results gathering is all there, and I don't think I can optimise it much further in Python, since everything that can use numpy vectorization already does.

Out of curiosity I made the following benchmark (look at the bottom of the file). I used both a regular Python 3.7 distribution and Intel's Python distribution, both under Ubuntu.

The results are these:

+----------------------+-------------------+-------------------+
|                      | Total             | Average           |
+======================+===================+===================+
| Blas/Lapack (MKL)    | 4.36154651641846  | 0.087230930328369 |
| SuperLU (MKL)        | 4.10816597938538  | 0.082163319587708 |
| ILU (MKL)            | 2.79792213439941  | 0.055958442687988 |
| Pardiso              | 1.6205837726593   | 0.032411675453186 |
| Blas/Lapack (no MKL) | 10.0897464752197  | 0.201794929504394 |
| KLU                  | 4.38721060752869  | 0.087744212150574 |
| SuperLU (no MKL)     | 9.94155168533325  | 0.198831033706665 |
| ILU (no MKL)         | 2.94857239723206  | 0.058971447944641 |
+----------------------+-------------------+-------------------+

Now the question is, do these solvers perform better because of the conditioning of the Jacobian matrix? The Pardiso results make no sense so far.

cuihantao commented 5 years ago

Hi SanPen,

I have no clue why the solvers differ in computation time. I would like to hear about it when you know more.

I'm gonna close the Issue for the moment.

SanPen commented 5 years ago

Just to comment on the issue: I profiled the time-series functionality, and the bottleneck appears to be in the scipy library itself. There is an awful number of calls to safety checks, which amount to about 20% of the run time. I have already raised an issue in the scipy project.
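
For reference, the kind of profiling I mean looks roughly like this (run_time_series is just a placeholder name for the driver function):

import cProfile
import pstats

# profile the whole time-series run and dump the stats to a file
cProfile.run('run_time_series()', 'ts_profile')

# sort by cumulative time and look for the scipy.sparse check/validation calls
stats = pstats.Stats('ts_profile')
stats.sort_stats('cumulative').print_stats(20)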

cuihantao commented 5 years ago

Thanks for the follow-up, @SanPen! I also noticed scipy's checks taking an overly long time, so I avoided scipy.sparse for constructing sparse matrices. I've subscribed to your issue report in the scipy repository.