Open gkovacsds opened 1 year ago
You may check the following:
If the above two tests do not change the situation, please send me the mtx and rhs files and I will test them.
For the current version (20230325), oparm[27] (for row mode, i.e., Ax=b) and oparm[28] (for column mode, i.e., (A^T)x=b) reflect the solving algorithms. Their meaning is a little complex. They are two-digit numbers in decimalism. The least significant digit is the solving algorithm for L, and the other digit is for U. 0 means sequential and 1,2,3 mean the corresponding parallel algorithms. For example, if oparm[27]=20, it means the row mode solving uses sequential algorithm for Ly=b and uses the 2nd parallel algorithm for Ux=y.
Thank you for the answer, it took some time to check all I could. For some reason the benchmark program behaves differently than my test program, though I pass the number of threads to CKTSO_Analyze. It never uses parallel mode (oparm[27] is always 0). The add20 matrix file also solves not parallel. iparm is the same, oparm is only different in [27] (and in the benchmark times).
My Windows test program is not in Visual Studio (Delphi), and uses the DLL through the C interface. The Benchmark program uses the ICktSo interface - could this difference be the problem? Please advise how I could enable the use of parallel solving? Is it restricted on DLL only use? Do I have to use the C lib for it?
Also may I ask if CKTSO uses custom created threads or it is based on a ready-made solution like OpenMP?
Anyway I modified the benchmark program so it can read unsorted MTX files too. It may be useful for you too. It can also read more mtx files (with the same matrix structure), and solves them all repeatedly. Can also read RHS files, so I could also check solution validity compared with other solvers. CKTSO was solving well of course.
On this page you can find the modified benchmark VS project, if interested. I also collected the files documenting my test: CSR matrix dumps, Screenshot that 7 additional threads are spawn (for 8 threads run), but solving is then not parallel (oparm[27] is 0) Some logs with iparm, oparm and VS debug screens.
Thank you for any suggestions.
Thank you for providing the detailed information. I first provide some quick answers. After I try your cases, I will answer other questions. The C interface and C++-like interface behave exactly the same. However, linux and windows libraries may behave differently. Whether to use parallel solving depends on many factors and CKTSO has some automatic determination methods. I will provide more information later. CKTSO uses native API, i.e., pthread/Windows API, for thread management. No openmp is used. From your reply, it looks that the issues are focused on parallel solving? Do you have performance issues in parallel factorization or refactorization?
My main problem is, that checking oparm[27] after CKTSO_Solve, it is zero even with the example matrix add20.mtx I don't know why it chooses sequential solve when I pass the number of threads to CKTSO_Analyze. I use the 14th May DLL. With the benchmark cpp it is OK, but my test program is always 0 for oparm[27]
If you need my test program , I attached it. Maybe you can inspect the dll calls and how cktso.dll works with that. Just click on "Solve!". matrix solver test.zip
Dear Mr.Chen, I send 3 matrix files then, please look at the zip file attached here. I also attach the matrix structure drawings. We only have success with "20knodes...", it switches to parallel solving and it's faster. With the others the solver seems to choose sequential solving. The "NMOSx8..." example seems quite structured, still not possible to switch to parallel solving? The "TPS40140..." is a more realistic circuit matrix, quite dispersed though. Can we have manual control on which solver to choose, or will it always be automatic? I see speedup in matrix analysis when enabling multiple threads, but not in factorization or solving. mtx.zip
Thank you so much for providing your test cases. I have tried your cases. Here are some notes for the observations.
Dear Mr.Chen, we have tested your CKTSO matrix solver in our circuit simulation software and generally liked the performance. We tried it on AMD/Intel Windows platforms, with CSR formatted matrices (row-major). However we did not achieve multi-thread performance improvement over single-thread mode, only slowdown. We did transient simulations - many refactorization and solve calls. We think we are using the library as recommended in the user guide.
To check that we are doing everything properly, could you help in these:
Thank you and best regards, Gergely