Closed mzy2240 closed 2 years ago
What is your application? We focus on SPICE circuit simulations, where there is no such requirement for multiple b vectors. And currently we only consider dense b (sparse b leads to some additional symbolic analysis overhead, and the performance benefit is generally negligible).
Power-system computation, so the sparsity of A is pretty similar to that in circuit applications. KLU can accept a dense 2D matrix b. Basically it is the same solve, just repeated once for each column of b.
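To illustrate the "same solve, repeated for each column of b" pattern, here is a minimal sketch in plain Python. It uses a small dense LU with partial pivoting as a stand-in; it is not KLU's or cktso's API, and the names `lu_factor`, `lu_solve` and `solve_many` are made up for this example. The point is only that the factors are computed once and reused for every column.

```python
def lu_factor(A):
    """Dense LU with partial pivoting. Returns (LU, perm) with P*A = L*U."""
    n = len(A)
    LU = [row[:] for row in A]          # work on a copy
    perm = list(range(n))               # perm[k] = original row now in position k
    for k in range(n):
        # pick the largest pivot in column k
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            LU[k], LU[p] = LU[p], LU[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]        # multiplier, stored in the L part
            f = LU[i][k]
            for j in range(k + 1, n):
                LU[i][j] -= f * LU[k][j]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b for one right-hand side, reusing existing factors."""
    n = len(LU)
    x = [b[perm[i]] for i in range(n)]  # apply the row permutation to b
    for i in range(n):                  # forward substitution (unit lower L)
        for j in range(i):
            x[i] -= LU[i][j] * x[j]
    for i in reversed(range(n)):        # back substitution (upper U)
        for j in range(i + 1, n):
            x[i] -= LU[i][j] * x[j]
        x[i] /= LU[i][i]
    return x


def solve_many(A, B_cols):
    """Factor A once, then solve for each column of B (given as a list of columns)."""
    LU, perm = lu_factor(A)
    return [lu_solve(LU, perm, b) for b in B_cols]
```

For example, `solve_many([[4.0, 3.0], [6.0, 3.0]], [[10.0, 12.0], [7.0, 9.0]])` factors the 2x2 matrix once and returns one solution vector per column.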
I see. Let me add the support of multiple b vectors. This may take some days.
Really appreciate it!
added
Dumb question: will AVX512 make a difference to the performance of this library? The latest Intel CPUs have dropped the AVX512 instruction set.
Performance is usually bound by bandwidth, not computation. Plain C --> SSE2 improves performance by 10-20%. SSE2 --> AVX2 improves it by a few percent. AVX2 --> AVX512 has almost no effect.
By bandwidth, do you mean the data ports within the SIMD pipes (including shared data ports)?
Packing data for SIMD processing does cause some overhead. But in general, the overall performance is bound more by DRAM bandwidth.
When b is actually a 2D matrix, the time cost of factorization can be negligible (because you only have to do it once). In this case, does cktso natively support b as a sparse or dense 2D matrix? If so, does cktso also solve the slices of b in parallel?
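A rough roofline-style estimate in Python shows why wider SIMD helps so little once DRAM bandwidth is the limit. This is only a back-of-the-envelope sketch: it assumes each stored nonzero costs one multiply and one add and is read as an 8-byte value plus a 4-byte index, and it ignores cache reuse and vector traffic. The peak and bandwidth figures are made-up illustrative numbers, not measurements of any CPU or of this library.

```python
def sparse_solve_intensity(value_bytes=8, index_bytes=4, flops_per_nnz=2):
    """Flops per byte moved for a sparse triangular solve (assumed model)."""
    return flops_per_nnz / (value_bytes + index_bytes)


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: limited by either compute peak or memory traffic."""
    return min(peak_gflops, bandwidth_gbs * intensity)


intensity = sparse_solve_intensity()   # 2 flops per 12 bytes

# Hypothetical machine: 50 GB/s DRAM bandwidth, two different compute peaks
# standing in for a narrow and a wide SIMD unit (illustrative values only).
narrow_simd = attainable_gflops(100.0, 50.0, intensity)
wide_simd = attainable_gflops(800.0, 50.0, intensity)

print(f"intensity   = {intensity:.3f} flop/byte")
print(f"narrow SIMD = {narrow_simd:.2f} GFLOP/s attainable")
print(f"wide SIMD   = {wide_simd:.2f} GFLOP/s attainable")
```

Under these assumptions both configurations hit the same memory-bound ceiling, so raising the compute peak alone changes nothing.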
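The amortization argument can be written out as follows. The symbols are introduced here for illustration only: $T_{\text{factor}}$ is the one-time factorization cost, $T_{\text{solve}}$ the cost of one forward/backward substitution, and $k$ the number of columns of b.

```latex
T_{\text{total}}(k) = T_{\text{factor}} + k\,T_{\text{solve}},
\qquad
\frac{T_{\text{total}}(k)}{k} = \frac{T_{\text{factor}}}{k} + T_{\text{solve}}
\;\longrightarrow\; T_{\text{solve}} \quad (k \to \infty)
```

So for a b with many columns, the per-column cost is dominated by the substitution step, and that is where parallelism across columns would pay off.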