chenxm1986 / cktso

Pursuing the best performance of linear solver in circuit simulation
Native support when b is a 2D matrix #2

Closed mzy2240 closed 2 years ago

mzy2240 commented 2 years ago

When b is actually a 2D matrix, the cost of factorization becomes negligible, because A only has to be factorized once. In this case, does cktso natively support b as a sparse or dense 2D matrix? If yes, does cktso also solve the slices of b in parallel?

chenxm1986 commented 2 years ago

What is your application? We focus on SPICE circuit simulation, which has no such requirement for multiple b vectors. Currently we only consider dense b (a sparse b adds some symbolic analysis overhead, and the performance benefit is generally negligible).

mzy2240 commented 2 years ago

Power system computations, so the sparsity of A is quite similar to that in circuit applications. KLU can accept a dense 2D matrix b. It is basically the same solve, just repeated for each column of b.
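To illustrate the "factorize once, then repeat the solve for each column of b" pattern described above, here is a minimal sketch in plain Python. It uses a small dense LU factorization with partial pivoting as a stand-in; it is not cktso's or KLU's API, and the names `lu_factor`, `lu_solve`, and `solve_columns` are hypothetical helpers chosen for this example.

```python
def lu_factor(a):
    """Dense LU with partial pivoting (illustrative stand-in for a sparse solver).

    Returns (lu, perm): lu holds L below the diagonal (unit diagonal implied)
    and U on and above it; perm[i] is the original row now at position i.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy, leave the input intact
    perm = list(range(n))
    for k in range(n):
        # Choose the largest-magnitude entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]        # multiplier stored in the L part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b for one right-hand side, reusing existing factors."""
    n = len(lu)
    y = [b[perm[i]] for i in range(n)]  # apply the row permutation to b
    for i in range(n):                  # forward substitution with L
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):        # back substitution with U
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def solve_columns(a, b_columns):
    """Factorize once, then solve for every column of a 2D right-hand side."""
    lu, perm = lu_factor(a)             # the expensive step, done a single time
    return [lu_solve(lu, perm, col) for col in b_columns]


if __name__ == "__main__":
    A = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 0.0, 3.0]]
    B_cols = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]   # two right-hand sides
    for x in solve_columns(A, B_cols):
        print(x)
```

Each call to `lu_solve` is independent of the others once the factors exist, which is why the per-column solves can in principle be distributed across threads.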

chenxm1986 commented 2 years ago

I see. Let me add support for multiple b vectors. This may take a few days.

mzy2240 commented 2 years ago

Really appreciate it!

chenxm1986 commented 2 years ago

Added.

mzy2240 commented 2 years ago

Dumb question: will AVX512 make a difference to the performance of this library? The latest Intel CPUs have dropped the AVX512 instruction set.

chenxm1986 commented 2 years ago

The performance is usually bounded by bandwidth, not computation. Plain C-->SSE2 improves performance by 10-20%. SSE2-->AVX2 improves it by a few percent. AVX2-->AVX512 has almost no effect.

mzy2240 commented 2 years ago

By bandwidth, do you mean the data ports within the SIMD pipes (including shared data ports)?

chenxm1986 commented 2 years ago

Packing data for SIMD processing does cause some overhead. But in general, the overall performance is bounded more by DRAM bandwidth.
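The DRAM-bandwidth argument can be made concrete with a rough roofline-style estimate. The sketch below is a simplified model, not a measurement of cktso: it assumes each stored nonzero of the factors costs one multiply and one add and requires streaming an 8-byte value plus a 4-byte index from memory, and it ignores caches and right-hand-side traffic. The peak GFLOP/s and bandwidth figures are hypothetical placeholders, and the function names are made up for this example.

```python
def arithmetic_intensity(value_bytes=8, index_bytes=4):
    """Flops per byte streamed in a sparse triangular solve (simplified model).

    Assumes one multiply and one add per stored nonzero, and that each nonzero
    is read from memory once as a value plus a column/row index.
    """
    flops_per_nonzero = 2
    return flops_per_nonzero / (value_bytes + index_bytes)


def attainable_gflops(intensity, peak_gflops, bandwidth_gb_s):
    """Roofline bound: the lower of the compute peak and the memory-fed rate."""
    return min(peak_gflops, intensity * bandwidth_gb_s)


if __name__ == "__main__":
    BANDWIDTH_GB_S = 50.0  # hypothetical DRAM bandwidth
    # Hypothetical per-core compute peaks for each instruction set.
    peaks = {"SSE2": 20.0, "AVX2": 40.0, "AVX512": 80.0}
    ai = arithmetic_intensity()
    for name, peak in peaks.items():
        bound = attainable_gflops(ai, peak, BANDWIDTH_GB_S)
        print(f"{name}: peak {peak} GFLOP/s, attainable about {bound:.2f} GFLOP/s")
```

Under these assumed numbers the memory-fed rate sits below every compute peak, so doubling the SIMD width leaves the bound unchanged. The model is too coarse to reproduce the 10-20% gain reported for Plain C-->SSE2, which depends on effects it leaves out.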