DrTimothyAldenDavis / SuiteSparse

The official SuiteSparse library: a suite of sparse matrix algorithms authored or co-authored by Tim Davis, Texas A&M University.
https://people.engr.tamu.edu/davis/suitesparse.html
Other
1.18k stars 264 forks source link

Peak Memory allocated reporting function in KLU? #883

Open DimaLPorNLP opened 3 weeks ago

DimaLPorNLP commented 3 weeks ago

Feature I am interested to monitor the peak memory in bytes allocated in KLU. I see there is an entry in the Common structure Common.mempeak. It seems no easy to create an interface to the Common structure from other languages. But interfaces to functions are much easier.

Solution So, a function call that reports back Common.mempeak, the allocated memory in KLU, would be nice to have. Is there such a function already which I have missed? Is it easy to implement?

Comments

  1. Using the BTF ordering is there any rule of thumb on how many more nonzeros as a function of the matrix nonzeros KLU allocates internally in the worst possible case?
DrTimothyAldenDavis commented 2 weeks ago

I don't have such a function but it would be simple to add. I can add it to my TODO list.

For the BTF ordering: no, there is no good rule of thumb. All sparse matrix orderings (BTF, AMD, COLAMD, etc), and all sparse pivoting methods (KLU's partial pivoting, etc), are heuristics for solving the NP-hard fill-minimization problem. They can often do well, but sometimes a single bad pivot choice can cause catastrophic fillin. This is the case for all sparse solvers, not just mine.

Sparse LU is more challenging in this regard than sparse Cholesky, since sparse LU does symbolic fill-in reducing orderings plus numerical pivoting. Cholesky and QR just do symbolic fillin orderings.

DimaLPorNLP commented 2 weeks ago

Dear prof. Davis, thank you very much for this detailed explanation.

One more question that troubled me. I have a problem where one line of my matrix has 200 or 300 nonzeros. Most rows have about 6 nonzeros on average. Without this single line, the code runs in about 1.3 seconds. But when I add this single line the code runs in 4.3 seconds. Is it possible that something worsens the heuristics for solving the NP-hard fill-minimization (is this the minimum degree?). In general are "dense" lines even if too few a problem for the NP-hard fill-minimization? Which ordering is best in these cases? (AMD, COLAMD, etc)? Thanks in advance.

DrTimothyAldenDavis commented 2 weeks ago

That single row can cause problems, particularly if it has large entries.

Is this for LU? If so, then I use a fill-reducing column ordering (or symmetric ordering if the pattern is mostly symmetric). Then during factorization in UMPACK, KLU, or cs_lu in CSparse, I use relaxed partial pivoting, where I try to pick a sparse pivot row. If I use a symmetric ordering, I try to pick the diagonal.

However, if the only numerically acceptable pivot is that one dense row, I have to pick it, regardless of sparsity. That can cause fill-in to explode since now you have lots of dense rows. It can cascade.

With relaxed pivoting, I use a tolerance, like 0.1. I find the max abs value in a pivot column, and then pick the sparsest row whos candidate pivot value is at least 0.1 times the largest, in value.

It can help if you can pre-scale that dense row to make the values small, if that doesn't disturb the conditioning of the matrix.

It's hard to prescale perfectly automatically. It's much easier to do problem-specific scaling. I do have prescaling to try to help this issue but the best scaling is dependent on the problem.

DimaLPorNLP commented 2 weeks ago

Yes it is KLU, the entries are percentages < 1. Thank you I will try out the options you suggested.