SuperFluffy / rust-condest

Matrix condition number estimation in Rust
Apache License 2.0

How to get ||A^{-1}||_1? #6

Open NAThompson opened 5 years ago

NAThompson commented 5 years ago

I only see a way to estimate ||A||_1 in the README, but I can compute that myself by maxing over the absolute column sums in O(n^2) ops. The hard part appears to be estimating ||A^{-1}||_1; how do I use your package to get ||A^{-1}||_1 given only A?

SuperFluffy commented 5 years ago

Thank you for your interest in the package. I did some digging and realized that the package name is pretty confusing.

When writing this crate I only needed and hence implemented the one-norm estimator by Higham and Tisseur, and I believed that the condition number estimator should follow from this right away. At the moment I am not sure how I came to this conclusion.

So in order to actually estimate the condition number, some more work needs to be done. If I read Higham and Tisseur 2000 correctly, one can use Higham's 1988 modification of Hager's 1984 method to estimate the one-norm of A^{-1}.

That's actually what Matlab and Octave are doing, see this line in Octave's implementation.

They are:

1. explicitly calculating ||A||_1 if A is a matrix, or estimating it otherwise;
2. performing an LU decomposition of A with pivoting;
3. estimating ||A^{-1}||_1 using the factors obtained from the LU decomposition;
4. finally, calculating the condition number k = ||A||_1 ||A^{-1}||_1.

Step 3 then makes use of the fact that we never have to form A^{-1} explicitly: the products A^{-1}v come from triangular solves against the LU factors. That's pretty much what Normest1::normest1_prod is for.
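To illustrate the key idea in step 3, here is a minimal, dependency-free Rust sketch (not this crate's API, and it uses unpivoted Doolittle LU for brevity; a real implementation needs partial pivoting for stability): A^{-1}v is obtained by two triangular solves against the stored factors, without ever forming A^{-1}.

```rust
/// In-place Doolittle LU (no pivoting, illustration only):
/// unit-lower L is stored below the diagonal, U on and above it.
fn lu_decompose(a: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = a.len();
    let mut lu = a.to_vec();
    for k in 0..n {
        for i in (k + 1)..n {
            lu[i][k] /= lu[k][k]; // multiplier L(i,k)
            for j in (k + 1)..n {
                lu[i][j] -= lu[i][k] * lu[k][j]; // eliminate below the pivot
            }
        }
    }
    lu
}

/// Computes x = A^{-1} v by solving L y = v, then U x = y.
fn lu_solve(lu: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    let n = v.len();
    let mut x = v.to_vec();
    for i in 0..n {
        // forward substitution (L is unit-lower)
        for j in 0..i {
            x[i] -= lu[i][j] * x[j];
        }
    }
    for i in (0..n).rev() {
        // back substitution (U is upper)
        for j in (i + 1)..n {
            x[i] -= lu[i][j] * x[j];
        }
        x[i] /= lu[i][i];
    }
    x
}

fn main() {
    let a = vec![
        vec![4.0, 2.0, 1.0],
        vec![2.0, 5.0, 2.0],
        vec![1.0, 2.0, 6.0],
    ];
    let lu = lu_decompose(&a);
    let v = vec![1.0, 2.0, 3.0];
    let x = lu_solve(&lu, &v); // x = A^{-1} v, with A^{-1} never formed
    // check the residual A x - v
    for i in 0..3 {
        let ri: f64 = (0..3).map(|j| a[i][j] * x[j]).sum::<f64>() - v[i];
        assert!(ri.abs() < 1e-10);
    }
    println!("x = {:?}", x);
}
```

The factorization costs O(n^3) once, but each subsequent A^{-1}v product is only O(n^2), which is what makes the estimator cheap when a factorization is already on hand.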

Right now I don't have the time to look into implementing this, although I'd love to sort this out.

So if you want to take a shot at it, I'd be happy to accept and review a pull request. Otherwise it might take a few months before I get to it.

NAThompson commented 5 years ago

By the way, I asked about this on scicomp.stackexchange, and received a good answer:

https://scicomp.stackexchange.com/questions/32602/missing-something-fundamental-about-condition-number-estimation

I don't think you'll be able to make this package estimate condition numbers efficiently. It requires that a triangular decomposition is already available (if you compute the decomposition yourself, it's inefficient). But then you need some knowledge of what the decomposition is in order to compute the matrix-vector products $A^{-1}v$ and $A^{-T}v$. In Eigen, this is achieved by having a consistent syntax for every solver: you can write solver.solve(v) whether A = LU or A = QR or whatever.
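A hypothetical Rust analogue of Eigen's uniform solver interface might look like the trait below (all names here are assumptions for illustration, not part of this crate). Any factorization implementing it could feed a one-norm estimator the products $A^{-1}v$ and $A^{-T}v$ without forming $A^{-1}$; the trivial diagonal implementation just makes the sketch runnable.

```rust
/// Hypothetical counterpart to Eigen's `solver.solve(v)` convention.
trait FactorizedSolve {
    /// Returns x such that A x = v.
    fn solve(&self, v: &[f64]) -> Vec<f64>;
    /// Returns x such that A^T x = v (the Hager/Higham estimator
    /// needs transpose solves as well).
    fn solve_transpose(&self, v: &[f64]) -> Vec<f64>;
}

/// Simplest possible "factorization": a diagonal matrix stored as its
/// diagonal entries, so solving is elementwise division.
struct Diagonal(Vec<f64>);

impl FactorizedSolve for Diagonal {
    fn solve(&self, v: &[f64]) -> Vec<f64> {
        v.iter().zip(&self.0).map(|(vi, di)| vi / di).collect()
    }
    fn solve_transpose(&self, v: &[f64]) -> Vec<f64> {
        self.solve(v) // a diagonal matrix is symmetric
    }
}

fn main() {
    let a = Diagonal(vec![2.0, 0.5, 4.0]);
    let x = a.solve(&[1.0, 1.0, 1.0]); // columns of A^{-1} on demand
    assert_eq!(x, vec![0.5, 2.0, 0.25]);
    println!("A^-1 * ones = {:?}", x);
}
```

An estimator written against such a trait stays agnostic about whether the caller factored A as LU, QR, or Cholesky, which is exactly the property the Eigen design provides.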

SuperFluffy commented 5 years ago

Cool, thanks for looking into this.

You seem to be more knowledgeable about this than me. :) Can you explain why you think this decomposition is inefficient? I guess you mean that taking the inverse of a matrix is O(n^3), while an LU factorization is also O(n^3) (AFAIR)? Yeah, that seems like it doesn't gain us anything. :(

Can you point me to the relevant parts of Eigen to see how they do it?

So assume we have e.g. A = LU available; then we could take a similar path as is done in normest1_prod and simply represent it as a collection of matrices. If one were so inclined, one could provide a function that takes a slow path (calculating the decomposition itself) and a fast path, where the user supplies a decomposition. I guess that's what Matlab and Octave are doing.
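The fast path could be sketched like this (hypothetical names, not this crate's interface): the caller supplies a closure that computes A^{-1}v from a factorization they already have. For illustration, ||A^{-1}||_1 is computed exactly by probing with every unit vector, since A^{-1}e_j is column j of A^{-1} and the 1-norm is the largest absolute column sum; the whole point of the Hager/Higham estimator is to replace these n solves with just a handful.

```rust
/// Exact 1-norm: the maximum absolute column sum.
fn norm1(a: &[Vec<f64>]) -> f64 {
    let n = a.len();
    (0..n)
        .map(|j| (0..n).map(|i| a[i][j].abs()).sum::<f64>())
        .fold(0.0, f64::max)
}

/// Fast-path condition number: `solve` computes A^{-1} v using a
/// factorization the caller already paid for. Probing with all n unit
/// vectors is exact but costs n solves; an estimator would use far fewer.
fn condest<F>(a: &[Vec<f64>], solve: F) -> f64
where
    F: Fn(&[f64]) -> Vec<f64>,
{
    let n = a.len();
    let inv_norm1 = (0..n)
        .map(|j| {
            let mut e = vec![0.0; n];
            e[j] = 1.0; // e_j picks out column j of A^{-1}
            solve(&e).iter().map(|x| x.abs()).sum::<f64>()
        })
        .fold(0.0, f64::max);
    norm1(a) * inv_norm1
}

fn main() {
    // A diagonal A keeps the supplied solver trivial: A^{-1} v = v / d.
    let a = vec![vec![2.0, 0.0], vec![0.0, 0.5]];
    let k = condest(&a, |v| vec![v[0] / 2.0, v[1] / 0.5]);
    assert_eq!(k, 4.0); // ||A||_1 = 2, ||A^{-1}||_1 = 2
    println!("kappa_1(A) = {k}");
}
```

A slow-path entry point would just be a wrapper that factorizes A itself and then calls this with the resulting solver.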

Can I ask what your use case for this stuff is?

NAThompson commented 5 years ago

Check out Higham, Accuracy and Stability of Numerical Algorithms, problem 7.2. Suppose we wish to solve Ax = b numerically, obtaining \hat{x}. Let r := A\hat{x} - b. Then we have the bounds

||r||/(||A|| ||\hat{x}||) <= ||x - \hat{x}||/||\hat{x}|| <= cond(A) ||r||/(||A|| ||\hat{x}||)

So if I have a cheap way to get the condition number of A, I can get both upper and lower bounds on the accuracy, even in real time, and if k(A) is too big I can upcast to higher precision. Jack Dongarra uses this workflow to detect bugs in supercomputing hardware. I can use it to get solvers working robustly in radiation-heavy environments where I might get uncorrectable bit flips.
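For completeness, both sides of the bound follow in one line from the definition of r:

```latex
% r = A\hat{x} - b = A(\hat{x} - x), hence \hat{x} - x = A^{-1} r.
\|\hat{x} - x\| \le \|A^{-1}\|\,\|r\| = \kappa(A)\,\frac{\|r\|}{\|A\|},
\qquad
\|r\| \le \|A\|\,\|\hat{x} - x\|
\;\Longrightarrow\;
\frac{\|r\|}{\|A\|\,\|\hat{x}\|}
\;\le\;
\frac{\|\hat{x} - x\|}{\|\hat{x}\|}
\;\le\;
\kappa(A)\,\frac{\|r\|}{\|A\|\,\|\hat{x}\|}.
```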

Here's a code implementing the workflow I described (without any attempts to handle the problem if it's detected):

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <limits>
#include <Eigen/Dense>

int main(int argc, char** argv)
{
    if (argc != 2) {
        std::cout << "Usage: ./a.out n, where n is the matrix size.\n";
        return 1;
    }
    using Real = double;
    std::cout << std::setprecision(std::numeric_limits<Real>::digits10);
    long n = atoi(argv[1]);
    Eigen::Matrix<Real, Eigen::Dynamic, Eigen::Dynamic> A(n,n);
    Eigen::Vector<Real, Eigen::Dynamic> b = Eigen::Vector<Real, Eigen::Dynamic>::Random(n);

    // Initialize the Hilbert matrix:
    for(long i = 0; i < n; ++i) {
        for (long j = 0; j < n; ++j) {
            A(i,j) = Real(1)/Real(i+j+1);
        }
    }

    // Compute the L1 operator norm of the Hilbert matrix:
    Real l1_A = A.colwise().lpNorm<1>().maxCoeff();
    std::cout << "\u2016A\u2016\u2081 = " << l1_A << "\n";

    // Perform an LU factorization on A: 
    Eigen::PartialPivLU<decltype(A)> LUSolver(A);

    // Solve Ax = b. Assign to a concrete vector: solve() returns a lazy
    // expression, so `auto` here would re-trigger the solve on each use.
    Eigen::Vector<Real, Eigen::Dynamic> x = LUSolver.solve(b);

    // Compute the l1 norm of solution to establish a scale for the error:
    Real l1_x = x.lpNorm<1>();

    // Estimate the condition number of A:
    Real cond1 = 1/LUSolver.rcond();
    std::cout << "Estimate of \u03BA\u2081(A) = " << cond1 << "\n";

    // Compute the residual and its norm:
    Eigen::Vector<Real, Eigen::Dynamic> r = A*x - b;
    Real l1_r = r.lpNorm<1>();
    std::cout << "1-norm of residual = " << l1_r << "\n";

    // Error bounds:
    Real error_lower_bound = l1_r/(l1_x*l1_A);
    Real error_upper_bound = cond1*error_lower_bound;
    std::cout << "Relative error is between " <<  error_lower_bound << " and " << error_upper_bound << "\n";
}

The main point is that the condition number estimation is built into the decomposition algorithm; it does not exist independently of it.

The reason Octave gets away with performing the LU decomposition internally is that there is no expectation it will be fast; it is simply a way for users to understand their matrices. For a real-time scientific system, performing a triangular decomposition twice is catastrophic.

SuperFluffy commented 5 years ago

That's a really cool usage of this method. I was wondering what people used it for. Thanks for the explanation.

How important is it for you to have it available in Rust? Are you moving your code base from C++? It would be pretty cool for Rust's scientific computing story to have this kind of stuff available and actively used by people.

I'd love to implement this somewhere down the road, but I am a bit time constrained with my PhD research right now. Do you want to work on this together and have a chat about it? I'd love to learn more about what kind of problem you are trying to solve with this.

NAThompson commented 5 years ago

When I opened this issue, I didn't know that Eigen had this functionality, so I was going to look through your implementation to see how to implement it myself. So now it's less important to me.

I don't use Rust for numerics, and I won't be moving any code from C++.