cornellius-gp / linear_operator

A LinearOperator implementation to wrap the numerical nuts and bolts of GPyTorch
MIT License
95 stars 28 forks source link

[Bug] linear_cg fails on sparse matrices #58

Open simonseo opened 1 year ago

simonseo commented 1 year ago

🐛 Bug

linear_cg function runs fails to produce correct values when operated on sparse matrices.

To reproduce

add the following code snippet to test_linear_cg.py to reproduce

    def test_cg_with_sparse(self):
        # Create a sample sparse CSR tensor
        N = size = 1000
        nnz = 10000
        indices = torch.randint(0, N, (2, nnz), dtype=torch.int32)
        values = torch.randn(nnz, dtype=torch.float64)
        matrix = torch\
            .sparse_coo_tensor(indices, values, (N, N), dtype=torch.float64)\
            .to_sparse_csr()

        # set up vector rhs
        rhs = torch.randn(size, dtype=torch.float64)

        # basic solve
        solves = linear_cg(matrix.matmul, rhs=rhs, max_iter=size)

        # solve with init value
        init = torch.randn(size, dtype=torch.float64)
        solves_with_init = linear_cg(matrix.matmul, rhs=rhs, max_iter=size, initial_guess=init)

        # Check cg
        matrix, rhs = matrix.to_dense(), rhs.to_dense()
        actual = torch.linalg.solve(matrix, rhs)
        self.assertTrue(torch.allclose(solves, actual, atol=1e-3, rtol=1e-4))
        self.assertTrue(torch.allclose(solves_with_init, actual, atol=1e-3, rtol=1e-4))

Stack trace/error message

NumericalWarning: CG terminated in 1000 iterations with average residual norm 97.1439134614844 which is larger than the tolerance of 1 specified by linear_operator.settings.cg_tolerance. If performance is affected, consider raising the maximum number of CG iterations by running code in a linear_operator.settings.max_cg_iterations(value) context.

FAIL: test_cg_with_sparse (__main__.TestLinearCG)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/path/test_linear_cg.py", line 90, in test_cg_with_sparse
    self.assertTrue(torch.allclose(solves, actual, atol=1e-3, rtol=1e-4))
AssertionError: False is not true

----------------------------------------------------------------------

Expected Behavior

Expect the result of running linear_cg to be close to that of torch.linalg.solve.

System information

Please complete the following information:

Additional context

Other warnings produced when running the test:

  warnings.warn(
.../path/test_linear_cg.py:115: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at ../aten/src/ATen/SparseCsrTensorImpl.cpp:54.)
  .to_sparse_csr()
/path/lib/python3.10/site-packages/linear_operator/utils/linear_cg.py:323: NumericalWarning: CG terminated in 1000 iterations with average residual norm 97.1439134614844 which is larger than the tolerance of 1 specified by linear_operator.settings.cg_tolerance. If performance is affected, consider raising the maximum number of CG iterations by running code in a linear_operator.settings.max_cg_iterations(value) context.

I suppose this may also be a PyTorch issue but I am not sure how to investigate further from here. Appreciate your time to review this issue. Thanks!

gpleiss commented 1 year ago

It doesn't look like your matrix is positive semidefinite, so CG will fail in that case.