Simple calculation question

PyLops / pylops

PyLops – A Linear-Operator Library for Python

https://pylops.readthedocs.io

GNU Lesser General Public License v3.0

429 stars 102 forks source link

Simple calculation question #162

Closed Jammy2211 closed 3 years ago

Jammy2211 commented 4 years ago

Using the SciPy lstsq method, I can compute a solution to the problem Ax = b:

from scipy.linalg import lstsq
import numpy as np

"""Solving the system Ax = b:"""

A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

x = lstsq(a=A, b=b)

"""Gives the correct solution, x = (1/3, -1.3)"""
print(x)

However, I have been unable to reproduce this calculation in PyLops. All of the tutorials I've found deal with scenarions where the matrix A is 1D, or it has the same dimensions as the data. Copious trial and error has so far filed to get me out of a dimension mismatch. Here is the sort of solution I'm trying:

import pylops

"""Now want to repeat this calculation using PyLops"""
LRop = pylops.Regression(A, order=1, dtype='float64')
print(LRop.shape)
solver = pylops.NormalEquationsInversion(Op=LRop, Regs=None, data=b)
print(solver)

I have no doubt I'm being very stupid, but if you could give me an example to get my simple problem up and running (or explains what I'm conceptually not grasping). I reckon I'll be able to make progress with PyLops from here.

mrava87 commented 4 years ago

Hi, thanks for your issue.

I'll give you a working solution and then try to highlights a few points which will hopefully help to make it more clear how PyLops can be used for this sort of problems :)

Solution that gives the same result:

import pylops

"""Now want to repeat this calculation using PyLops"""
LRop = pylops.MatrixMult(A, dtype='float64')
print(LRop.shape)
x = pylops.NormalEquationsInversion(Op=LRop, Regs=None, data=b)
print(x)

Let me start by saying that your example is not the main usage pattern of PyLops as your matrix is dense by nature and there is no better way (or at least not for this specific example) to perform the forward and adjoint steps then just apply to dot product of the matrix to a vector in the model or data space. But as you can see, you can still use PyLops to solve your problems by simply wrapping your dense matrix into the MatrixMult operator. This is a very thin wrapper which doesn't make you lose anything in terms of performance and gives you access the our suite of solvers (which has quite a few interesting options especially if you are interested in using L1 norms). Perhaps not super appeling if all your problems deal with dense matrices that cannot be applied more efficiently than just by a np.dot but I believe this is never the case from experience.
The main idea of PyLops is that of being able to solve large inverse problems without having to rely on the explicit knowledge of your matrix. This is done using the concept of linear operators and the fact that iterative solvers require only you to be able to apply the operator to a vector in the model space and its adjoint to a vector in the data space. This comes with the price that we cannot use direct solvers as they require the knowledge of the matrix. I am not very familiar with lstsq but I suspect it uses a direct method - most of scipy solvers under scipy.linalg are direct, whilst iterative solvers (which we also use under the hood) are in scipy.sparse.linalg. I guess you don't need to use lstsq but be aware that in PyLops only iterative solvers are used.
When you say I've found deal with scenarions where the matrix A is 1D I am not entirely sure what you mean but I suspect that you refer to the x and y vectors used to form and solve any inverse problem of the form y=Ax. In PyLops we always solve for 1D vectors (even if x is for example a 2d image we always see it as a long vector where the image has been unwrapped row by row and the operators handle this fact internally), but the matrix A itself (e.g. LRop in your example) is always a representation of a 2d matrix that has the number of rows equal the number of elements in x and columns equal the number of elements in y. You see that when you print the operator or its shape as you did.
I am not sure why you choose to use the Regression operator, as your problem is a standard matrix-vector=vector with a dense matrix... maybe you can add more on what was your idea for using it :) This operator is mostly meant for educational purposes as to show that how you can use linear operators to solve a linear regression problem y=ax+b (or a polynomial one). You can write the operator as an explicit matrix but as we discuss here https://pylops.readthedocs.io/en/latest/api/generated/pylops.Regression.html#pylops.Regression, effectively you could just calculate y as sum weighted versions of x^n n=0,1,2,...N with the weights being your coefficients (and similar for the adjoint)... in practice I dont believe this is going to be any faster than making a dense matrix but it is a simple example that we use to show how many problems that are generally cast as matrix could be represented by their equivalent operators with some save in computation cost and/or memory. More realistic cases when you want to do that are for example FFT operators or Wavelet operators or domain specific ones like Radon transform for medical applications, and probably some in your field too :)

Hope this makes sense and if you have any more complicated examples that you think PyLops could be used but can't figure out how just let me know, happy to help!

Jammy2211 commented 4 years ago

Thank you for the speedy reply! In response to some of your points :):

My use of the LinearRegression operator was idiotic and not reflective of my use-case at all!
Expressing the problem as a Matrix multiplication makes a lot of sense. As you said, PyLops expresses the problem in a very different way to how I've previously thought about it so I'm having too adjust my way of thinking!

Whilst I've got the opportunity, I'd really appreciate feedback on whether my use case is suitable to PyLops! I'm an astronomer working in the field of gravitational lensing, where basically we have telescope imaging of a galaxy (left panel) that we want to reconstruct on a pixel-grid (right panel):

For every pixel on our source pixel-grid (right panel), we have a set of mappings between it and a set of image-pixels (left panel, the red / green dots show examples of the mappings). Thus, our A matrix is dimensions [image_pixels, source_pixels], with non-zero entries whereever a source pixel maps to an image pixel (the red / green dots in the figure). This makes for a very sparse matrix, phew! For this simple use-case, I am going to use the example you gave above to solve for the source pixel fluxes via the MatrixMul operator.

However, for our actual science problems we have to perform one of the following two operations on A before solving for b:

Perform a 2D convolution on the 'image' of every 'source pixel', that is blurring each set of red / green dots in the image-plane with a kernel. This amounts to performing 2D convolution on every column of the A matrix, turning it from a set of 0's and 1's to a set of 2D convolved values (the kernel size is such that this matrix is still mostly sparse). My understanding is that to do this in PyLops we can link the Convolve2D operator to the MatrixMul example you gave above, still benefiting from PyLops's approach using an iterative solver.
Perform a non-uniform FFT operator on A which makes it fully dense and change its shape to ~[1e10, 1e3]! This essentially cripples our memory and is leading to horrific run-times. My understanding (hope!) is that provided I write the NUFFT as a PyLops operator (in an analogous way to Convolve2D) we can solve the system of linear equations via a least square fit that avoids the large memory requirements, giving us a healthy run-time boost.

We also regularize the source-pixels when performing the reconstruction, from what I can tell I just input our usual regularization matrix into the PyLops Least-Squares solver.

mrava87 commented 4 years ago

Hi @Jammy2211, this looks a really cool problem and indeed I think it suits well with PyLops philosophy :)

For the 2D convolution as you say you can indeed use the pylops.signalprocessing.Convolve2D operator. Provided that your convolution is stationary (the kernel is always the same, not entirely sure if this is the case for you looking at the plot?) then this operator will move efficiently compute the 2d convolution between your matrix A and the kernel. Note that A will be effectively passed as a long vector but the dims parameter can be used to tell the operator how to reshape it internally.

For the Non-uniform FFT you are also correct. Provided you have a piece of code that can apply the forward non-uniform FFT to a vector (or matrix if your FFT is effectively 2D) as well as a code that can perform the adjoint (same as inverse for FFT) then you can indeed wrap it in a PyLops operator like this one (you can have a look at the uniform case to get an idea https://pylops.readthedocs.io/en/latest/api/generated/pylops.signalprocessing.FFT2D.html) and your memory footprint reduces to only what the FFT representation of A is in the Fourier domain (but nothing for the matrix/operator applying such transformation). Obviously I do not know you problem in details but in our field another way to approach this problem is to actually perform a uniform FFT and then resample it only at places where you have recordings using the Restriction operator. Here is an medical imaging example https://pylops.readthedocs.io/en/latest/gallery/plot_tvreg.html#sphx-glr-gallery-plot-tvreg-py to give you an idea

Provided you can make these operators, then you can simply chain them with * (like Op = FFTOp * Convop) and then use any of our solvers to help you with any type of regularisation you want to add.

Have a go at it and keep me posted if you get stuck, hopefully I can help :)

Jammy2211 commented 4 years ago

So far, so good! Slow but incremental progress as I wrap my read around operators. I've managed to get the Convolve2D operator working on some simple test cases and am now exploring the NUFFT (which is really the main use-case we need PyLops).

Here is a simple example I got working, which I'll reference briefly below:

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

A = np.array(
    [
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
)

Aop = pylops.MatrixMult(A, dtype="float64")
Cop = pylops.signalprocessing.Convolve2D(
    N=9, h=np.array([[0.5, 0.5]]), dims=[3, 3], method="direct"
)
Op = Aop * Cop
x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y)

Help with the following questions would be greatly appreciate :)

In the above example and figure I attatched in a previous comment, my matrix A maps image-pixels to source-pixels, where both grids are 2D. However, we typically 'mask' our images to remove unwanted flux which has 0 S/N:

This means that one dimension of our A matrix no longer corresponds to a 2D rectangular grid. For the convolution and NUFFT we currently use, both can be applied to this masked data in 1D, circumventing the need to map the data from 1D to 2D. To be explicit, in both cases we are applying a 2D convolution or NUFFT to the mapped 'images', but using tools that allow us to retain the 1D structure of the masked data in memory.

I could write new LinearOperators which use our tools to perform these operations on masked arrays in 1D, which when we using matrices for our linear algebra proved a lot faster.

However, I notice the Convolve / FFT operators require 2D inputs, for example the 'dims' parameter of Convolve2D above which tells the operator that each column of A corresponds to a 3x3 image in 2D. Is the 2D nature of these objects 'baked in' the linear algebra in a way that would be very difficult for us to circumvent? My first attempt to set up an Operator on a non 2D grid lead to errors being raised by the linear algebra libraries.

If not, this isn't too problematic, as if things must be 2D I can just set up A to correspond to a 2D grid, but have it such that all rows that correspond to masked image pixels will be entirely 0's (which I guess the sparse algebra library will optimize anyway).

I'm having some issue with understanding the dimensions of the operator. In the code below, I set up an A matrix of shape [9,9] (where the 2D image and source grids are both shape (3,3). I can pass it 9 complex data points, successfully using the FFT2D operator to solve for the solution of the matrix multiplication including the FFT:

y = np.array([1.0 + 1j, 2.0 + 2j, 3.0 + 3j, 4.0 + 4j, 5.0 + 5j, 6.0 + 6j, 7.0 + 7j, 8.0 + 8j, 9.0 + 9j])

A = np.array(
    [
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
)

Aop = pylops.MatrixMult(A, dtype="complex64")
FFTop = pylops.signalprocessing.FFT2D(dims=(3, 3))
Op = Aop * FFTop
x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y)
print(x)

However, in our use case the dimensions of A do not correspond to the number of complex data-points we want to use it to solve for. This is because the FT changes the dimension of A, for example transforming the columns of A from dimension 9 to dimension 1e10. I've not had any lucky getting my FFT operator, combined with my MatrixMul operator, to work when the input number of visibilities is not 9:

y = np.array([1.0 + 1j, 2.0 + 2j, 3.0 + 3j, 4.0 + 4j, 5.0 + 5j, 6.0 + 6j])

A = np.array(
    [
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
)

Aop = pylops.MatrixMult(A, dtype="complex64")
FFTop = pylops.signalprocessing.FFT2D(dims=(3, 3))
Op = Aop * FFTop
x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y)
print(x)

This gives a 'dimensions mismatch' error. I have tried to change the dirs and nffts input parameters (which indeed change the dimensions of the operators) but so far have been unable to do so in a way that gives me a successful matrix inversion. Again, I've no doubt I'm just being fairly stupid but any help would be appreciated!

p.s.

Thanks for the top on using the FFT + Restriction method, when using the matrix formalism we actually have methods that perform the calculation exactly like that and using the NUFFT. In general, we found the NUFFT to be much faster and more accurate, but I suspect we'll end up implementing both in PyLops!

mrava87 commented 4 years ago

So far, so good! Slow but incremental progress as I wrap my read around operators. I've managed to get the Convolve2D operator working on some simple test cases and am now exploring the NUFFT (which is really the main use-case we need PyLops).

Nice! I try to answer your questions one by one but I am afraid in some cases you will find some questions back (whenever I am not sure I understand what you try to do ;))

First let me see if I understand your problem from a mathematical point of view as I think I do but the more I read I think I don't ;).

You have an image (right plot in previous comment) which I would try to call model or x from now to avoid confusion as this is what you want to invert for, right? Then you have something you can effectively record which is the left plot, that I am going to call data or y. The two are linked by either a 2d convolution with a 2d compact kernel or by a 2d non-uniform FFT. In either case you may not want to apply the convolution to every point in the model as you are effectively only interested in using data points within the black 'circle' in your latest plot?

What I am not sure I understand if how this compares/relates to the 2d non-uniform FFT. Is it just another way to model your recording and is it non-uniform because your model is effectively sampled at the black dots in your right plot... and what are the red points (one inside each square of the grid)?

Here is a simple example I got working, which I'll reference briefly below:

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

A = np.array(
    [
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
)

Aop = pylops.MatrixMult(A, dtype="float64")
Cop = pylops.signalprocessing.Convolve2D(
    N=9, h=np.array([[0.5, 0.5]]), dims=[3, 3], method="direct"
)
Op = Aop * Cop
x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y)

This makes sense with your first definition of the problem. Just to be sure you are using a 2x1 filter that is strided across a 2d image, so effectively you only filter along the first direction. Is that what you want to model? And you have created an identity matrix/operator Aop. Am I correct if I interpret this example as a first step where you assume your data is just a blurred version of your model with 1 to 1 mapping of pixels? But you could have A to have less rows than columns and also not always at the diagonal if you wanted to have a more realistic sampling like you explained for your real scientific problems?

PS: In that case i would suggest to have A as a sparse scipy matrix (MatrixMult allows that) or even explore if you can use the Restriction operator.

Help with the following questions would be greatly appreciate :)

In the above example and figure I attatched in a previous comment, my matrix A maps image-pixels to source-pixels, where both grids are 2D. However, we typically 'mask' our images to remove unwanted flux which has 0 S/N:

This means that one dimension of our A matrix no longer corresponds to a 2D rectangular grid. For the convolution and NUFFT we currently use, both can be applied to this masked data in 1D, circumventing the need to map the data from 1D to 2D. To be explicit, in both cases we are applying a 2D convolution or NUFFT to the mapped 'images', but using tools that allow us to retain the 1D structure of the masked data in memory.

I could write new LinearOperators which use our tools to perform these operations on masked arrays in 1D, which when we using matrices for our linear algebra proved a lot faster.

However, I notice the Convolve / FFT operators require 2D inputs, for example the 'dims' parameter of Convolve2D above which tells the operator that each column of A corresponds to a 3x3 image in 2D. Is the 2D nature of these objects 'baked in' the linear algebra in a way that would be very difficult for us to circumvent? My first attempt to set up an Operator on a non 2D grid lead to errors being raised by the linear algebra libraries.

Operators in PyLops do generally do not have concept of grid or dimensions. Your model (and data) are just long vectors, say a flattened image, and you can do whatever you want with them. Some operators (like Convolve2D and FFT2D) do make sense of this vector by first reshaping it into a 2D grid and then operating onto it using scipy's convolve and numpy fft2. So in a way you could say this is baked into them because that is how those operations make sense, but you are free to create any new operator which is less restricted in that sense.

As I say again below, any operator on regular grid followed by Restriction would do but I agree with you, it is likely to be less efficient than implementing something that does directly do the mapping from a regular to an irregular domain. Take the example of the FFF, you could do use the FFT2D and then use a Restriction operator (which is just a bit more efficient way of making a rectangular matrix A with number of rows equal the number of samples to be used and number of columns equal the number of original samples. Say you have a 2d image and want to sample every second sample. What you could do is:

ny, nx = 10, 10
A = np.eye(ny*nx)
A[::2] = A
 Aop = pylops.MatrixMult(A)

or more efficiently:

ny, nx = 10, 10
iva = np.arange(ny*nx)[::2]
 Aop = pylops.Restriction(iava)

which are effectively the same. The second is just much less memory demanding and faster.

But again this would not beat a direct implementation of NUFFT - https://github.com/jyhmiinlin/pynufft as an example.... do you use it already in any way or you have your own implementation of NUFFT?

To elaborate a bit more, when you say using tools that allow us to retain the 1D structure of the masked data in memory, if your tools do effectively implement a NUFFT and/or 2d convolution without having to rely on making a dense/sparse matrix and just applying them via dot product, I do think those tools are your best option. All PyLops need is a piece of code that applies the forward and one that applies the adjoint. Obviously constructing the matrix itself (as i suspect you do from previous comments) has the advantage that the adjoint comes for free, A.T, but the problem that they are either very memory demanding and/or not very efficient. As pynufft has both forward and inverse I think you could easily make a PyLops linear operator out of it (unless you also have your own implementation of NUFFT and could use it instead)

If not, this isn't too problematic, as if things must be 2D I can just set up A to correspond to a 2D grid, but have it such that all rows that correspond to masked image pixels will be entirely 0's (which I guess the sparse algebra library will optimize anyway).

I'm having some issue with understanding the dimensions of the operator. In the code below, I set up an A matrix of shape [9,9] (where the 2D image and source grids are both shape (3,3). I can pass it 9 complex data points, successfully using the FFT2D operator to solve for the solution of the matrix multiplication including the FFT:

y = np.array([1.0 + 1j, 2.0 + 2j, 3.0 + 3j, 4.0 + 4j, 5.0 + 5j, 6.0 + 6j, 7.0 + 7j, 8.0 + 8j, 9.0 + 9j])
A = np.array(
    [
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
)

Aop = pylops.MatrixMult(A, dtype="complex64")
FFTop = pylops.signalprocessing.FFT2D(dims=(3, 3))
Op = Aop * FFTop
x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y)
print(x)
However, in our use case the dimensions of A do not correspond to the number of complex data-points we want to use it to solve for. This is because the FT changes the dimension of A, for example transforming the columns of A from dimension 9 to dimension 1e10. I've not had any lucky getting my FFT operator, combined with my MatrixMul operator, to work when the input number of visibilities is not 9:
y = np.array([1.0 + 1j, 2.0 + 2j, 3.0 + 3j, 4.0 + 4j, 5.0 + 5j, 6.0 + 6j])

A = np.array(
    [
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
)

Aop = pylops.MatrixMult(A, dtype="complex64")
FFTop = pylops.signalprocessing.FFT2D(dims=(3, 3))
Op = Aop * FFTop
x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y)
print(x)
This gives a 'dimensions mismatch' error. I have tried to change the dirs and nffts input parameters (which indeed change the dimensions of the operators) but so far have been unable to do so in a way that gives me a successful matrix inversion. Again, I've no doubt I'm just being fairly stupid but any help would be appreciated!

The dimension mistmatch here is because your FFTop and Aop are 9x9 but your y is 6x1. If you do print(Op) your model vector x should always have the second dimension of the operator and your data vector y the first dimension of the operator :)

p.s.

Thanks for the top on using the FFT + Restriction method, when using the matrix formalism we actually have methods that perform the calculation exactly like that and using the NUFFT. In general, we found the NUFFT to be much faster and more accurate, but I suspect we'll end up implementing both in PyLops!

As I said above, I totally see your point. I think the FFT + Restriction is a good trick especially to see if things work, but having a NUFFT is definitely the way to go if you want things to be as optimised as possible. And it would be amazing to have it in PyLops as it could be useful in other use cases!

Jammy2211 commented 4 years ago

First, thank you for the support!

Overview:

Your understanding of the problem sounds correct, however I think it is worth me explaining it clearly (and it makes the discussion more fun and interesting anyway!). I write a software package to model gravitationally lensed galaxies (https://github.com/Jammy2211/PyAutoLens), where the light of a galaxy appears multiple times due to its light being bent by the gravity of a foreground galaxy (space-time curving in general relativity). Here's a simulated image of a strong lens from my software (ignore the black / red lines):

Our goal is the reconstruct the light of the galaxy that was gravitationally lensed (which we don't observe - we only observe it as the multiply imaged 'ring' of light). Assuming a model for how its light is bent, we can create mapping between observed datay and the pixel-grid we'll reconstruct it on 'x', as shown in the figure I showed earlier (hopefully it makes more sense why a 'ring' of red pixels in 'y' map to the same pixel in 'x' - this is how mass curves space time in general relativity!):

As I discussed, we often mask the image data, such that the mappings only occur within a circle or annulus:

As you gathered, our matrix 'A' describes all mapping between our data 'y' and model 'x'. It has shape ['y', 'x'] (where y only includes pixels that are not removed due to masking) and non-zero entries wherever there is a mapping (a colored pixel on the plot). Given we know the values of our data 'y' and the matrix 'A' it is a linear problem to reconstruct the values of 'x' and thus reconstruct the light of our source galaxy:

The way I think about this problem is that every one of our pixels in 'x' corresponds to a sparse image 'y' consisting of 1's and 0's. As you suggest, our actual calculation considers 'x' and 'y' 1D vectors - we don't use their 2D structure at all!

Convolution

So, why would we need to perform a 2D convolution? Basically, when we observe the galaxy using a telescope the image is blurred by the telescope optics. We model this as convolution with a 2D kernel or 'point spread function'. The image of the lens above was not blurred with this kernel, but the observation of it we would make with a telescope are:

We build this into our model 'x' by convolving every 'image' of 'x' (e.g. the 1's and 0's it maps to in the image plane) with the kernel before we solve for 'x'. Thus, our matrix 'A' retains dimensions ['y', 'x'], and assuming the kernel isn't too large (we generally use a 21 x 21 kernel) remains fairly sparse. In the matrix formalism, we solve for our model 'x' using the 'A' matrix which has been convolved with our PSF. I managed to get this calculation working without matrices using the Convolve2D operator (using the [[0.5, 0.5]] kernel, which isn't representative of our problem but was just me messing around with simple code).

I do not think wel need PyLops for this problem. Even for the highest resolution imaging data the matrix 'A''s dimensions will not exceed [1e6, 1e3] and the run times are fine using matrices formalism. Furthermore, as you correctly note, we ultimately need to map the solution 'x' back to an image to compare with 'y', which as with PyLops's we'd only have the original (unconvolved) 'A' available would require us to perform matrix multiplication 'A * x' followed by an additional convolution of this model image. Nevertheless, I intend to write code that performs the calculation in PyLops, for fun :).

NUFFT

The NUFFT comes in when our data is not an image observed with a telescope but instead is observed using an interferometer at radio wavelengths. Here, our data is observed in Fourier space, which astronomers generally call visibilities or the 'uv-plane'. For our problem, all of the steps we performed above for images are identical up to the 2D convolution, giving us an identical 'A' matrix of 1's and 0's with dimensions ~ [1e5, 1e3]. Although our data is in Fourier space, we still create 'A' by assuming an image grid and mask in real space and create mappings to our model 'x' in real space.

However, our data 'y', is now ~1e10 visibilities (complex numbers). The NUFFT maps the real-space model images of 'x' from sparse vectors of 1's and 0's of size ~1e5 to a 100% dense vector of size 1e10 or more! This changes the dimensions of 'A' from [1e5, 1e3] to [1e10, 1e3]. In my problem above, I was unable to get the FFT to perform this dimension change, leading to the dimension mismatch errors.

At this point, it goes without saying that we need PyLops to perform this calculation, as the size of the matrix 'A' completely cripples our memory use! We actually have a super computer with 512GB of RAM on a single CPU, and this is not sufficient for the scale of the calculations we perform :rofl:.

So, to summariize the description, it sounds like you pretty much understood the calculation, but hopefully the details above fill in some of the confusion on the FFT side of things (which has always taken me as a weird calculation given you set up the model in real space where you don't actually observe anything).

PyNUFFT

Yep, we are already using PyNUFFT :). We have a class 'TransformerNUFFT' which inherits from it and it initializes in memory the FFT calculation as much as possible before the calculation. For example, we know the grid we define in real-space and we know the wavelengths of our visibility data which fold into the FFT:

https://github.com/Jammy2211/PyAutoArray/blob/master/autoarray/operators/transformer.py

Indeed, as you say, PyNUFFT has the forward and adjoint operations we need already available. In fact, I already went ahead and adapted the FFT2D operator to use PyNUFFT for our problem (the transformer inherits from the PyNUFFT calculation):

class NUFFT2D(LinearOperator):

    def __init__(
        self,
        transformer,
        dims,
        dirs=(0, 1),
        nffts=(None, None),
        sampling=(1.0, 1.0),
        dtype="complex128",
    ):

        self.transformer = transformer

        # checks
        if len(dims) < 2:
            raise ValueError("provide at least two dimensions")
        if len(dirs) != 2:
            raise ValueError("provide at two directions along which fft is applied")
        if len(nffts) != 2:
            raise ValueError("provide at two nfft dimensions")
        if len(sampling) != 2:
            raise ValueError("provide two sampling steps")

        self.dirs = dirs
        self.nffts = np.array(
            [
                int(nffts[0]) if nffts[0] is not None else dims[self.dirs[0]],
                int(nffts[1]) if nffts[1] is not None else dims[self.dirs[1]],
            ]
        )
        self.f1 = np.fft.fftfreq(self.nffts[0], d=sampling[0])
        self.f2 = np.fft.fftfreq(self.nffts[1], d=sampling[1])

        self.dims = np.array(dims)
        self.dims_fft = self.dims.copy()
        self.dims_fft[self.dirs[0]] = self.nffts[0]
        self.dims_fft[self.dirs[1]] = self.nffts[1]

        self.shape = (int(np.prod(self.dims_fft)), int(np.prod(self.dims)))
        self.rdtype = np.dtype(dtype)
        self.cdtype = (
            np.ones(1, dtype=self.rdtype) + 1j * np.ones(1, dtype=self.rdtype)
        ).dtype
        self.dtype = self.cdtype
        self.explicit = False

    def _matvec(self, x):
        return self.transformer.forward(x)

    def _rmatvec(self, x):
        return self.transformer.adjoint(x)

It actually managed to perform the calculation of my simple toy code problem, albeit I have no idea if it was working correctly or not lol. Assuming I get my head around PyLops enough to be able to do it, I'm happy to implement our NUFFT operator for PyLops via a PR. Don't want to get ahead of myself though!

In Response to your questions

This makes sense with your first definition of the problem. Just to be sure you are using a 2x1 filter that is strided across a 2d image, so effectively you only filter along the first direction. Is that what you want to model? And you have created an identity matrix/operator Aop. Am I correct if I interpret this example as a first step where you assume your data is just a blurred version of your model with 1 to 1 mapping of pixels? But you could have A to have less rows than columns and also not always at the diagonal if you wanted to have a more realistic sampling like you explained for your real scientific problems?

PS: In that case i would suggest to have A as a sparse scipy matrix (MatrixMult allows that) or even explore if you can use the Restriction operator.

Yep, your description of the problem is correct. Again, the 2x1 filter was me messing around with simple code and wasn't reflective of the actual calculation we perform. Nevertheless, it seemed to work! And I agree that setting up A as a sparse scipy matrix is a no-brainer.

Operators in PyLops do generally do not have concept of grid or dimensions. Your model (and data) are just long vectors, say a flattened image, and you can do whatever you want with them. Some operators (like Convolve2D and FFT2D) do make sense of this vector by first reshaping it into a 2D grid and then operating onto it using scipy's convolve and numpy fft2. So in a way you could say this is baked into them because that is how those operations make sense, but you are free to create any new operator which is less restricted in that sense.

That is good to hear! If I really want to get the convolution using PyLops into our code it sounds like I should write a bespoke operator adapted from the Convolve2D operator using our convolution tools. As I said though, we don't really need PyLops for the convolution anyway.

The dimension mistmatch here is because your FFTop and Aop are 9x9 but your y is 6x1. If you do print(Op) your model vector x should always have the second dimension of the operator and your data vector y the first dimension of the operator :)

Okay, it sounds like I just need to get the FFTOp to have the right dimensions - attempting to do this by guessing values that might give the right dimensions didn't get me very far! If, equipped with the description of the problem above you can adapt the code for my problem please do, as it'd give me more confidence in my code to start tackling the regularization side of things. It'd also mean I could begin profiling the calculation time compared to the matrix calculation, which I'm sure will be a lot of fun :).

Just to be clear, the FFT code above I'm trying to write performs our calculation but in a simple low dimensionality case. For our actual use case you can imagine we'll use the Operators and code (once they are set up correctly) but that:

'y' will consist of 1e10 complex values, instead of 6.
'A' will be a sparse matrix of 1's and 0's with dimensions [1e5, 1e3].
The NUFFT will transform 'A' to a matrix with dimensions [1e10, 1e3] (of course with PyLops this matrix never actually 'exists' but I think you get the idea).

Thank you for all the support, I can tell you right now that PyLops is going to make a lot of radio astronomers very happy if we can get this working!

mrava87 commented 4 years ago

Hi, thanks a lot for the details, it makes it much easier to follow and understand what you are trying to achieve :)

Starting from the problem you currently experience with the FFT2D operator. I think I do understand why. As this is a 2D FFT on a regular grid we assume (perhaps this can be loosen a bit) that the samples in the Fourier domain are the same (when setting nffts=(None, None)) or bigger than those in the original domain (when choosing different nfft for each direction. We do not allow the opposite as I dont think this is a common case - but perhaps it is worth allowing that too... Anyways, assuming for now that your y has 12 samples and your x has 9, your example can be modified as follows:

y = np.arange(12) + 1j * np.arange(12)

A = np.array(
    [
        [1, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 1, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 1],
    ]
)

Aop = pylops.MatrixMult(A, dtype="complex64")
FFTop = pylops.signalprocessing.FFT2D(dims=(3, 3), nffts=(4, 3), dtype="complex64")
Op = FFTop * Aop 
x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y)
print(x)

You can see that by specifying nifty in FFT2D operator we allow having a non-square operator (print(FFTop) gives <12x9 FFT2D with dtype=complex64>). This way the underlying 2D arrays that are created by reshaping x and y within FFTop will be 3x3 and 4x3 respectively. As I said because of how we handle so far FFT2D you cannot achieve a y of 2x3 from a x of 3x3 but hopefully this helps to clarify why in your case you could not get FFT2D to be other than square, it is the nffts that helps you drive that.

Seeing your NUFFT2D operator that you wrote with the PyNUFFT, I think that the way you wrote should allow for this exact same property, i.e. the operator takes an input with a certain number of samples (1e5) and produces an output with many more (1e10) - so the operator shape should look like (1e10, 1e5). Now if you have this operator and a Aop from a dense/sparse A of shape (1e5, 1e3), it should be a matter of doing Op= FFTop * Aop and as you have understood this will never form a dense matrix but simply apply first Aop to the input vector and then FFTop to the resulting vector... so if your computer can keep in memory a vector of size 1e10 and any auxiliary variables that you make in your init function then neither PyLops nor the solver you use (whether it is from PyLops directly or one from script) would not add anything else and you should be able to solve your problem ;)

I am going to try your NUFFT2D operator (I am very keen about using NUFFT in general and even better if we can getting directly into PyLops ;) ) but please let me know if you think you can now setup the entire problem or there is still anything confusing on the chaining of operators :)

PS: for the convolve problem I agree with your comments, if the convolution is very non-stationary I am not sure that operator can be applied any faster than by making a sparse matrix upfront... what PyLops could still do there is to allow never explicitly multiplying A and the convolution matrices which could make the resulting one less sparse, but as it would then require 2 matrix-vector multiplications for each forward/adoing pass this is a problem specific trade-off that should be tried to see if it pays off anyhow

mrava87 commented 4 years ago

Hi again, I had a look at your NUFFT2D and tried to run it with the 2d image tutorial in PyNUFFT.

class NUFFT2D(LinearOperator):

    def __init__(self, transformer, dims, dims_fft, dtype="complex128"):

        self.transformer = transformer
        self.dims = dims
        self.dims_fft = dims_fft

        self.shape = (int(np.prod(self.dims_fft)), int(np.prod(self.dims)))
        self.dtype = dtype
        self.explicit = False

    def _matvec(self, x):
        return self.transformer.forward(x.reshape(self.dims))
    def _rmatvec(self, x):
        return self.transformer.adjoint(x)

NufftObj = NUFFT_cpu()
Nd = (256, 256)  # image size
Kd = (512, 512)  # k-space size
Jd = (6, 6)  # interpolation size

NufftObj.plan(om, Nd, Kd, Jd)
Fop = NUFFT2D(NufftObj,
              Nd, om.shape[0],
              dtype="complex128")

y = Fop * np.ones(Nd).ravel()

As you can see I slimmed down a bit your class. Assuming you want to pass the NufftObj after plan, the linear operator would pretty much only need to know the size of the image in the regular domain so that it can reshape before applying the forward and pretty much nothing else... in this example Fop has shape 122880x65536, so it can go from a smaller model to a larger data as you want in your problem. Hope this helps and that you can now combine it with the A operator and try run a real example with it :)

Good luck and keep me posted!

Jammy2211 commented 4 years ago

It works! And its amazing!

For ~250000 visibilities, PyLops has reduced a run time of 48 seconds to 1.0 second :O.

For 8 million visibilities, which would crash my laptop before, PyLops is busting out a 4 second inversion, which is incredible given there is some low hanging fruit for the optimization I've not tackled yet.

Thank you for all your help, the light weight NUFFT operator above went a long way to sorting this for us :). These results have inspired me to also implement PyLops for the convolution too, as in the future the quantity of data could easily increase by an order of magnitude of two!

I'll keep this thread open, as I suspect I'll prob need a bit of help with the Regularization operators which have their own peculiarities.

mrava87 commented 4 years ago

Fantastic! This is very exciting, well done :)

And yes feel free to keep this open and use it to discuss the next steps, and good luck!

Jammy2211 commented 4 years ago

I've got things working with the real space grid (Nd=(256,256)) using a mask. I simply had to overwrite the forward() and adjoint() methods in PyNUFFT to map the results from 1d to 2d and visa versa using the mask.

One issue that cropped up is that we normally perform those mappings using numba (http://numba.pydata.org/) for efficiency. This doesn't seem to play nice with the linear algrebra solvers, raising an error. Do you know if there is a simple fix to this?

It doesn't seem to impact the performance a great deal for the PyNUFFT where the mapping is simple. However, our 2d real-space convolution routines rely heavily on numba so this would be a bit of a hurdle. We use these routines because for a masked array you can avoid unnecessarily convolving flux outside the mask, where it is discarded anyway, and avoid numerous 1d/2d mappings. However, swapping our convolution routines for the PyLops's convolution probably wouldn't be too problematic anyway, so its no big deal either way but would be good to know!

mrava87 commented 4 years ago

Heya, nice!

A couple of PyLops operators are written in numba (e.g. https://pylops.readthedocs.io/en/latest/api/generated/pylops.Spread.html#pylops.Spread) so I expect things to work in principle. But I had experienced problems myself before, mostly due to inconsistent types (for example a complex number coming out of a pfft that leaked into the numba routine). Do you mind posting the error you get, it may ring a bell :)

Jammy2211 commented 4 years ago

Ooops I meant to do just that!

Traceback (most recent call last): File "/home/jammy/PycharmProjects/PyAuto/PyAutoLens/test_autolens/profiling/interferometer/inversion_voronoi_magnification_fit_pylops_masked.py", line 125, in x = pylops.NormalEquationsInversion(Op=Op, Regs=None, data=y) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/pylops/optimization/leastsquares.py", line 110, in NormalEquationsInversion xinv, istop = cg(Op_normal, y_normal, *kwargs_cg) File "", line 2, in cg File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/_lib/_threadsafety.py", line 46, in caller return func(a, **kw) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 281, in cg atol = _get_atol(tol, atol, np.linalg.norm(b), get_residual, 'cg') File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 110, in _get_atol resid = get_residual() File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/isolve/iterative.py", line 280, in get_residual = lambda: np.linalg.norm(matvec(x) - b) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 222, in matvec y = self._matvec(x) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/pylops/LinearOperator.py", line 48, in _matvec return self.Op._matvec(x) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 536, in _matvec return self.args[0].matvec(self.args[1].matvec(x)) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 222, in matvec y = self._matvec(x) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/pylops/LinearOperator.py", line 48, in _matvec return self.Op._matvec(x) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 536, in _matvec return self.args[0].matvec(self.args[1].matvec(x)) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/scipy/sparse/linalg/interface.py", line 222, in matvec y = self._matvec(x) File "/home/jammy/PycharmProjects/PyAuto/projects/lops/nufft_lop.py", line 35, in _matvec return self.transformer.forward(x) File "/home/jammy/PycharmProjects/PyAuto/PyAutoArray/autoarray/operators/transformer.py", line 340, in forward x2d = array_util.sub_array_2d_from(sub_array_1d=x, sub_size=1, mask=self.real_space_mask) File "/home/jammy/PycharmProjects/PyAuto/PyAutoArray/autoarray/util/array_util.py", line 556, in sub_array_2d_from sub_mask_index_for_sub_mask_1d_index=sub_one_to_two, File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/numba/dispatcher.py", line 401, in _compile_for_args error_rewrite(e, 'typing') File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/numba/dispatcher.py", line 344, in error_rewrite reraise(type(e), e, None) File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/numba/six.py", line 668, in reraise raise value.with_traceback(tb) numba.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend) Invalid use of Function() with argument(s) of type(s): (array(float64, 2d, C), UniTuple(int64 x 2), complex128)

parameterized In definition 0: All templates rejected with literals. In definition 1: All templates rejected without literals. In definition 2: All templates rejected with literals. In definition 3: All templates rejected without literals. In definition 4: All templates rejected with literals. In definition 5: All templates rejected without literals. In definition 6: All templates rejected with literals. In definition 7: All templates rejected without literals. In definition 8: All templates rejected with literals. In definition 9: All templates rejected without literals. This error is usually caused by passing an argument of a type that is unsupported by the named function. [1] During: typing of setitem at /home/jammy/PycharmProjects/PyAuto/PyAutoArray/autoarray/util/array_util.py (571)

File "../../../../PyAutoArray/autoarray/util/array_util.py", line 571: def sub_array_2d_via_sub_indexes_from(
        sub_mask_index_for_sub_mask_1d_index[index, 1],
    ] = sub_array_1d[index]
    ^
This is not usually a problem with Numba itself but instead often caused by the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Numba visit: http://numba.pydata.org/numba-doc/latest/reference/pysupported.html and http://numba.pydata.org/numba-doc/latest/reference/numpysupported.html

For more information about typing errors and how to debug them visit: http://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile

If you think your code should work with Numba, please report the error message and traceback, along with a minimal reproducer at: https://github.com/numba/numba/issues/new

mrava87 commented 4 years ago

Thanks!

I think I have seen something along these lines before. From what I see the problem may be due to the overall type of your operator (can you please print it) and how the solver initializes its initial guess (using this type).

Are you able to print the type of the vector that is passed to this numba wrapped routine just before it is passed?

Another thing you could try is to run directly the matvec of your overall operator and see if the same problem arises... in this case you can just choose the type of your x vector and see if and where the problem occurs (real, complex, both?). If you see that it happens only for one case (say real) the problem may be with how the solver initializes its starting guess. I would then try to pass it directly by using the x0 in NormalEquationsInversion and choose the correct type. If that works we can then look into how to ensure the operator gets its correct datatype (which is quite tricky when FFT is involved because for real input the output is complex, but we can find a way around as I do for the normal FFT operator)

Hopefully these tests give some more clear hints :)

Jammy2211 commented 4 years ago

Going through the advise above I fixed it :). It was a trivially simple problem, whereby in my numba functions I had to change the following:

array_2d = np.zeros(sub_shape)

to:

array_2d = 0+0j * np.zeros(sub_shape)

I'm now on to Regularization, which I I think will be more straightforward! Here, I wish to regularize the model solution, by penalizing solutions where the values solved for in neighboring pixels on the square pixel grid have large flux difference.

In the matrix formalism, we do this by creating a regularization matrix H which has dimensions [x, x]. For example, if I use a 3x3 grid for the model, and wish to only regularize every pixel with its neighbor to the right, I would first write the following B matrix:

b = np.array(
    [
        [-1,  1,  0,  0,  0,  0,  0,  0,  0],
        [ 0, -1,  1,  0,  0,  0,  0,  0,  0],
        [ 0,  0, -1,  0,  0,  0,  0,  0,  0],
        [ 0,  0,  0, -1,  1,  0,  0,  0,  0],
        [ 0,  0,  0,  0, -1,  1,  0,  0,  0],
        [ 0,  0,  0,  0,  0, -1,  0,  0,  0],
        [ 0,  0,  0,  0,  0,  0, -1,  1,  0],
        [ 0,  0,  0,  0,  0,  0,  0, -1,  1],
        [ 0,  0,  0,  0,  0,  0,  0,  0, -1],
    ]
)

I would then compute the regularization matrix, H, as b.T * b, producing the following H matrix:

[[ 1 -1 0 0 0 0 0 0 0] [-1 2 -1 0 0 0 0 0 0] [ 0 -1 2 0 0 0 0 0 0] [ 0 0 0 1 -1 0 0 0 0] [ 0 0 0 -1 2 -1 0 0 0] [ 0 0 0 0 -1 2 0 0 0] [ 0 0 0 0 0 0 1 -1 0] [ 0 0 0 0 0 0 -1 2 -1] [ 0 0 0 0 0 0 0 -1 2]]

My hope is that I can write a simple regularization operator that, given the matrix, can be passed into the Inversion equations method. Although the examples above used a simple square grid for the model, in reality we use a Voronoi mesh with a variable regularization coefficient, so working with an already computed H and passing this to the operator would be ideal!

Below is my first attempt at a Regularization operator:

class Regularization(LinearOperator):

    def __init__(self, pixels, h, dtype="float64"):

        self.h = h
        self.pixels = pixels
        self.shape = (self.pixels, self.pixels)
        self.dtype = dtype
        self.explicit = False

    def _matvec(self, x):
        return x # matrix multiplication with self.h here?

    def _rmatvec(self, x):
        return x # matrix multiplication with self.h here?

Indeed, the value 'x' that comes is the shape of our model grid, so for the example regularization matrix H above shape (9,). I've yet to wrap my head around how I combine x and self.h to get the forward / adjoint working. I know that _rmatvec has to reshape x from shape (9,)to shape (3,3) and _matvec from (9, ) to (3,3), the matrix multiplication necessary is just not clear! (Its a longgggggggg time since I implemented the core linear algebra in our code so a lot of it has escaped memory).

So, I guess my main question is is this use of a H matrix still valid (or is this another thing my matrix-focused brain is faililng me on) and if so what matrix multiplication goes in _matvec and _rmatvec?

mrava87 commented 4 years ago

Great! I think with numba types are more importart (which makes sense because its complied code), great it was a easy fix :)

For the regularization, let me ask a few questions:

Is your B a simple first derivative in the horizontal direction (assuming x is reshaped as a 3x3 in this case)? and you make H because you want to create the normal equations explicitely ((G^TG+B^TB)^-1 G^T d where G is the modelling operator)? In this case I would have suggested using https://pylops.readthedocs.io/en/latest/api/generated/pylops.FirstDerivative.html#pylops.FirstDerivative but since you then say that in reality you work with Voronoi irregular meshes and (I guess) the concept of right neighbour is a bit more fuzzy, I think your approach is the best way to go about... which means make a dense matrix B and simply wrap it into the MatrixMult operator.
Note that if you want to use https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.NormalEquationsInversion.html#pylops.optimization.leastsquares.NormalEquationsInversion or https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.RegularizedInversion.html#pylops.optimization.leastsquares.RegularizedInversion the best is that your Regularization class implements B (not H) because the H=B^TB is formed directly in the inner of those functions. A similar approach is take for example in sparse solvers and it is always based on the idea that B is generally more sparse than H. Also this way we can use solvers like LSQR which do implicitly solve the normal equations (this is done in https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.RegularizedInversion.html#pylops.optimization.leastsquares.RegularizedInversion) which is known to work better for ill-posed problems because it doesnt square the conditioning number as a direct solution of the normal equations would do.
If instead you only are able to create H (although it does not seem to be the case from what you write...), we could probably re-write a modification of NormalEquationsInversion where what we call Regs will become the already formed H=B^T B (instead of B)
Indeed, the value 'x' that comes is the shape of our model grid, so for the example regularization matrix H above shape (9,). I've yet to wrap my head around how I combine x and self.h to get the forward / adjoint working. I know that _rmatvec has to reshape x from shape (9,)to shape (3,3) and _matvec from (9, ) to (3,3), the matrix multiplication necessary is just not clear! (Its a longgggggggg time since I implemented the core linear algebra in our code so a lot of it has escaped memory). I am not sure I follow this: I would think that x is a vector of size N (that represents an image - a matrix of size n1xn2 with n1n2=N if in a regular grid - or just N scattered points in your Voronoi grid). B is a matrix of size NxN which applies something to this vector. In your first example this is a simple derivative in x direction. As I mentioned, if you write this already as a matrix (your B), then you don't even need to create a new class and can just use MatrixMult passing your matrix B, that would do the trick :) if instead you can/want to apply this regularization without explicitly making a matrix (like we do for a first derivative on a regular 1,2, or N-dimensional grid with FirstDerivative), then I need more details to understand how to implement the adjoint as it really depends on how the forward looks like :)

Jammy2211 commented 4 years ago

Is your B a simple first derivative in the horizontal direction (assuming x is reshaped as a 3x3 in this case)? and you make H because you want to create the normal equations explicitely ((G^TG+B^TB)^-1 G^T d where G is the modelling operator)? In this case I would have suggested using https://pylops.readthedocs.io/en/latest/api/generated/pylops.FirstDerivative.html#pylops.FirstDerivative but since you then say that in reality you work with Voronoi irregular meshes and (I guess) the concept of right neighbour is a bit more fuzzy, I think your approach is the best way to go about... which means make a dense matrix B and simply wrap it into the MatrixMult operator.

This is 100% correct. It blows my mind how from my fairly rubbish descriptions you are able to fully dissect the problem and describe it so much more succinctly than myself!

As you suggested, my starting point was the FirstDerivative, which I got working for some simple test cases but knew I'd need something more involved for the Voronoi mesh.

Note that if you want to use https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.NormalEquationsInversion.html#pylops.optimization.leastsquares.NormalEquationsInversion or https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.RegularizedInversion.html#pylops.optimization.leastsquares.RegularizedInversion the best is that your Regularization class implements B (not H) because the H=B^TB is formed directly in the inner of those functions. A similar approach is take for example in sparse solvers and it is always based on the idea that B is generally more sparse than H. Also this way we can use solvers like LSQR which do implicitly solve the normal equations (this is done in https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.RegularizedInversion.html#pylops.optimization.leastsquares.RegularizedInversion) which is known to work better for ill-posed problems because it doesnt square the conditioning number as a direct solution of the normal equations would do. If instead you only are able to create H (although it does not seem to be the case from what you write...), we could probably re-write a modification of NormalEquationsInversion where what we call Regs will become the already formed H=B^T B (instead of B)

We have both B and H available to use. The reason we have worked with H in the past is because we compute B and H directly from a 'pixel_neighbors' array describing the neighbor scheme of our Voronoi mesh. So, we can cut-out the computation of B altogether and go straight to H, saving on run-time.

For the Voronoi mesh, we regularization every pixel with every pixel it shares a Voronoi vertex. In terms of B matrices, this means we create a B_1 matrix describing all neighboring pixels we designate as neighbors of index 1. We then create a B_2 for neighbos designated index 2, and so. This will typically create the ~10 B matrices, from which we would compute H as follows (baring in mind in reality we skip this calculation using the pixels_neighbor array):

H_1 = B_1^TB_1 H_2 = B_2^TB_2

.... for all B's up to ~10

Finally: H = H_1 + H_2 + H_n

I am not sure if we can compute a single B matrix that corresponds to the correct H, or if we by have to create ~10 B matrices. Given we never really used B for our calculations this isn't something I'd have thought about too much in the past.

It sounds like we have two options:

Work only with B matrices, wrapping them into the MatrixMul operator. This has benefits you described above with it being the inner of H, but if we are working with ~10 B matrices we may lose out in efficiency / simplicity compared to H (although there may be a straight forward way to express all the B's in one matrix).
Work directly with H via a modified NormalEquationsInversion function.

Both seem straight forward enough, and I'm happy to take your advise on whatever you think sounds best. If it if necessary to carry around many B matrices I'd be inclined to go for option 2, but modifying the NormalEquationsInversion function would have some downsides.

Indeed, the value 'x' that comes is the shape of our model grid, so for the example regularization matrix H above shape (9,). I've yet to wrap my head around how I combine x and self.h to get the forward / adjoint working. I know that _rmatvec has to reshape x from shape (9,)to shape (3,3) and _matvec from (9, ) to (3,3), the matrix multiplication necessary is just not clear! (Its a longgggggggg time since I implemented the core linear algebra in our code so a lot of it has escaped memory). I am not sure I follow this: I would think that x is a vector of size N (that represents an image - a matrix of size n1xn2 with n1n2=N if in a regular grid - or just N scattered points in your Voronoi grid). B is a matrix of size NxN which applies something to this vector. In your first example this is a simple derivative in x direction. As I mentioned, if you write this already as a matrix (your B), then you don't even need to create a new class and can just use MatrixMult passing your matrix B, that would do the trick :) if instead you can/want to apply this regularization without explicitly making a matrix (like we do for a first derivative on a regular 1,2, or N-dimensional grid with FirstDerivative), then I need more details to understand how to implement the adjoint as it really depends on how the forward looks like :)

Your description is correct - I think I muddled some of my words when trying to describe the inputs and outputs. I have no issues with us having to make a matrix for the regularization - its sparse and low memory so seems like the simplest thing for us to do without any loss to performance. It seems like the big question is whether we work with B or H!

mrava87 commented 4 years ago

Ahaha, thanks! And you definitely explain things very clearly, that helps :)

So, I am not entirely sure I can say definitely go for option A or B.

I would say, the second one seems less invasive as it would keep the overall problem setup as you had it before. If you give me some days I can try to help you recast the NormalEquationsInversion function so that Reg is already the Reg^T Reg (instead of forming it inside)... but this made me think that we can only do this for special cases where the regularisation equations are of the form B * x = 0. If the RHS were to be different from 0 then our normal equations will also need direct access to B (not just H=B^T B) in the data term - see the end of the equation in Notes here https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.NormalEquationsInversion.html#pylops.optimization.leastsquares.NormalEquationsInversion. I think that is not not the case for you or any time we want to minimise a derivative of some sort (or any other thing like the norm in a different domain) but there are other general cases where you would like the RHS to be non-zero.

I would also suggest trying to make N independent B matrices, wrap each of them in MatrixMult and further all of them into a VStack operator, so you have a stack of matrices. Then you can compare the timings of applying the forward+adjoint of this operator vs the forward of the H operator made by MatrixMult directly on H matrix. The reason I ask this is because if Bs are very very sparse but the overall H is not, it may be cheaper to apply N mat-veg multiplications with very sparse matrices than 1 with a more dense one... but this is definitely something that we can't tell unless trying it on a specific case, as it doest generalize :) if you see that this is definitely a win in terms of both time and memory for H then we can go for what I suggested, otherwise its up to you to choose whether you are happy to slightly change how things are done or not... from PyLops point of view both option are totally doable ;)

Jammy2211 commented 4 years ago

I'm going to have a go at making N independent B matrices, ultimately aiming to use the VStack operator. As you say, there is no way to be sure if approaching this via the B matrices of H is better, and it could well depends on the specifics of the calculation.

If you have time to set up a NormalEquationsInversion that uses H please do! But I should be able to get by going via B if not :), which as I said I'm keen to do for the sake of profiling anyway. I'll let you know how I get on!

mrava87 commented 4 years ago

Sure, will do that :)

mrava87 commented 4 years ago

Done :) I added a new input parameter NRegs to NormalEquationsInversion (https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.NormalEquationsInversion.html#pylops.optimization.leastsquares.NormalEquationsInversion) to be used when you have already a pre-formed B^TB as in your case - see the example here https://pylops.readthedocs.io/en/latest/tutorials/solvers.html#sphx-glr-tutorials-solvers-py

Give it a go and let me know what you think :)

Jammy2211 commented 4 years ago

We have our first PyLops source reconstruction :).

It looks a bit ropey, but there are still some teething issues I need to address, most notably adding a Weight operator to fit the data based on its signal to noise.

The Regularization operator seems to be working pretty well, implemented as follows:

class RegularizationLop(pylops.LinearOperator):
    def __init__(self, regularization_matrix, dtype="float64"):

        self.regularization_matrix = regularization_matrix
        self.pixels = regularization_matrix.shape[0]
        self.dims = self.pixels
        self.shape = (self.pixels, self.pixels)
        self.dtype = dtype
        self.explicit = False

    def _matvec(self, x):
        return x

    def _rmatvec(self, x):
        return self.regularization_matrix

My pylops call now looks like this:

Aop = pylops.MatrixMult(sparse.bsr_matrix(mapper.mapping_matrix), dtype="complex64")
Fop = transformer

Op = Fop * Aop

Rop = reg.RegularizationLop(regularization_matrix=regularization_matrix)

reconstruction = pylops.NormalEquationsInversion(Op=Op, Regs=None, epsNRs=[1.0], NRegs=[Rop], data=y)

I'll first double check - are the _matvec and _rmmatvec methods okay? It seems like messing around with them doesn't change the solution!

The second thing I'll ask is regarding the line:

epsNRs=[1.0], NRegs=[Rop]

From what I can tell, changing the regularization coefficient that I use to make the regularization matrix and therefore Rop has no impact on your source reconstruction. For example, if I double the regularization coefficient (making all entries in H double) the solution does not change. In order to change the level of regularization, I have to change the value of epsNRs.

Is the code renormalizing the H matrix, or applying some sort of rescaling on it, or does it simply ignore the overall magnitude of the matrix? The reason that I ask is that we often apply a luminosity weighted H matrix, where the regularization coefficient of every pixel in our model is set adaptively via some clever calculations. Thus, there is no single value of epsNRs that we can input if the 'scale' of the H matrix is being omitted.

I'm still working on using the B matrices and will let you know how my profiling goes once I get there :).

mrava87 commented 4 years ago

Hey, nice :)

I am not entirely sure about your RegularizationLop: if I remember correctly you wanted to apply some sort of difference with neighbouring pixels that you encoded in a sparse matrix upfront, which I assume is what you call regularization_matrix? If that is the case I would expect the _matvec to be something along the lines of:

def _matvec(self, x):
        return np.dot(regularization_matrix, x)

(or the scipy.sparse equivalent of np.dot. As you write it, the forward acts basically like and identity operator, as given x it returns x, so the action of Rop is like y=I*x where I is an identity.

In this case the _rmatvec would be something like:

def _matvec(self, x):
        return np.dot(regularization_matrix.T, x)

but this is actually never used in the special case of pass Rop to NormalEquationsInversion as one of NRegs because you basically tell the solver that you have already pre-made the B^TB action so it will only apply its forward - inside this function the scripy cg solver is effectively used (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.cg.html)

Regarding the scaling, it should not behave as you say. There is definitely no rescaling of any sort inside the solver so if you have a regularisation term Rop and and epsNRs=[10.0] or 10*Rop and and epsNRs=[1.0] you should get the same result. But my suspicion is that no matter what your H/regularization_matrix looks like, this is never used so I can see why changing it doesn't change the solution. Instead epsNRs is applied to the output of the regularisation so it makes sense the solution changes no matter what you write in your _matvec.

Anyways to double check nothing is behaving wrongly inside the solver I made this simple example:

nx = 10
x = np.arange(nx)

A = np.random.normal(0, 1, (nx, nx))
Aop = MatrixMult(A)

y = Aop * x

x_cg = NormalEquationsInversion(Aop, [], y, epsI=epsI, returninfo=False, **dict(maxiter=2000))

plt.figure()
plt.plot(x)
plt.plot(x_cg)

epsNR=np.sqrt(0.1)
epsI=np.sqrt(1e-4)

D2op  = SecondDerivative(nx, dims=None, dtype='float64')
ND2op = MatrixMult((D2op.H * D2op).todense())

niter = 4
x_cg1 = NormalEquationsInversion(Aop, [], y, NRegs=[ND2op], epsI=epsI, epsNRs=[epsNR], 
                                 returninfo=False, **dict(maxiter=niter))

ND21op = epsNR**2 * ND2op
x_cg2 = NormalEquationsInversion(Aop, [], y, NRegs=[ND21op], epsI=epsI, epsNRs=[1.], 
                                 returninfo=False, **dict(maxiter=niter))

print(x_cg1-x_cg2)

n = np.random.normal(0, 1, nx)
print(epsNR**2 * (ND2op * n) - ND21op * n)

the two prints should both give you vectors full of zeros. Do you mind checking if its the case for you?

Also if you replace D2op = Identity(nx, dtype='float64') which is going to basically behave as your operator so far (as I explained above) things should be again the same.

Let me know how it goes :)

Jammy2211 commented 4 years ago

Got distracted with some grant proposals, finally in a position to finish up this :).

I checked the code you pasted, and it works as expected. Furthermore, changing the coefficient of the regularization matrices is now doing what I'd expected - still not 100% everything is implemented correctly but think I'm in a position to sort the high level implementation and go back to the details in a bit.

First issue - I think the version of PyLops on PyPI has a problem. When I got clone the master branch, everything works fine, however if I use 1.9.1 on pypi I get this issue:

Traceback (most recent call last):
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoLens/test_autolens/profiling/interferometer/inversion_voronoi_magnification_fit_pylops_masked_reg_new.py", line 102, in <module>
    visibilities_complex=masked_interferometer.visibilities_complex
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoLens/autolens/lens/ray_tracing.py", line 785, in inversion_interferometer_from_grid_and_data
    visibilities_complex=visibilities_complex,
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoArray/autoarray/operators/inversion/inversions.py", line 444, in from_data_mapper_and_regularization
    check_solution=check_solution,
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoArray/autoarray/operators/inversion/inversions.py", line 550, in from_data_mapper_and_regularization_lops
    Op=Op, Regs=None, epsNRs=[1.0], NRegs=[Rop], data=visibilities, Weight=Wop
  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/pylops/optimization/leastsquares.py", line 110, in NormalEquationsInversion
    xinv, istop = cg(Op_normal, y_normal, **kwargs_cg)
TypeError: cg() got an unexpected keyword argument 'epsNRs'

The second (and hopefully final!) thing is the Weight operator. Assocoiated with our visibilities is a noise-map, with gives the real and complex noise of every visibility in our dataset.

Lets consider the case of CCD Imaging data, which doesns't has complex entries. Here, we basically have a data vector d of values and noise map sigma of noise map values. In the matrix formalism, we would create the following two vector D and matrix F from d and our model f (we've been using x above, but the equations I have access to below use f) which maps image-pixels to source pixels:

This basically weights the data by the noise-map so as to find the solution that minimizes chi-squared. Our result y is called S and is as follows:

For the interferometer case, we essentially just create D and F for separate for the real and imaginary values, and then added them together (D = D_real + D_imag), (F = F_real + F_imag), solving for S using these D and F.

From what I can tell, I just need to make a WeightOperator which has the noise-map as input and apply the correct operations on x:

class WeightOperator(pylops.LinearOperator):

    def __init__(self, noise_map, source_pixels):

        self.noise_map = noise_map
        self.dims = noise_map.shape[0]
        self.source_pixels = source_pixels
        self.shape = (self.dims, self.dims)
        self.dtype = "complex128"

        self.explicit = False

    def _matvec(self, x):
        return x

    def _rmatvec(self, x):
        return x

I suspect I need some form of * or divide by self.noise_map or self.noise_map**2. I should be able to figure it out once the rest of the code is up and running but figured I'd post it here first incase the answer is obvious to you :P.

mrava87 commented 4 years ago

Heya, great to hear you are back on the code :)

Regarding PyPi, I have not yet made a release with the changes we made to NormalEquationsInversion so what you see is expected. I am currently on holiday but once I am back next week I can make a release so that you can use the official PyPi latest version instead of the master. And I will reply to the second part of your comment :)

Jammy2211 commented 4 years ago

Great thanks, enjoy your holiday!

mrava87 commented 4 years ago

Hi again, if you simply want to weight your data by an element-wise term 1/sigma^2 you can use the Weight input in pylops.optimization.leastsquares.NormalEquationsInversion and simply pass a Diagonal operator that you create using the vector of your scalings (which I guess in your case will be w=1./noise_map.ravel() assuming this has the same size of your data vector).

The code will be something along these lines:

w=1./noise_map.ravel()
W = pylops.Diagonal(w)
xinv = pylops.optimization.leastsquares.NormalEquationsInversion(..., Weight=W)

By doing so, what is internally formed and solved is the equation in the Notes of https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.NormalEquationsInversion.html#pylops.optimization.leastsquares.NormalEquationsInversion which you can see has W both on the data term as well as in between Op^T and Op that is going to be the same as in the equations you write :)

mrava87 commented 4 years ago

Hi @Jammy2211, Pylops 1.10.0 is now on PyPI with the new NormalEquationsInversion. You can use it instead of the master if you prefer :)

Did you manage to progress further with the weighting?

Jammy2211 commented 4 years ago

Brilliant, thank you - this means I can sort a release of PyAutoLens with full PyLops support :). We're submitting to the Journal of Open Source Software soon and I'll make sure PyLops is clearly visible in the submission!

The weighting seemed to work great (and was super easy to implement with your help), so everything seems to be functioning perfectly for the interferometer datasets, which was the fully dense use case that required PyLops the most :). I should be good to get things working for the imaging datasets from here, but given they're not such a bottleneck this isn't as much a priority right now.

The run times and memory use are a huge improvement on the old approach (x10+), but they're still prohibitive for the largest datasets we're targeting. My profiling code was fitting fairly crude models, and it appears their simplicity meant the least squares solver found a solution much faster than a more realistic model-fit, which is frustrating. We can get good run-times reducing the tolerance to 1e-1, but that probably isn't a great idea. Its feasible for us to have a pretty decent 'guess' of the true solution beforehand though, so I'm going to explore if that can offer a big speed up. Finally, I might dip my toe into the world of preconditioning if all else fails.

Let us know if you have any suggestions!

mrava87 commented 4 years ago

Fantastic! It would be amazing if you could cite our reference paper, thanks a lot :)

I think starting with a guess is something that should always be done when possible. What do you mean least squares solver found a solution much faster than a more realistic model-fit, what type of solver is the other one?

Regarding preconditioning, it is quite funny but people generally talk about it in different ways. Do you have a scaling problem (some data points / model parameters that are much bigger than others) or you have an idea of a transformed domain where your solution should show some special behaviour?

In both cases you can easily add preconditions in pylops but the way is done would be slightly different:

unbalanced data (one or more rows bigger than others): you could just have a diagonal operator Sc that rescales your data points to the same range and then do d1= Sc * d, G1=Sc*G and solve the d1=G1*m problem the same way you do now
unbalanced model (one or more columns bigger than others): you could just have a diagonal operator Sc that rescales your model points to the same range and then do G1=G*Sc, solve the d=G1*m1 problem the same way you do now and then find the model in the original domain from the scaled one m=Sc1*m1 where Sc1 is the inverse of Sc
transformed domain: this is quite similar to the previous case, you would solve for d=G1*m1 with G1=G*Pand then recover m=P*m1. This is done in https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.PreconditionedInversion.html#pylops.optimization.leastsquares.PreconditionedInversion but you can also easily add it yourself to the operator when using https://pylops.readthedocs.io/en/latest/api/generated/pylops.optimization.leastsquares.NormalEquationsInversion.html#pylops.optimization.leastsquares.NormalEquationsInversion . We use this a lot in geophysics with sparse solvers as we know our models have sparse representations in some domains (e.g. wavelet) or also in cases we know they are smooth (instead of using a roughening regularisation, we precondition with a Smoothing operator)... but this is very model specific and i remember your model is a number of scatter points on a voronoi grid, so i would not know if there is any behaviour you can try to impose - you probably do! ;)

Good luck with your submission!

Jammy2211 commented 4 years ago

The PyLops paper is already cited :).

I think starting with a guess is something that should always be done when possible. What do you mean least squares solver found a solution much faster than a more realistic model-fit, what type of solver is the other one?

Sorry, poor phrasing on my part. Both solutions are using the same solver, the difference in run time comes down to the part beforehand where we set up the grid on which we reconstruct the source using a model for the galaxy's mass. If the mass model is inaccurate the source reconstruction essentially reconstructs noise, which gives much faster run times than when its a more physical model. So, when I switched the mass model to something 'correct' I took a hit on run times, which implies the solver has a much harder time finding the solution.

There is a paper we are aiming to use the same approach to preconditioning as (https://arxiv.org/pdf/2005.03609.pdf), which describes preconditioning as follows:

Making the substitutions D→ ̃DandC−1x→ ̃C−1xgives theNUFFT-based version of equation (7),

which we will refer to later in this section. The consequence of introducing the operators ̃Dand ̃C−1xis that we no longer have an explicit matrix representation for this equation. Rather, the left-hand side exists only as a function that emulates matrix-vector multiplications. For the rest of this section, we describe in more detail our handling of the linear solver, and the computation of the log-determinant (see Section 3.4) under this restriction.

The only way of obtaining a solution is to use an iterative linear solver. In general terms, such solvers apply a linear operator repeatedly, subtracting residuals from a trial solution until a desired tolerance is reached. We have adopted a ubiquitous choice, the preconditioned conjugate gradient solver (PCCG), with a convergence tolerance of 10−12. The conjugate gradient method is derived in such a way that the largest residual components are subtracted first, giving fast convergence. The use of a preconditioner further accelerates convergence. We use the PCCG solver implementation provided by the PETSc framework (Balayet al. 1997, 2018), which is an MPI-parallel library designed for solving large linear systems. For a general linear system As=b, the preconditioner is a matrix approximating A so that we can instead solve

If P−1A≈I, then a solution can be achieved in far fewer iterations. Our source inversion absolutely depends on finding a good preconditioner matrix, as the original system,given in equation (7), can have condition numbers of higher than 10^10 depending on the uvcoverage and regularization strength (condition number is a measure of how singular a matrix is; the identity matrix I has a condition number of 1).

Finding a suitable preconditioner matrix P is highly dependent on the features of the particular problem under consideration. Ours is based on the image-plane noise covariance C−1x=DTC−1D. As discussed in Section 3.2, each row of this matrix simply contains the naturally-weighted dirty beam;that is, it is dominated by its diagonal when the uvcoverage is good.

We take advantage of this property by approximating this matrix with its diagonal, such that each entry is simply the brightest pixel in the dirty beam. Using only the diagonal of C−1x(rather than e.g. three elements per row) guarantees positive-definiteness of P, which is a requirement for CGsolvers. Our preconditioner is then

which is a sparse matrix of dimension Nsrc×Nsrc that can be computed explicitly. We apply a Cholesky decomposition to P, which we compute once for each source inversion. We use the MUMPS direct solver (Amestoy et al. 2001, 2006) for this decomposition, which is conveniently provided within PETSc. This decomposition is equivalent to P−1 and can be quickly applied at each CG iteration. We find that this preconditioner works extremely well, reducing the condition number by a factor of∼106 or more and requiring∼100×fewer iterations to achieve convergence for the problems presented in this paper. We note that the choice of precondioner mainly determines the speed with which convergence is reached and not the precision of the solution.

Precondioning is a new concept to me (I literally read about it for the first time yesterday) so I've got a lot to learn, but if the text above sounds like it isn't too tricky that'd be amazing as it seems like this is the final piece of the puzzle!

mrava87 commented 4 years ago

Yes, this is what i referred to unbalanced data in its more general form. Basically find a matrix M that can reduce the conditioning of your problem, and for your case it seems like these people found a fairly good one... I guess if NUFFT goes out of the picture (Cx is approximated by identity) then the rest is cheap to evaluate and small enough to store (as you data is huge but model quite small, right?)

I think provided you can make P and do a Cholesky on it (scipy has all you need there), then you can apply once to your data and at every iteration like they do to your operator like they describe.

I may miss something but in principle if you avoided creating the normal equations explicitly you could use lsqr instead of cg and this is well-known to be a better approach for ill-posed (slow to converge) problems as CG on the normal equations effectively squares the conditioning number (cond(A^TA) = cond(A)^2)). Maybe that is another directly worth thinking about - a preconditioner there may help too but you would start with a less ill-posed problem already :)

Jammy2211 commented 4 years ago

Yes, this is what i referred to unbalanced data in its more general form. Basically find a matrix M that can reduce the conditioning of your problem, and for your case it seems like these people found a fairly good one... I guess if NUFFT goes out of the picture (Cx is approximated by identity) then the rest is cheap to evaluate and small enough to store (as you data is huge but model quite small, right?)

Yep, exactly that.

I think provided you can make P and do a Cholesky on it (scipy has all you need there), then you can apply once to your data and at every iteration like they do to your operator like they describe.

It looks like computing P will be straight forward enough. Once I've got my code computing P the only difficult part will be figuring out how to fold it into the PyLops inversion as a LinearOperator. It looks like the tutorials you linked to can help me with that.

I may miss something but in principle if you avoided creating the normal equations explicitly you could use lsqr instead of cg and this is well-known to be a better approach for ill-posed (slow to converge) problems as CG on the normal equations effectively squares the conditioning number (cond(A^TA) = cond(A)^2)). Maybe that is another directly worth thinking about - a preconditioner there may help too but you would start with a less ill-posed problem already :)

I'll look into this too, again I think the reason we create the normal equations is cause this is how we've always done it in the past lol. Would the correct way to approach this be for us to change our regularization implementation to not use H explicitly but instead use the B matrices on the matrix x (as we discussed above), and instead use the RegularizedInversion class in PyLops with essentially the same operator that I'm using currently?

In terms of preconditioning I guess both these approaches use the same preconditioner.

mrava87 commented 4 years ago

Correct, you would need B. And the best preconditioner may be a different one because you would want to apply it directly to the modelling operator rather than its normal equation... but I am not sure myself, I would need to think about it looking at the equations

Regarding adding the preconditioner to RegularizedInversion, it may require a small change in the inner of the solver as Pinv will need to be left-multiplied to the rest of the normal equation modelling operator (and to the data) but that should be very easy to do... you could try and if that works fine we can easily add it to the PyLops solver 🥇

mrava87 commented 4 years ago

Hi @Jammy2211, we are going to give a tutorial at PyData global and I wanted to mention different projects that use PyLops. Do you have a link to your Journal of Open Source Software we can refer to :)

Jammy2211 commented 4 years ago

We haven't submitted to JOSS yet (few more features to go), but you can link to the GitHub page and the MNRAS paper below which cover the method in its non-Python form :)

https://github.com/Jammy2211/PyAutoLens https://arxiv.org/abs/1708.07377

Took a break from PyLops to work on some other features but should be back soon to work on the preconditioning and hopefully be able to close this issue once and for all!

mrava87 commented 4 years ago

Thanks, will do that!

And looking forward to hearing when you get back to the preconditioner :)

Jammy2211 commented 4 years ago

Finally back to (hopefully) finish this off :).

I have a function which now computes a preconditioner matrix P, which is fast to compute, uses minimal memory and I can apply it to the solution to gain a (significant) speed up on the calculation.

I haven't look too into the details of implementing this in PyLops yet, but my first quick question is whether we can use this P matrix in the PyLops NormalEquationsInversion equation in an analogous fashion to the how we used the H matrix for my specific problem. I recall that we adjusted the inputs to take the regularization matrix H as an input (as opposed to passing in operators for each individual B matrix), where you added the input parameters NRegs which is used as follows:

        Rop = reg.RegularizationLop(regularization_matrix=regularization_matrix)

        reconstruction = pylops.NormalEquationsInversion(
            Op=Op,
            Regs=None,
            epsNRs=[1.0],
            NRegs=[Rop],
            data=visibilities.as_complex,
            Weight=noise_map.Wop,
            tol=settings.tolerance,
        )

Is it feasible to add an analogous input parameter to handle the pre-conditioner, given P is cheap for me to compute? I'll look into how I would do this using the 'normal' PyLops API tomorrow, just a thought whilst I wrap my head around the problem :).

p.s. I am still getting round to setting the problem up to use the regularization matrices for B's. I think that for a different use-case this will be a much better approach than the use of NRegs above, its just finding the time to test and implemented the new code :(.

mrava87 commented 4 years ago

Hi James, great to hear that you are back on this and have made a fast preconditioner.

Just to be sure I did not completely forget the problem, this is the equation you aim to solve right: 90334078-e7925d80-dfc2-11ea-8588-6fe9a5b51747

where As=b is what are able to solve now with pylops.NormalEquationsInversion?

If that is the case I think you can easily do so by passing your preconditioner in the **kwargs_cg dictionary. Following the doc of the conjugate gradient solver from scipy that we use internally (https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.linalg.cg.html - see the M input parameter), you can modify your call this way:

Rop = reg.RegularizationLop(regularization_matrix=regularization_matrix)
Mop = ...
reconstruction = pylops.NormalEquationsInversion(
            Op=Op,
            Regs=None,
            epsNRs=[1.0],
            NRegs=[Rop],
            data=visibilities.as_complex,
            Weight=noise_map.Wop,
            tol=settings.tolerance,
            **dict(M=Mop))
        )

where Mop needs to be pylops linear operator that wraps your function which applies the inverse of the preconditioner to a vector as you have done before.

Makes sense?

Jammy2211 commented 4 years ago

Your description of the problem above is correct :).

I've implemented the preconditioner as follows:


# Calculate P

preconditioner_matrix = inversion_util.preconditioner_matrix_via_mapping_matrix_from(
                mapping_matrix=mapper.mapping_matrix,
                regularization_matrix=regularization_matrix,
                preconditioner_noise_normalization=np.sum(1.0 / noise_map ** 2),
            )

# Calculate P^-1, using the cholesky decompostion as opposed to the inverse for speed.

preconditioner_cholesky = np.linalg.cholesky(preconditioner_matrix)

Mop = pylops.MatrixMult(preconditioner_cholesky)

reconstruction = pylops.NormalEquationsInversion(
                Op=Op,
                Regs=None,
                epsNRs=[1.0],
                NRegs=[Rop],
                data=visibilities.as_complex,
                Weight=noise_map.Wop,
                tol=10**-8,
                **dict(M=Mop),
            )

It seems to be doing something (e.g. if I make the preconditioner all zeros, the code doesnt find a solution). However, it doesn't seem to be speeding up the calculation much (4.5 seconds -> 3.5 seconds, and speed up there of) depending on how I change the precondition matrix.

If the implementation above looks good to you (I hope I'm right using pylops.MatrixMult) then I would assume I've not quite set up the precondtioner matrix itself correctly, which is a non-PyLops specific problem I can figure out myself. However, a quick confirmation that the code above looks OK would be great before I go down that rabbit hole lol.

p.s. we submitted to JOSS yesterday, I'll keep you in the loop when the paper is out!

mrava87 commented 4 years ago

Heya, nice to hear that your paper is submitted :)

I think the code looks correct, if you were to use the inverse of P I would expect it to do what the equation says. The only thing that puzzles me a bit is the fact you apply a cholesky decomposition to your matrix but then only use the lower triangular part of it. I had to google it myself to get some inspiration but looking at this https://www.mathworks.com/matlabcentral/answers/303431-matrix-inverse-using-cholesky-decomposition I think I have a suggestion:

A = np.random.normal(0, 1, (100, 100))
A = A.T @ A # matrix needs to be symmetric definite pos for cholesky
L = np.linalg.cholesky(A)

# benchmark
Ainv = np.linalg.inv(A)
x = np.ones(100)
y = Ainv @ x

# use of choleskly with explicit inverse
U = solve_triangular(L, np.eye(100), lower=True)
Ainv1 = solve_triangular(L.T, U, lower=False)
y1 = Ainv @ x

#  use of cholesky with implicit inverse
u = solve_triangular(L, x, lower=True)
y2 = solve_triangular(L.T, u, lower=False)

plt.figure()
plt.imshow(Ainv, vmin=Ainv.min(), vmax=Ainv.max())

plt.figure()
plt.imshow(Ainv1, vmin=Ainv.min(), vmax=Ainv.max())

plt.figure()
plt.imshow(Ainv-Ainv1, vmin=Ainv.min(), vmax=Ainv.max())

plt.figure()
plt.plot(y, 'g')
plt.plot(y1, '--b')
plt.plot(y2, '--r')

These 3 approaches should all return the same result, but the last one does never create an matrix explicitly as it basically acts as a linear operator that applies P^-1x = y by solving the following inverse problem x = Py -> x = L L^T y, so first solve Lu=x and then solve L^T y = u and get y = P^-1x.

You should be able to wrap this easily into a linear operator, something like this:

class PrecInv_Cholesky(LinearOperator):
    def __init__(self, A):
        self.L = np.linalg.cholesky(A)
        self.shape = (A.shape[0], A.shape[0])
        self.dtype = A.dtype

    def _matvec(self, x):
        u = solve_triangular(self.L, x, lower=True)
        y = solve_triangular(self.L.T, u, lower=False)
        return y

    def _rmatvec(self, x):
        return self._matvec(x)

Pop = PrecInv_Cholesky(A)
dottest(Pop, 100, 100, verb=True)
y3 = Pop * x

plt.figure()
plt.plot(y, 'g')
plt.plot(y3, '--m')

It would be good if you can stress test this a bit (especially if your P has complex numbers things may need some changes) but hopefully this gives you something to try out and see if it really speeds things up.

Jammy2211 commented 4 years ago

Thanks for the confirmation, and the tips :). I had been using the inverse, but the paper recommended that we can use the cholesky decomposition for speed and because we need to compute log det A. I'll experiment with all these different options as I test the code.

Now I have confidence the preconditioner is functional, I believe the problem is that my precondtioner matrix is not doing a great job at approximating A^-1 and this appears to a normalization issue. The magnitude of A depends on the NUFFT / units visibilities are defined in, in a way that is not currently fed through to the precondtioner. At this point its no longer a PyLops issue but my issue either way!

I'll keep this issue open for a little longer in case anything crops up, but once we get that sweet glorious precondtioning speed up I'll drop by to close it, as at that point it'll be time to do science :). Thank you so much for all your help, I'll make sure you let you know when the first science papers hit and when the JOSS review is complete!

Jammy2211 commented 4 years ago

Another question, hopefully one of the last ones :P.

When we use the matrix formalism to perform the inversion, we do the following. First, we create the matrix y that maps source pixels to image pixels, which is transformed via the NUFFT, such that y_ft consists of real and complex entries.

Next, we use y_ft to create the matrices D and F (F is what we typically refer to above as A):

https://user-images.githubusercontent.com/23455639/89305390-7b306980-d666-11ea-8758-2f92e35d7063.png https://user-images.githubusercontent.com/23455639/89305494-9d29ec00-d666-11ea-9432-52ad3c473b12.png

Where the matrices D and F consist of complex entries. Up to this point, everything is the same as how our PyLops implementation performs the inversion. However, the matrix formaslism now deviate from our PyLops implementation.

The PyLops implementation now solves for s, where s is also real valued and complex valued. In contrast, the matrix formalism defines two new matrices D' = D_real + D_imag and F' = F_real + F_imag, and it solves for s using these matrices, where s now only consists of real valued entries.

I think that this is the solution we want, for the following reasons:

The original matrix y is only real valued (1's and 0's), and so if s contains imaginary values it is unclear how you map this solution back to the data when computing the chi-squared after the inversion.
Regularization is poorly defined when s is real + imag. Currently, we only regularize the real components of s (I could also regularize the imaginary components too, but its unclear to me what this actually means in terms of the prior on our solution).
In its current implmentation, PyLops is effectively finding the matrix s which indepedently fits the real values in d and the imaginary values in d. I'm of the view we want to solve for both simultaneously using the matrix formaslims.

I think the key point is that we are recosntructing a galaxy's light, which is a real space type quantity that doesn't have imaginary components associated with it. Currently, for our s in PyLops thats not quite the case.

So, my question is, would there be a straight forward way to tweak our PyLops solution to solve for the summed F and D matrices like the matrix formalsim does? Or something equivalent?

mrava87 commented 4 years ago

Hi @Jammy2211, this may be due to how NFFTU is implemented / how you wrapped it. When you use NormalEquationsInversion (or actually any PyLops and scipy solver), the first iteration is likely to be just the backprojection of your data onto the model space (or something very close to it), so if your forward is:

b=As

the first step is something like:

s_in = A.H b

Now if your operator takes as input a complex number and returns a complex number (e.g. FFT) you end up in the situation of inverting a complex-valued model vector from a complex-valued data vector. There may be scenarios where however your model is real and through the operator it gets moved to complex-valued data space. If your operator adjoint does however not carefully handle this you may end up staying all the time in the complex space. Let's make a simple example with FFT (which wraps numpy.fft.fft). This

import pylops                                                           
F = pylops.signalprocessing.FFT(2**8)               
s = np.ones(256)                           
b = F * s    
s_in = F.H * b

if you run it you will see that despite s is real s_in will be complex and the next iterations will also be complex

But if you use:

import pylops                                                           
F = pylops.signalprocessing.FFT(2**8, real=True)               
s = np.ones(256)                           
b = F *  s 
s_in = F.H * b

you see that s_in is now real and this is because by using the real=Flag we tell pylops to use np.fft.rfft and np.fft.irfft``` which is the real-valued FFT. You could do the same if NUFFT has this option or if this is not the case you could simply add anp.real()in your adjoint function (_rmatvec``) at the end before returning the output.

Here is something very simple example of what i mean done using np.fft.fft and np.fft.iffft+np.real

 def irfft(x): 
    return(np.real(np.fft.ifft(x))) 

F = pylops.FunctionOperator(np.fft.fft, irfft, 2**8)                   
s = np.ones(256)                           
b = F *  s 
s_in = F.H * b

you can see again that s_in is real

But I am not sure I follow completely how the pure matrix formulation differs from the operator one. When you say the matrix formalism define two matrices D' and F', where do the get used in the definition of the forward problem are they then used in the forward problem https://user-images.githubusercontent.com/22675848/94829071-70961480-040a-11eb-9170-4d4d6980a39e.png ?

If my first suggestion doesn't make sense to you, any chance you can show the equation formulation you use for the pure matrix formalism (I am sure we can do the same in operator formalism but I would need a bit more details ;) )

Jammy2211 commented 4 years ago

Thanks again for the speedy reply!

The matrix formaslism is detailed in this paper - https://arxiv.org/pdf/1801.01831.pdf

Omitting the real + complex nature of our problem for a second, we are solving a problem of the form:

Where:

The matrix f is what we refer to as the matrix y above which in our problem maps image pixels to source pixels. This is the matrix that is transformed via the NUFFT (and in the equations above we assume it has already been transformed). The data values d_j are the data values and \sigma values the noise entries.

With this addition of our regularization matrix this becomes:

The merit function we are solving for is:

Including regularization this becomes:

Now let us write the problem for visibility data V, which consists of separate real and imaginary components:

For our problem we solve for the following solution:

Finally, the solution we solve for is now (I've including text from the paper stating why S is real only):

The matrix implementation of this problem literally makes the matrices D and F separately right up until we call np.linalg.solve(), but before we do we all them together as shown in the equation above. I can expand on this further if needs me.

Correct me if I'm wrong, but I do not think that adding an np.real() to the adjoint method will produce the desired behaviour. This will remove all imaginary entries from f after the NUFFT is performed, and thus only solve for the set of s values that best reconstruct the real entries in our visibility dataset (the real + imaginary values are passed to PyLops and I guess this would mean the method has no means by which to fit the imaginary values at all).

mrava87 commented 4 years ago

Hi again, thanks a lot for the details, let me digest them.

Regarding your last point, note that I suggested to take the real part in the adjoint NUFFT (when you move from the data vector - luminosity dataset - into what you call the matrix y that maps source pixels to image pixels which i seem to understand is purely real) not the the forward NUFFT (when you go from y to the luminosity). In that case I totally agree your modelling operator will not produce complex numbers but your observed data will have them, and i have no idea what the inversion would produce... probably rubbish! Does that make more sense (even thought perhaps does not give you what you want...)?

Jammy2211 commented 4 years ago

Aha, I hadn't quite clocked that the real would be on the adjoint but not the forward method, so we would still be mapping our source reconstruction from real-space to complex space when computing the solution.

That makes a lot of sense, and I am experimenting with it now. It certainly addresses my first major concern with the previous setup, whereby I had to take the real components of the final solution for s to create my model reconstruction of the visibility data (from which I compute the likelihood of the fit). By discarding the imaginary components this meant the likelihood was poorly defined!

After doing a more quantitive comparison to the matrix solutions (for datasets where the number of visiblitiies is small) it would seem things are looking very similar! So, I think that was the key bit of insight I'd been missing :). Its not clear to me yet whether the matrix and PyLops implementations are doing exactly the same thing (I really haven't wrapped my head around equations 14 and 15 above), but I think thats ok. I think its time to do a more thorough comparison via actual model-fitting - slowly getting there!

mrava87 commented 4 years ago

Hi, I looked again at our equations and my opinion is that we are solving the same problem.... but I could be wrong ;)

So, Eq

is a trivial analytic solution of the L2 functional

which, as I guess you are familiar with, you can derive by taking the derivative of the functional with respect to S.

Now anything after is just what I would call ad-hoc analytic similification of the equations so that you can try to do less numerical computations but get the same result. But I do think that if you solved the problem with complex data and complex matrix you would get the same (apart from the fact that you have no way - at least not that I can think of - to force your solution to be real... the only way would be to get directly into the solver mauve at the end of every iteration with a callback - like for example Scipy solvers allow you to - and enforce a real solution.

With linear operators you can enforce reality much more easily as we discussed but I think that the solution should be the same (provided your modelling operator is the same as its explicit counterpart...

Keep me posted if more tests convince you that this is the case or not :)

Jammy2211 commented 4 years ago

I think you're right! In fact, after a long period of grueling testing and comparison, I think me and my collaborator have managed to get essentially the same result when comparing solutions using the matrix formalism and via PyLops :) (this relies on us taking the real on the adjoint but not forward, as you suggested). The definition of the noise-map is currently inconsistent between the two in a way we don't yet understand, but this should be something a bit more digging will address.

The final step for us now is getting the preconditioner to speed things up. So far, the preconditioner matrix discussed above is doing little to nothing to help us (PyLops takes ~150 iterations with or without it). However, if we use the 'true' A as the precondtioner (which we can compute using the matrix formalism code) this drops to 4 iterations! So. effective precondtioning can give us an insane speed up, we just need to think about how to get this.

The precondtioner matrix P above does a great job at matching the diagonal terms of A, but pretty much puts zeros in all off-diagonal terms (it doesn't use the NUFFT to create P and so has no information on how different elements in y are correlated). Me and my collaborator are going to draft up some approaches on how to improve the preconditioning tomorrow but so far the sort of things we're thinking are:

We could create an extremely low dimenstionality NUFFT that represents the 'average' behaviour of the full NUFFT. This might operate on ~1000 visibilities instead of >10 million. This could then be used to cheaply compute A via the matrix formaslism, which would act as the precondtioner for a PyLops fit to the full data using the full NUFFT.
The analysis is embedded in a non-linear search fitting the mass distribution of a lens galaxy. We could compute A using the full dataset once for a previously estimated 'good' mass distribution model and use that as the precondtioner for fits which change the mass distribution. This would work in a settings where our model fitting does not change the mass distribution by a lot (which turns out to be the majority of the analysis).

If you have any suggestions please let us know, but it feels like we're very very nearly there and can finally begin to use PyLops to do some science!

mrava87 commented 4 years ago

Great :)

I think what you say about the preconditioned makes sense, as the closer the preconditioner to the inverse the faster then inverse... so perhaps your off-diagonal terms are very strong and have a big role... I like your idea of using a a ‘sparse’ NUFFT, would be good to see how good it can approximate the preconditioner :) if I get any other idea I will let you know

And, you say that using the correct A matrix within the PyLops inversion speeds things up? But this is not the case when using an approx A?