Can FISTA run on matrices of data?

eurunuela commented 2 years ago

Hello,

I am not sure this is the right place to ask this. I would love to add pylops to my algorithms, but I mainly work with multivariate formulations, i.e., with the entire matrix of data at once.

Does pylops.optimization.sparsity.FISTA accept an np.ndarray of size 200,10 for instance? I have tried it but cannot make it work.

If I can use FISTA like that, I'll definitely want to contribute to add a new operator.

Thank you!

mrava87 commented 2 years ago

Hi @eurunuela, currently it will not work as internally we call matvec and rmatvec but it should be an easy change to make it work for multiple right-hand sides at the same time :)

I'll take a look and let you know how it goes soon

eurunuela commented 2 years ago

Thank you @mrava87 !

I thought it would be straightforward too, just making sure that the matrix dimensions make sense for matcvec.

mrava87 commented 2 years ago

Ok, this was easier than I expected.

I just pushed updated ISTA/FISTA on the master. Try:

nt = 61
nx = 5
dt = 0.004
t = np.arange(nt)*dt
x = np.zeros(nt)
x[10] = -.4
x[int(nt/2)] = 1
x[nt-20] = 0.5
x = np.outer(x, np.ones(nx))

h, th, hcenter = pylops.utils.wavelets.ricker(t[:101], f0=20)

Cop = pylops.signalprocessing.Convolve1D(nt, h=h, offset=hcenter,
                                         dtype='float32')
y = Cop*x
yn = y + np.random.normal(0, 0.1, y.shape)

xista, niteri, costi = \
    pylops.optimization.sparsity.ISTA(Cop, y, niter=400, eps=5e-1,
                                      tol=1e-8, returninfo=True)

xfista, niteri, costi = \
    pylops.optimization.sparsity.FISTA(Cop, y, niter=400, eps=5e-1,
                                       tol=1e-8, returninfo=True)

fig, axs = plt.subplots(1, 4, figsize=(14, 5))
axs[0].imshow(x)
axs[0].axis('tight')
axs[1].imshow(y)
axs[1].axis('tight')
axs[2].imshow(xista)
axs[2].axis('tight')
axs[3].imshow(xfista)
axs[3].axis('tight')

This is a simple example of deconvolution using multiple 1d signals as multiple RHS. Have a look and see if this is what you need for your problem :)

eurunuela commented 2 years ago

Yes, this is perfect. Thank you so much Matteo!

If you think it's useful I will implement a L_1 + L_2,1 mixed-norm proximal operator that allows for sparsity and grouping of coefficients.

mrava87 commented 2 years ago

For sure, that would be great to have :)

I would however more likely see this fitting in https://github.com/pylops/pyproximal

Whilst it is true we have FISTA directly in PyLops, we decided that more advanced solvers which fit within the arena of Proximal operator would be better off living in their own project. This way if you implement a proximal operator you could then use it with ISTA/FISTA but also ADMM or Primal-dual

Makes sense?

eurunuela commented 2 years ago

Of course, that makes sense.

For the record, I'm referring to equation 3 in Gramfort, A., Strohmeier, D., Haueisen, J., Hamalainen, M., & Kowalski, M. (2011, July). Functional brain imaging with M/EEG using structured sparsity in time-frequency dictionaries. In Biennial International Conference on Information Processing in Medical Imaging (pp. 600-611). Springer, Berlin, Heidelberg.

Edit: thank you for you help @mrava87 !

mrava87 commented 2 years ago

Ok, I planned to do this for a while but since it seems to fit nicely with your needs I just decided to spend a bit of time and get it done.

I have now modified the AcceleratedProximalGradient (https://pyproximal.readthedocs.io/en/latest/api/generated/pyproximal.optimization.primal.AcceleratedProximalGradient.html#pyproximal.optimization.primal.AcceleratedProximalGradient) to allow for 2 type of accelleration, one of them being the FISTA one. So now, whilst PyLops FISTA is purely for L2 + reg*L1, AcceleratedProximalGradient with acceleration='fista' is much more general as it can handle any pair of functionals, one with gradient and one with proximal.

It also works in multiple RHS mode, see here for an example (top part proves that ISTA/FISTA in PyLops are identical to ProximalGradient and AcceleratedProximalGradient when the latter used for L2 + reg*L1): https://github.com/PyLops/pylops_notebooks/blob/master/pyproximal/ISTA_FISTA_PyLops_PyProx.ipynb

Note: so far you would need to use both PyLops and PyProximal in master, I will soon make releases for both :)

eurunuela commented 2 years ago

Thank you Matteo. I see there's a typo in the documentation of the AcceleratedProximalGradient. You mention f(x) twice after the equation.

I will have a look at these next week. Thank you so much, this is all very helpful and it will definitely optimize my computational time.

mrava87 commented 2 years ago

Thanks, I'll fix that :)

eurunuela commented 2 years ago

Hi @mrava87,

I finally found some time to look into this and I'm finding a dimension mismatch error when creating the L2 operator:

tau = 1 / (np.linalg.norm(hrf_matrix) ** 2)
sigma = np.repeat(np.array([0.73499711]), nv, axis=0)
l21_l1 = L21_plus_L1(sigma=sigma)
l2 = L2(Op=hrf, b=y_noise)
AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
                                                  niter=400, acceleration='fista', show=True)

With hrf.shape = (300, 300) and y_noise.shape = (300, 10).

Here's the entire error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-81-d032dbb1e363> in <module>
      2 sigma = np.repeat(np.array([0.73499711]), nv, axis=0)
      3 l21_l1 = L21_plus_L1(sigma=sigma)
----> 4 l2 = L2(Op=hrf, b=y_noise)
      5 AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
      6                             niter=400, acceleration='fista', show=True)

~/pyproximal/pyproximal/proximal/L2.py in __init__(self, Op, b, q, sigma, alpha, qgrad, niter, x0, warm)
     96         # create data term
     97         if self.Op is not None:
---> 98             self.OpTb = self.sigma * self.Op.rmatvec(self.b)
     99 
    100     def __call__(self, x):

~/miniconda3/lib/python3.8/site-packages/pylops/LinearOperator.py in rmatvec(self, x)
    159 
    160         if x.shape != (M,) and x.shape != (M, 1):
--> 161             raise ValueError('dimension mismatch')
    162 
    163         y = self._rmatvec(x)

ValueError: dimension mismatch

What am I doing wrong to make FISTA work on a matrix of data in AcceleratedProximalGradient?

Thank you!

mrava87 commented 2 years ago

Hi @eurunuela, I am not sure as I cannot see the entire code. I tried to make something simple along the lines of your dimensions:

hrf_matrix = np.random.normal(0,1, (300, 300))
hrf = MatrixMult(hrf_matrix)
y_noise = np.random.normal(0,1, (300, 10))

l1 = L21_plus_L1(np.arange(10))
l2 = L2(Op=hrf, b=y_noise)
epsg = np.arange(10)+1

xpgfista = AcceleratedProximalGradient(l2, l1, tau=tau, x0=np.zeros((300,10)), 
                                       epsg=epsg, niter=400, acceleration='fista', show=False)

This seems to work no problem. Note that work means here that the code runs... as I don't have a real problem to look at I am not sure if things work as expected but at least for the L1 case in here https://github.com/PyLops/pylops_notebooks/blob/master/pyproximal/ISTA_FISTA_PyLops_PyProx.ipynb the results are also reasonable.

PS: just to be sure, which version are you using? I am on currently running this on main directly

eurunuela commented 2 years ago

I'm sorry, I wasn't on the main branch. However, now that I have tried on the main branch, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-12f957443d1d> in <module>
      3 l21_l1 = L21_plus_L1(sigma=sigma)
      4 l2 = L2(Op=hrf, b=y_noise)
----> 5 fista_results = AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
      6                             epsg=np.ones(nv), niter=400, acceleration='fista', show=True)

~/pyproximal/pyproximal/optimization/primal.py in AcceleratedProximalGradient(proxf, proxg, x0, tau, beta, epsg, niter, niterback, acceleration, callback, show)
    300     if show:
    301         tstart = time.time()
--> 302         print('Accelerated Proximal Gradient\n'
    303               '---------------------------------------------------------\n'
    304               'Proximal operator (f): %s\n'

TypeError: only size-1 arrays can be converted to Python scalars

mrava87 commented 2 years ago

Strange, seems a printing error. Can you give me a code snipped with fake numbers but same dimensions to what you are tying to do, otherwise it is difficult for me to help

eurunuela commented 2 years ago

Sure thing, here's all the code I'm using:

from splora.deconvolution.hrf_matrix import hrf_linear

nt = 300
nv = 10
tr = 2

y_ideal = np.zeros((nt, nv))

y_ideal[50:51, :] = 1
y_ideal[200:205, :] = 1
y_ideal[250:252, :] =1

nt = 300
p = [6,16,1,1,6,0,32]
hrf_SPM = hrf_linear(2, p)

filler = np.zeros(nt - hrf_SPM.shape[0], dtype=int)
hrf_SPM = np.append(hrf_SPM, filler)

temp = hrf_SPM

for i in range(nt - 1):
    foo = np.append(np.zeros(i + 1), hrf_SPM[0 : (len(hrf_SPM) - i - 1)])
    temp = np.column_stack((temp, foo))

hrf_matrix = temp

hrf = pylops.MatrixMult(hrf_matrix)
#hrf = pylops.signalprocessing.Convolve1D(N=nt, h=hrf_SPM)

y = hrf.dot(y_ideal)

y_noise = y + np.repeat(np.random.normal(0, 0.2, y.shape[0])[:, np.newaxis], nv, axis=1)

plt.plot(y_noise[:, 0])

y = y_noise.copy()

tau = 1 / (np.linalg.norm(hrf_matrix) ** 2)
sigma = np.repeat(np.array([0.73499711]), nv, axis=0)
l21_l1 = L21_plus_L1(sigma=sigma)
l2 = L2(Op=hrf, b=y_noise)
fista_results = AcceleratedProximalGradient(l2, l21_l1, tau=tau, x0=np.zeros((nt, nv)), 
                            epsg=np.ones(nv), niter=400, acceleration='fista', show=True)

You can get the hrf_linear function from here.

Thank you Matteo!

mrava87 commented 2 years ago

Alright, I see. This is a printing problem as epsg was initially not supposed to be multi-valued. You could have just switched show=False and the solver runs (again, not sure if what it gives is meaningful...). I made some changes now (https://github.com/PyLops/pyproximal/pull/85) so that also the printing works fine :)

Let me know when you get some results if they are meaningful, and perhaps at some point it would be good if you can contribute a tutorial on pyproximal

eurunuela commented 2 years ago

Thank you so much!

I can confirm it works and yields results that are very similar to my own implementation of FISTA with differences probably arising from faster convergence with one of the two methods.

mrava87 commented 2 years ago

Fantastic! As I said, if you can make a simple example to show how you use this for a real life problem that would be very nice 😊

eurunuela commented 2 years ago

Fantastic! As I said, if you can make a simple example to show how you use this for a real life problem that would be very nice 😊

Will do once I add it to my package, because I will have to do it for myself too.

eurunuela commented 2 years ago

PS: just to be sure, which version are you using? I am on currently running this on main directly

Have you cut a release with the changes we have discussed in this issue?

My unit tests fail with an error that makes me think these changes are not part of a release.

/opt/conda/envs/py37_env/lib/python3.7/site-packages/pyproximal/optimization/primal.py:318: in AcceleratedProximalGradient
    x = proxg.prox(y - tau * proxf.grad(y), epsg * tau)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = (<pyproximal.proximal.L21_plus_L1.L21_plus_L1 object at 0x7f1e74e2a510>, array([[-1.60410489e-05, -1.60410489e-05, -1....9, 0.00085759, 0.00085759, 0.00085759, 0.00085759,
       0.00085759, 0.00085759, 0.00085759, 0.00085759, 0.00085759]))
kwargs = {}

    def wrapper(*args, **kwargs):
>       if args[2] <= 0:
E       ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

/opt/conda/envs/py37_env/lib/python3.7/site-packages/pyproximal/ProxOperator.py:12: ValueError

It could also be that I'm not signaling the correct version of pyproximal and pylops.

mrava87 commented 2 years ago

Currently the situation is a bit messy as we are doing a major new realease of pylops.

I suggest to use the latest pylops on PyPI and the main of pyproximal. I am planning to make a new release of pyproximal soon as there is quite a lot of new things but I’m not sure I will get around doing it before the end of the month

eurunuela commented 2 years ago

I suggest to use the latest pylops on PyPI and the main of pyproximal. I am planning to make a new release of pyproximal soon as there is quite a lot of new things but I’m not sure I will get around doing it before the end of the month

Sounds good. I will test locally until the new pyproximal release is out.

Thanks!

eurunuela commented 2 years ago

Quick question @mrava87:

How long would you expect AcceleratedProximalGradient to take on a 2100x45 matrix with a maximum of 400 iterations on an HPC cluster?

Edit: the method should check for convergence and exit the loop when the criterion is met.

mrava87 commented 2 years ago

Hey, It depends on many things because I assume for the proxf you use L2 and that calls an iterative solver (or a dense solver) so the main cost of the overall problem will be dominated by the cost of this part. What do you use? And how long does it take?

Also, when you say HPC cluster what do you mean, as there is no distribution over machines unless you add it yourself so I guess you just mean a good beefy machine?

For the moments our methods don’t do checks so they always stop only when they reach the max number of iterations, we should include that soon when we convert them to classes like we just did for pylops (so we also give the freedom to the user to choose the stopping criterion)

mrava87 commented 2 years ago

PS: we just released pyproximal 0.4.0 so all things we discussed are now in a stable release too :)

eurunuela commented 2 years ago

Hey, It depends on many things because I assume for the proxf you use L2 and that calls an iterative solver (or a dense solver) so the main cost of the overall problem will be dominated by the cost of this part. What do you use? And how long does it take?

I have reduced the number of maximum iterations to 200 just to make sure the entire pipeline runs correctly on the HPC cluster. Right now, solving the inverse problem for ~300 datasets of size 2100 x 45 takes around 5h. It could be slow because the regularization parameter is not high enough. But I still haven't been able to run the entire pipeline and see the results to assess if the regularization parameter is low or high.

Also, when you say HPC cluster what do you mean, as there is no distribution over machines unless you add it yourself so I guess you just mean a good beefy machine?

So, I'm spitting the computation into various jobs in the HPC cluster, with each job having access to 8GB of RAM memory and 16 CPUs. Each one of the jobs solves the inverse problem on 300 different matrices of size 2100 x 45.

For the moments our methods don’t do checks so they always stop only when they reach the max number of iterations, we should include that soon when we convert them to classes like we just did for pylops (so we also give the freedom to the user to choose the stopping criterion)

Sounds good!

On a related note, it looks like the scipy dependencies of the latest pylops and pyproximal versions are different. This raises an error when having these two versions as a dependency of my package.

pylops 1.18.2 depends on scipy>=1.4.0
pyproximal 0.4.0 depends on scipy>=1.8.0

mrava87 commented 2 years ago

Alright I see. Then I think it makes sense as you do solve a lot of problems. Given that you use Fista the only heavy computation is the matrix multiplication and its adjoint which is done in numpy, so as long as you set the various env variables of OMP/MKL to use as many threads as possible (I guess you mean 16 threads not CPUs as otherwise you would need some sort of MPI parallelism to use them…) I don’t see any way to make it faster… other than managing to cut down iterations by finding the best regularization parameter.

For the dependencies, there is no conflict as both require a >= so if you want both packages the requirement of pyproximal should dominate… what error you get? Or just a warning?

eurunuela commented 2 years ago

I'm using Slurm, so the CPUs argument refers to a consumable resource offered by a node. It can refer to a socket, a core, or a hardware thread, based on the Slurm configuration.

I will check if numpy is already using the OMP variable.

Re dependencies: you're right, but it looks like it scipy>=1.8.0 is only compatible with Python 3.8 or higher. I will have to drop support for 3.6 and 3.7.

mrava87 commented 2 years ago

Ok great! I expected you meant processes/threads, then make sure numpy is aware. Other than that, since you seem to use dense matrices you could set densesolver=True in L2 and see if that helps. Another thing I could look into is to provide an option if the operator is dense to create its inverse / Cholesky upfront and store it so that at every iteration the cost of solving the problem is very minimal. What do you think?

Regarding debs, that’s correct. Let me take a look why we had to push the dependency of scipy for pyproximal to 1.8.0

eurunuela commented 2 years ago

Other than that, since you seem to use dense matrices you could set densesolver=True in L2 and see if that helps.

The docs say densesolver should be a string.

Another thing I could look into is to provide an option if the operator is dense to create its inverse / Cholesky upfront and store it so that at every iteration the cost of solving the problem is very minimal. What do you think?

That sounds like a great idea!

Regarding debs, that’s correct. Let me take a look why we had to push the dependency of scipy for pyproximal to 1.8.0

Don't worry about it too much. I'm ok with dropping support for Python 3.6 and 3.7.

mrava87 commented 2 years ago

Sorry, it is indeed a string to choose numpy/scipy when the operator is explicit.

Ok, let me take a look at how we can implement the factorization option, I get back to you soon :)

eurunuela commented 2 years ago

Sorry, it is indeed a string to choose numpy/scipy when the operator is explicit.

Just to confirm, I have to use densesolver="numpy" then?

Ok, let me take a look at how we can implement the factorization option, I get back to you soon :)

Thank you!

mrava87 commented 2 years ago

Numpy is the default but someone asked to include also scipy as it was faster for his problem. I would try both and see, I cannot tell as I think depends on the pronlem

mrava87 commented 2 years ago

Alright I looked into the factorization option and I think I found a nice solution: https://github.com/PyLops/pyproximal/pull/89

Initially I thought you could factorize a matrix A and then easily find the factorization of (A+I) but this does not seem to be possible. So I decided to factorize and keep track of the scalars (sigma and tau) so that if they dont change the factorization does not need to be done again at the next iteration. See here for an example:https://github.com/PyLops/pylops_notebooks/blob/master/pyproximal/CompressiveFactorize.ipynb

However note that this does not really apply to ProximalGradient as there is no inversion as such there... so for your problem I currently don't see any way to further speed up the process (it seems to be it is purely bounded by the matrix-vector computations of numpy)

eurunuela commented 2 years ago

Alright I looked into the factorization option and I think I found a nice solution: https://github.com/PyLops/pyproximal/pull/89

Initially I thought you could factorize a matrix A and then easily find the factorization of (A+I) but this does not seem to be possible. So I decided to factorize and keep track of the scalars (sigma and tau) so that if they dont change the factorization does not need to be done again at the next iteration. See here for an example:https://github.com/PyLops/pylops_notebooks/blob/master/pyproximal/CompressiveFactorize.ipynb

If that makes the code faster, I'm all in for it. I'm have a look at the PR and the notebook throughout the week.

However note that this does not really apply to ProximalGradient as there is no inversion as such there... so for your problem I currently don't see any way to further speed up the process (it seems to be it is purely bounded by the matrix-vector computations of numpy)

We could wrap the function with numba's jit maybe? I don't know if the function is too complex for it though, as not every numpy method is supported.

mrava87 commented 2 years ago

Mmh I am not sure numba can do much. Iterative solvers are sequential in nature so even if you have a for loop over iterations numba can’t do much as you need the previous iteration to get the next. Usually my experience is that speed up can be mostly achieved at the operator level, by speeding up the linear or proximal operator involved in your problem. If you are not satisfied with the time it takes to run your experiment I would suggest trying to profile your code and see where it is slow (because if most of the time is spent doing matrix-vector multiplications then there is little you can do other than change hardware - use accelerators :) )

PyLops / pylops

Can FISTA run on matrices of data? #264