In the formulation of the objective functional, the L1 regularization dampings do not appear. They are also not in the constrained problem; they appear only in the iteration algorithm. Is this correct?
Hi, thanks for the issue. Yes, this is correct. Please refer to the original paper (e.g., eq. 4.1 and the next two equations).
So the initial formulation of the problem does not have a knob to control the L1 regularization, but the modified one does. Can one not just rewrite the original formulation so that the weighting of the L1 term appears? Or, to put it the other way around: how do you select the weight for the L1 regularization if it is not in the original problem formulation?
This is a good question, which goes a bit beyond PyLops' scope. Whilst I agree it generally makes it a bit harder to choose this weighting factor, this is what the authors of the original paper came up with. We don't want our implementation to diverge from the paper (unless you think we are doing something inconsistent with the paper; in that case please let us know and we will be happy to change it accordingly :) ).
Technically speaking, you could have a weighting on the original term, but then, because of the splitting strategy, one more would appear anyway, so maybe they just didn't want to have too many knobs.
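To make that concrete, here is a rough sketch of the splitting as I read it in the Goldstein-Osher paper (the symbols are mine, not PyLops parameter names). The original functional only weighs the data term,

\min_u \; \|Du\|_1 + \frac{\mu}{2}\|Au - f\|_2^2

whereas the split/relaxed problem that is actually iterated,

\min_{u,d} \; \|d\|_1 + \frac{\mu}{2}\|Au - f\|_2^2 + \frac{\lambda}{2}\|d - Du - b^k\|_2^2

introduces the extra weight \lambda on the constraint-relaxation term, and that second knob only exists in the iterations, not in the original functional.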
PS: may I ask which kind of problem you are trying to solve with Split-Bregman? Whilst this is a powerful solver, I have found that for some of the problems I work with, other solvers from the proximal family are slightly more performant. You may want to take a look at this sister library we recently created: https://github.com/PyLops/pyproximal
This solver, which was recently developed by a colleague, may actually do what you want: https://pyproximal.readthedocs.io/en/latest/api/generated/pyproximal.optimization.sr3.SR3.html#pyproximal.optimization.sr3.SR3
Note, though, that a second knob will always appear to relax the intermediate constrained problem.
Thanks for the detailed answer. I will definitely check that information. I wanted to compare Split-Bregman with the Lagged Diffusivity Fixed Point Method, implemented to obtain the total-variation-regularized derivative directly from the data (in pylops, the TVR example recovers the function, not the derivative). I was trying to reproduce the results from the 2011 publication https://www.hindawi.com/journals/isrn/2011/164564/. I noticed that Split-Bregman is super sensitive to the two weights, and the result is really hard to control. The original implementation of Diffusion TVR is very robust (https://github.com/smrfeld/Total-Variation-Regularization-Derivative-Python/tree/main/python). I implemented the algorithm using pylops, but still can't reproduce the results. I think it is due to the structure of the derivative and integral operators in pylops, but I haven't spotted the issue yet. Below is some code I developed; it is still drafty, but it may interest you. (BTW, it would be great to have Galerkin-based derivative operators or FD with boundary conditions; I will post if I manage to implement one.)
import numpy as np
from scipy import integrate
from scipy import optimize
from functools import partial
import pylops
class TrapezoidalIntegration(pylops.LinearOperator):
r"""Integration using trapezoidal rule.
Apply trapezoidal integration to a multi-dimensional array along ``dir`` axis.
"""
def __init__(self, N, dims=None, dir=-1, sampling=1.0, dtype='float64'):
super().__init__()
self.N = N
self.dir = dir
self.sampling = sampling
if dims is None:
self.dims = [self.N, 1]
self.reshape = False
else:
if np.prod(dims) != self.N:
raise ValueError('product of dims must equal N!')
else:
self.dims = dims
self.reshape = True
self.shape = (self.N, self.N)
self.dtype = np.dtype(dtype)
self.explicit = False
self._integrator = partial(integrate.cumtrapz, dx=self.sampling, axis=-1,
initial=0)
def _matvec(self, x):
if self.reshape:
x = np.reshape(x, self.dims)
if self.dir != -1:
x = np.swapaxes(x, self.dir, -1)
y = self._integrator(x)
if self.dir != -1:
y = np.swapaxes(y, -1, self.dir)
return y.ravel()
def _rmatvec(self, x):
if self.reshape:
x = np.reshape(x, self.dims)
if self.dir != -1:
x = np.swapaxes(x, self.dir, -1)
xflip = np.flip(x, axis=-1)
y = self._integrator(xflip)
y = np.flip(y, axis=-1)
if self.dir != -1:
y = np.swapaxes(y, -1, self.dir)
return y.ravel()
class DiffTVR:
def __init__(self, dx: float, *, n: int = None):
"""Differentiate with TVR.
Args:
dx (float): Spacing of data.
n (int): Number of points in data.
"""
self.Dop = None
self.CIop = None
self.eps = 1e-10
self._n = n
self._dx = dx
if self._n is not None:
self.build_matrices()
@property
def n(self): return self._n
@n.setter
def n(self, value):
self._n = value
self.build_matrices()
@property
def dx(self): return self._dx
@dx.setter
def dx(self, value):
self._dx = value
if self._n is not None:
self.build_matrices()
def build_matrices(self):
self.Dop = pylops.FirstDerivative(self._n, kind='centered', edge=True,
sampling=self._dx)
self.CIop = TrapezoidalIntegration(self._n, sampling=self._dx)
def __call__(self, data: np.array, alpha: float, *,
initial_guess: np.array, steps: int,
dx: float = None) -> np.array:
"""Get derivative via TVR over optimization steps
Args:
data (np.array): Data
initial_guess (np.array): Guess for derivative
alpha (float): Regularization parameter
steps (int): No. opt steps to run
Returns:
np.array: estimated derivative
np.array: integrated derivative
"""
self._dx = dx if dx is not None else self.dx
self.n = data.size # triggers update of matrices
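        # normal operator K^H K, with K the causal (trapezoidal) integration operator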
KTK = self.CIop.T * self.CIop
deriv_curr = initial_guess
for s in range(0, steps):
# Compute update
en_diag = self._dx / np.sqrt((self.Dop * deriv_curr) ** 2 + self.eps)
hnOp = KTK + alpha * self.Dop.T * pylops.Diagonal(en_diag) * self.Dop
b = hnOp * deriv_curr - self.CIop.T * data
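            # pylops overloads "/" with an iterative least-squares solve (lsqr):
            # the next line solves H s = -g for the update (here b plays the role of g)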
update = - hnOp / b
# Update solution
deriv_curr += update
# Estimate integral
func = self.CIop * deriv_curr
return deriv_curr, func
import numpy as np
import pylops
from pylops.optimization.sparsity import SplitBregman
nx = 100
x = np.linspace(0, 1, nx)
dx = x[1] - x[0]
Sop = TrapezoidalIntegration(nx, sampling=dx)
Iop = pylops.Identity(nx)
Dop = pylops.FirstDerivative(nx, edge=True, kind='centered', sampling=dx)
rnd = np.random.default_rng(12345)
y0 = np.abs(x - 1/2)
dy0 = np.where(x < 1/2, -1, 1)
sy = 0.05
y = Iop * (y0 + rnd.normal(0, sy, nx))
mu = 4e4
niter_out = 100
niter_in = 5
dy_l1, niter = SplitBregman(Sop, [Dop], y - y[0],
niter_out, niter_in,
mu=mu,
epsRL1s=[5],
tol=1e-3,
**dict(iter_lim=30, damp=1e-5)
)
y_l1 = Sop * dy_l1 + y[0]
print(f'std resid L1: {(y - y_l1).std()}')
diff_tvr = DiffTVR(dx)
dy_tv, y_tv = diff_tvr(data=y - y[0], initial_guess=np.zeros(nx), alpha=2e-3, steps=100)
y_tv += y[0]
print(f'std resid TV: {(y - y_tv).std()}')
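(As an aside, a quick sanity check of the forward pass of TrapezoidalIntegration, independent of the adjoint question: the trapezoidal rule is exact for linear integrands, so integrating x with the Sop defined above should give x**2/2 up to rounding.)
print(np.allclose(Sop * x, x ** 2 / 2))  # forward pass reproduces the exact primitive of a linear function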
Nice :)
Indeed, the tutorials in our documentation use TV to recover a blocky signal; there is no example recovering the derivative of a signal. Let me take a look at the paper you linked, but the idea of casting the derivative as the inverse of a causal integration is logical and should work with pylops :) Give me a couple of days to look at your code and the paper to see if I can spot anything that explains the problem you have. One thing we realized, thanks to another user, is that TV likes forward/backward derivatives, whilst with central derivatives you get some unwanted ringing in the solution. That may be a quick thing to try, to see if your results improve after that change :)
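Concretely, on top of your script the change would be something like this (just a sketch, all other parameters left as you had them):

Dop = pylops.FirstDerivative(nx, edge=True, kind='forward', sampling=dx)  # forward instead of centered
dy_l1, niter = SplitBregman(Sop, [Dop], y - y[0], niter_out, niter_in,
                            mu=mu, epsRL1s=[5], tol=1e-3,
                            **dict(iter_lim=30, damp=1e-5))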
And you are right: so far our Derivative/Integration operators do not accommodate any type of boundary conditions; it would definitely be a great contribution!
Yes, the recast problem is what I implemented in these lines:
dy_l1, niter = SplitBregman(Sop, [Dop], y - y[0], ...)
Sop is the causal integral (the trapezoidal rule in my case) and Dop is the derivative used in the L1 norm.
I think the preference for forward derivatives has to do not with TV but with Split-Bregman. The fixed-point iteration implemented in the repository I linked uses central derivatives (with an extra point outside the domain to improve the boundary value); it is very robust and works well. I also tried using operators of the form 0.25 * R.H D R, where R is the symmetrization operator, to improve the boundary values of the derivative, but it did not improve the behavior of Split-Bregman.
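In pylops terms, what I mean is something like this (a sketch; pylops.Symmetrize maps nx samples to 2*nx - 1, so the derivative in the middle acts on the symmetrically extended signal):

Rop = pylops.Symmetrize(nx)
Dext = pylops.FirstDerivative(2 * nx - 1, kind='centered', edge=True, sampling=dx)
Dsym = 0.25 * Rop.H * Dext * Rop  # nx-by-nx derivative with symmetric extension at the boundaries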
Hi, I think I have a few ideas why both of your pylops codes don't work as well as you expected.
Every time you write a dense matrix, like in the Diffusion TVR code you pointed me at, the adjoint is very easy to obtain: just transpose the matrix. But when you write linear operators this is not the case anymore. I checked whether your TrapezoidalIntegration passes the dottest (https://pylops.readthedocs.io/en/latest/api/generated/pylops.utils.dottest.html#pylops.utils.dottest) and I see that it does not. This means that you have no guarantee that iterative solvers will converge, as they are working with a broken forward-adjoint pair. I think this is because in the adjoint you don't properly handle the fact that, in the forward, the first sample of the trapezoidal integration is summed with a 0.5 scaling. I modified the original CausalIntegration (not yet pushed to the main repo):
import numpy as np
from pylops import LinearOperator
class CausalIntegration(LinearOperator):
r"""Causal integration.
Apply causal integration to a multi-dimensional array along ``dir`` axis.
Parameters
----------
N : :obj:`int`
Number of samples in model.
dims : :obj:`list`, optional
Number of samples for each dimension
(``None`` if only one dimension is available)
dir : :obj:`int`, optional
Direction along which smoothing is applied.
sampling : :obj:`float`, optional
Sampling step ``dx``.
halfcurrent : :obj:`bool`, optional
Add half of current value (``True``) or the entire value (``False``)
trapezoidal : :obj:`bool`, optional
Apply trapezoidal rule (``True``) or not (``False``)
dtype : :obj:`str`, optional
Type of elements in input array.
Attributes
----------
shape : :obj:`tuple`
Operator shape
explicit : :obj:`bool`
Operator contains a matrix that can be solved explicitly (``True``)
or not (``False``)
Notes
-----
The CausalIntegration operator applies a causal integration to any chosen
direction of a multi-dimensional array.
For simplicity, given a one dimensional array, the causal integration is:
.. math::
y(t) = \int x(t) dt
which can be discretised as :
.. math::
y[i] = \sum_{j=0}^i x[j] dt
or
.. math::
y[i] = (\sum_{j=0}^{i-1} x[j] + 0.5x[i]) dt
or
.. math::
y[i] = (\sum_{j=1}^{i-1} x[j] + 0.5x[0] + 0.5x[i]) dt
where :math:`dt` is the ``sampling`` interval. In our implementation, the
choice to add :math:`x[i]` or :math:`0.5x[i]` is made by selecting
the ``halfcurrent`` parameter and the choice to add :math:`x[0]` or
:math:`0.5x[0]` is made by selecting the ``trapezoidal`` parameter.
Note that the integral of a signal has no unique solution, as any constant
:math:`c` can be added to :math:`y`, for example if :math:`x(t)=t^2` the
resulting integration is:
.. math::
y(t) = \int t^2 dt = \frac{t^3}{3} + c
If we apply a first derivative to :math:`y` we in fact obtain:
.. math::
x(t) = \frac{dy}{dt} = t^2
no matter the choice of :math:`c`.
"""
def __init__(self, N, dims=None, dir=-1, sampling=1,
halfcurrent=True, trapezoidal=False, removefirst=False,
dtype='float64'):
self.N = N
self.dir = dir
self.sampling = sampling
self.trapezoidal = trapezoidal
self.halfcurrent = halfcurrent if not trapezoidal else False
self.removefirst = removefirst
if dims is None:
self.dims = [self.N, 1]
self.reshape = False
else:
if np.prod(dims) != self.N:
raise ValueError('product of dims must equal N!')
else:
self.dims = dims
self.reshape = True
self.shape = (self.N-self.dims[self.dir] if self.removefirst else self.N,
self.N)
self.dtype = np.dtype(dtype)
self.explicit = False
def _matvec(self, x):
if self.reshape:
x = np.reshape(x, self.dims)
if self.dir != -1:
x = np.swapaxes(x, self.dir, -1)
y = self.sampling * np.cumsum(x, axis=-1)
if self.halfcurrent or self.trapezoidal:
y -= self.sampling * x / 2.
if self.trapezoidal:
y[1:] -= self.sampling * x[0] / 2.
if self.removefirst:
y = y[1:]
if self.dir != -1:
y = np.swapaxes(y, -1, self.dir)
return y.ravel()
def _rmatvec(self, x):
if self.reshape:
x = np.reshape(x, self.dims)
if self.removefirst:
x = np.insert(x, 0, 0, axis=self.dir)
if self.dir != -1:
x = np.swapaxes(x, self.dir, -1)
xflip = np.flip(x, axis=-1)
if self.halfcurrent:
y = self.sampling * (np.cumsum(xflip, axis=-1) - xflip / 2.)
elif self.trapezoidal:
y = self.sampling * (np.cumsum(xflip, axis=-1) - xflip / 2.)
y[-1] = self.sampling * np.sum(xflip, axis=-1) / 2.
else:
y = self.sampling * np.cumsum(xflip, axis=-1)
y = np.flip(y, axis=-1)
if self.dir != -1:
y = np.swapaxes(y, -1, self.dir)
return y.ravel()
and this is now passing the dottest
from pylops.utils import dottest

Cop = CausalIntegration(4, sampling=1, trapezoidal=True, removefirst=True)
dottest(Cop)
After this I focused on your reimplementation of the original algorithm using the new, fixed integrator. I checked that we get exactly the same matrices as the original code (by calling todense on the operators... this is a good check to do all the time when starting from very small examples) and there everything works fine - see https://github.com/PyLops/pylops_notebooks/blob/master/developement/DerivativeInversion.ipynb.
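For reference, the kind of check I mean is simply (a sketch; A_dense is a placeholder name for the matrix built by the original dense implementation, not a variable defined here):

Cop = CausalIntegration(4, sampling=1, trapezoidal=True, removefirst=True)
print(np.allclose(Cop.todense(), A_dense))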
It was however to my surprise that the final results were very different. I then started to suspect that for this algorithm to work you cannot accept inaccurate solutions of the Hs=g system, which is what you inevitably get when working with operators and iterative solvers (note that / in pylops is overloaded with an iterative solver, e.g. lsqr). So, as a check, I densified the operator and used the same direct solver scipy.linalg.solve, and then again things seem to work fine... I want to play a bit more, as I can't believe that small errors can throw the entire algorithm off, but so far this is what I see :)
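In terms of your __call__ above, the densified variant I tried looks roughly like this (a sketch using the hnOp and b of your loop):

from scipy.linalg import solve
Hdense = hnOp.todense()
update = solve(Hdense, -b)  # direct dense solve of H s = -g, instead of the iterative - hnOp / b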
Finally, I haven't tried Split-Bregman again, but since you used an operator with a wrong adjoint I suggest discarding anything you saw... for the reason explained above, it is not to be trusted, as no solver has guarantees if the forward-adjoint pair is incorrect...
PS: you say "the fixed-point iteration implemented in the repository I linked uses central derivatives". Assuming this is the code you refer to:
"""Make differentiation matrix with central differences. NOTE: not efficient!
Returns:
np.array: N x N+1
"""
arr = np.zeros((self.n,self.n+1))
for i in range(0,self.n):
arr[i,i] = -1.0
arr[i,i+1] = 1.0
return arr / self.dx
Indeed they say they use central differences, but that is not what they implement: what they implement is a first-order forward derivative - see https://en.wikipedia.org/wiki/Finite_difference_coefficient
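A quick way to see it (a small standalone check, rebuilding the matrix from the snippet above):

import numpy as np
n, dx = 5, 0.1
arr = np.zeros((n, n + 1))
for i in range(n):
    arr[i, i] = -1.0
    arr[i, i + 1] = 1.0
f = np.random.default_rng(0).normal(size=n + 1)
print(np.allclose((arr / dx) @ f, np.diff(f) / dx))  # True: it is the first-order forward difference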
Thanks! Indeed this was my suspicion, that the adjoint was wrongly implemented.
And you are also right about the derivative operator in the original code: it is not central.
I wasn't aware of the dot test, thanks. Now that the forward/adjoint pair is right, I will try Split-Bregman again and compare with my reimplementation of the fixed point using central differences; maybe I will see the oscillations too, confirming your suspicion that it is a relation between the functional and the discretized operator, and not the solver.
BTW, instead of the keyword trapezoidal in the causal integral, better to use "kind" as in the derivative; this will make it easier to integrate the "galerkin" kind of operators I want to contribute.
Finally, are you aware of the Python package derivative? It might be an excellent complement to pylops: https://pypi.org/project/derivative/
> BTW, instead of the keyword trapezoidal in the causal integral, better to use "kind" as in the derivative; this will make it easier to integrate the "galerkin" kind of operators I want to contribute.
Totally agree. Let me follow this route; I'll push soon a version with kind and 3 options implemented, and you can add more at any time :) - I may need to keep halfcurrent for backward compatibility, but I can encourage users to prefer using kind…
> Finally, are you aware of the Python package derivative? It might be an excellent complement to pylops: https://pypi.org/project/derivative/
I didn't know about it, looks interesting. I wonder if we could ask the authors whether they/we can add adjoints, and then pylops would simply wrap them?
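Something along these lines, perhaps (a rough sketch; fwd_derivative and adj_derivative are hypothetical functions that such a package would need to expose for this to work):

class WrappedDerivative(pylops.LinearOperator):
    def __init__(self, n, dtype='float64'):
        self.shape = (n, n)
        self.dtype = np.dtype(dtype)
        self.explicit = False
    def _matvec(self, x):
        # hypothetical third-party forward derivative
        return fwd_derivative(x)
    def _rmatvec(self, x):
        # hypothetical matching adjoint provided by the package authors
        return adj_derivative(x)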
@kakila I just added kind to CausalIntegration in https://github.com/PyLops/pylops/pull/251 so that, as you suggested, it is easy to extend beyond these 3 kinds :)
Looking forward to any contribution from you on galerkin kinds of derivative/integration.
I also looked again at the DiffTVR with PyLops operators, and it seems to me that iterative solvers can be used but require a very large number of iterations for the overall method to be stable. Using:
update = lsqr(HOp, -g, atol=0, btol=0, x0=update1, iter_lim=5000)[0]
gives good results, but of course this is nowhere near direct-solver speed.
I think this is a prime example of where approximate solutions and non-dense matrices should not be used unless needed; and by needed I mean: if your problem is so huge that storing the dense matrices and solving a dense system, as required by DiffTVR, is not feasible, then you may want to accept using much cheaper (memory-wise) operator equivalents at the cost of needing an iterative solver for each update... So if you stay in 1D I think you will never face this; if you move to 2D and 3D, the time may come when you need it :)
Hope this helps!