Proximal Gradient Algorithm class

For the Stochastic Project, I implemented a new base class called PGA (Proximal Gradient Algorithm). This is a base class used for the GD, ISTA and FISTA algorithms. These algorithms have some common steps which are 1) the f.gradient (GD, ISTA, FISTA) and the 2) g.proximal (ISTA, FISTA). The algorithms ISTA, FISTA are actually Proximal Gradient Algorithms, i.e., Accelerated (FISTA) or not (ISTA). The names ISTA/FISTA used more to give emphasis on the regulariser, e.g., $\ell_{1}$. Also, in the case of $g(x)=0$ we end up with a Gradient Descent or Accelerated (Nesterov) Gradient Descent.

The new PGA class implements the following iteration:

$$x{k+1} = prox{\gamma{k}g}(x{k} - \gamma{k}D(x{k})\nabla f(x_{k}))$$

g can be any proximable function ( simple form, or iterative solver up to some accuracy)
f can be any function: a CIL Function or an ApproximateGradientSumFunction
$\gamma_{k}$ is now a (Callable) Class StepSizeMethod that is responsible to change (or not) the step size based on some rules. This will be a separate issue.
D(x_{k}) is now a (Callable) Class, i.e., Preconditioner. This changes the output of the f.gradient in place. I have implemented some custom preconditioners used in CT (CIL) and PET(SIRF). This will be a separate issue.

Since the f.gradient step is common for GD, ISTA, FISTA, there is a new method _gradient_step that lives in the base class. Additionally, I fixed a CIL/SIRF gradient compatibility. The SIRF Objective and Prior classes have a gradient method, however, this method does not have out in its signature.

Finally, in the update method we need to call the StepSizeMethod in order to find the next size $\gamma_{k}$.

    def update(self):

        r"""Performs a single iteration of ISTA

        .. math:: x_{k+1} = \mathrm{prox}_{\alpha g}(x_{k} - \alpha\nabla f(x_{k}))

        """

        self._gradient_step(self.x_old, out=self.x)
        # update step size
        step_size =  self.step_size(self)
        self.x_old.sapyb(1., self.x, -step_size, out=self.x_old)
        self.g.proximal(self.x_old, step_size, out=self.x)

I forgot to mention sth related to the new Stochastic CIL Functions. For all the Variance-Reduced CIL Functions, we need an initial argument for the gradient, e.g., $\nabla f(x_{0})$.

In the beginning of the project, we had gradient_initialisation_point which is another name for the initial. Basically, in GD, ISTA, FISTA the user would need to pass initial for the Stochastic Function, e.g., SAGA(fi, selection = RandomSampling, gradient_initialisation_point = initial) and the same initial for the algorithm, e.g., ISTA(initial = initial , f = f, g=g , ...).

Although we can default everything to start with 0 arrays, in some cases it is better to run an SGD for 1-2 epochs and run a variance reduced algorithm with the SGD solution. Also, sometimes we can use as an analytic reconstruction as initial, e.g., FBP, FDK. So sometimes if you forget to pass the gradient_initialisation_point in the Stochastic Function, the algorithm will start from the FBP recon and the Stochastic Function will initialise 0s for the gradient.

To fix this I use the following

        if hasattr(self.f, "warm_start"):
            if self.f.warm_start:
                self.f.initial = self.initial.copy()

which basically forces the initial of the algorithm to be the initial gradient of the Stochastic Function. There is an alternative solution, but requires a domain attribute for the Function class. I will not discuss it here.

TomographicImaging / CIL

Proximal Gradient Algorithm class #1586