After the discussion we had, I want to write down my proposal, based on the assumption that the preconditioning logic belongs neither to the `Function` nor to the `Algorithm` classes.

Given this, I propose to create a new class `Preconditioner` with a method called, for the sake of the argument, `apply`, with the following signature:
```python
class Preconditioner(object):

    def apply(self, x, iteration_num, x_update):
        '''modifies x_update in place and does not return'''
        pass
```
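For instance, a diagonal preconditioner honouring this in-place contract could look as below (a sketch; `DiagonalPreconditioner` is made up for illustration, and `multiply(..., out=...)` is the usual CIL DataContainer call):

```python
class DiagonalPreconditioner(Preconditioner):
    '''Hypothetical preconditioner scaling the gradient
    elementwise by a fixed DataContainer D.'''
    def __init__(self, D):
        self.D = D

    def apply(self, x, iteration_num, x_update):
        # x_update <- D * x_update, modified in place
        x_update.multiply(self.D, out=x_update)
```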
As described above by @epapoutsellis, in the GD-update section `GD` calls the function (in that case) `precond`. I propose that `GD` gets passed an appropriately defined instance of `Preconditioner` and uses it in the `update` method, as sketched below:
```python
def update(self):
    '''Single iteration'''
    self.objective_function.gradient(self.x, out=self.x_update)

    if self.preconditioner is not None:
        # modifies self.x_update in place
        self.preconditioner.apply(self.x, self.iteration, self.x_update)

    if self.update_step_size:
        # the next update and solution are calculated within the armijo_rule
        self.step_size = self.armijo_rule()
    else:
        self.x.sapyb(1.0, self.x_update, -self.step_size, out=self.x)
```
I also suggest creating preconditioners by means of a factory, which could be the `Preconditioner` class itself. Something like the following could be used:
```python
class Preconditioner(object):

    @staticmethod
    def create_fixed(A):
        precond = Preconditioner()
        precond.A = A
        precond.ag = A.range_geometry()
        precond.apply = precond.apply_fixed
        return precond

    def apply_fixed(self, x, iteration_num, x_update):
        # x_update <- x_update / (A^T 1), modified in place
        x_update.divide(self.A.adjoint(self.ag.allocate(1.0)), out=x_update)

    @staticmethod
    def create_variable(A, delta):
        precond = Preconditioner()
        precond.A = A
        precond.ag = A.range_geometry()
        precond.delta = delta
        precond.apply = precond.apply_variable
        return precond

    def apply_variable(self, x, iteration_num, x_update):
        # x_update <- x_update * (x + delta) / (A^T 1), modified in place
        x_update.multiply((x + self.delta) / self.A.adjoint(self.ag.allocate(1.0)),
                          out=x_update)

    @staticmethod
    def create_fixed_stochastic(A, num_subsets, max_epoch):
        precond = Preconditioner()
        precond.A = A
        precond.ag = A.range_geometry()
        precond.num_subsets = num_subsets
        precond.max_epoch = max_epoch
        precond.apply = precond.apply_fixed_stochastic
        return precond

    def apply_fixed_stochastic(self, x, iteration_num, x_update):
        # num_subsets iterations make up one epoch: apply the fixed
        # preconditioner only until max_epoch epochs have passed
        if iteration_num // self.num_subsets < self.max_epoch:
            x_update.divide(self.A.adjoint(self.ag.allocate(1.0)), out=x_update)

    def apply(self, x, iteration_num, x_update):
        pass
```
This covers all the above-mentioned cases. The user would use it as below, provided we define `set_preconditioner` in `GD`:
```python
f = LeastSquares(A, b, 0.5)
algo = GD(initial=A.domain_geometry().allocate(0.), objective_function=f,
          step_size=f.L/3, max_iteration=100)

precond = Preconditioner.create_fixed(A)
# or
precond = Preconditioner.create_variable(A, delta)
# or, in case f is a stochastic function
f = SGDFunction(...)
precond = Preconditioner.create_fixed_stochastic(A, f.num_functions, max_epoch)

algo.set_preconditioner(precond)
algo.run(100)
```
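A minimal sketch of what `set_preconditioner` could look like on the `GD` side (the method and the `preconditioner` attribute are assumptions of this proposal, not existing CIL code):

```python
def set_preconditioner(self, preconditioner):
    '''Attach a preconditioner to the algorithm.

    The object only needs an apply(x, iteration_num, x_update) method;
    it does not have to be an instance of Preconditioner.'''
    self.preconditioner = preconditioner
```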
Additionally, a user might create an appropriate preconditioner to pass to the algorithm, provided that this object, which may or may not be an instance of `Preconditioner`, has an `apply` method with the above-mentioned signature. In Java, you would say that the object passed implements a certain `Interface`, i.e., it provides certain methods with a defined signature.
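For example, a user-defined object following that implicit interface could look as below (a sketch; `MyPreconditioner` and its decaying scaling rule are made up for illustration):

```python
class MyPreconditioner(object):
    '''Not a Preconditioner subclass, but it provides apply
    with the agreed signature, so GD can use it.'''
    def __init__(self, scale):
        self.scale = scale

    def apply(self, x, iteration_num, x_update):
        # decay the gradient scaling as the iterations progress
        x_update.multiply(self.scale / (iteration_num + 1), out=x_update)

algo.set_preconditioner(MyPreconditioner(scale=2.0))
```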
@paskino, @KrisThielemans: any decision on the above design for GD and preconditioning?
Currently we have the following for the `GD` algorithm:

$$ x_{k+1} = x_{k} - \gamma_{k} \nabla f(x_{k}) $$
However, we need to apply a preconditioner to the `f.gradient` step, as discussed in #1345. This is useful in both the deterministic and the stochastic case:

$$ x_{k+1} = x_{k} - \gamma_{k} D(x_{k}) \nabla f(x_{k}) $$
Notice that we already have this functionality in the `SIRT` algorithm, but it is hardcoded. In fact, in `SIRT` we have two fixed preconditioners:

$$ \frac{1}{A^{T}\mathbb{1}} \quad \text{and} \quad \frac{1}{A \mathbb{1}} $$
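In CIL terms, both quantities can be computed by applying the adjoint/direct of the operator to an image of ones, along these lines (a sketch, assuming `A` is a CIL `LinearOperator` with the usual `adjoint`/`direct` methods):

```python
# M = 1 / (A^T 1): column-sum (sensitivity) preconditioner on the image side
M = A.adjoint(A.range_geometry().allocate(1.0)).power(-1)
# D = 1 / (A 1): row-sum relaxation on the data side
D = A.direct(A.domain_geometry().allocate(1.0)).power(-1)
```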
Of course `SIRT` is designed for a specific optimisation problem, where we know that the objective function is

$$ \|Ax - d\|^{2} $$

and our users only need to pass the operator `A` and the data `d`. However, if `GD`, `ISTA` and `FISTA` allow preconditioning, we can have an algorithm similar to `SIRT`:

GD: $$ x_{k+1} = x_{k} - \gamma_{k} D(x_{k}) \nabla f(x_{k}) $$

ISTA/FISTA: $$ x_{k+1} = \mathrm{prox}_{\gamma_{k} g}\left( x_{k} - \gamma_{k} D(x_{k}) \nabla f(x_{k}) \right) $$
We can also have their stochastic versions, i.e., OS-SIRT etc. Actually, since `SIRT` does not accept an objective function, we cannot set up OS-SIRT with the `SIRT` class via `SubsetSumFunction`.

As described in #1345, we can have the following options that can be used in both the deterministic and the stochastic case:
a) the preconditioner is fixed: $\frac{1}{A^{T}\mathbb{1}}$
b) it depends on the iterates: $\frac{x_{k}+\delta}{A^{T}\mathbb{1}}$
c) it depends on `subsets_num`, i.e., it is activated/deactivated after some epochs or iterations
d) `None`
To cover all of the above cases, I suggest having something like the following in the `GD` update.
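A possible sketch of that update, assuming `self.precond` is a callable that takes the current iterate and the iteration number and returns a multiplicative factor (the names are illustrative, not existing CIL code):

```python
def update(self):
    '''Single iteration of preconditioned GD'''
    self.objective_function.gradient(self.x, out=self.x_update)
    if self.precond is not None:
        # scale the gradient by the user-supplied preconditioner
        self.x_update.multiply(self.precond(self.x, self.iteration),
                               out=self.x_update)
    self.x.sapyb(1.0, self.x_update, -self.step_size, out=self.x)
```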
The user can then predefine a callable function for the `precond` step, in both the deterministic and the stochastic case; see the sketches below.
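For instance, the callables for cases (a)-(c) could be sketched as below (`delta`, `num_subsets` and `max_epoch` are assumed to be defined by the user; the factors could also be precomputed once instead of per call):

```python
ones = A.range_geometry().allocate(1.0)

# case (a): fixed preconditioner 1 / (A^T 1)
precond_fixed = lambda x, k: A.adjoint(ones).power(-1)

# case (b): iterate-dependent preconditioner (x + delta) / (A^T 1)
precond_variable = lambda x, k: (x + delta) / A.adjoint(ones)

# case (c), stochastic: deactivate the preconditioner after max_epoch epochs
def precond_stochastic(x, k):
    if k // num_subsets < max_epoch:
        return A.adjoint(ones).power(-1)
    return 1.0
```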
What do you think @paskino @gfardell @jakobsj