After the discussion we had, I want to write down my proposal, based on the assumption that the preconditioning logic belongs neither to the `Function` nor to the `Algorithm` classes.

Given this, I propose to create a new class `Preconditioner` with a method called, for the sake of the argument, `apply`, with the following signature:
```python
class Preconditioner(object):

    def apply(self, x, iteration_num, x_update):
        '''modifies x_update in place and does not return'''
        pass
```
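For instance, a diagonal preconditioner honouring this in-place contract could look as below (a sketch; `DiagonalPreconditioner` is made up for illustration, and `multiply(..., out=...)` is the usual CIL DataContainer call):

```python
class DiagonalPreconditioner(Preconditioner):
    '''Hypothetical preconditioner scaling the gradient
    elementwise by a fixed DataContainer D.'''
    def __init__(self, D):
        self.D = D

    def apply(self, x, iteration_num, x_update):
        # x_update <- D * x_update, modified in place
        x_update.multiply(self.D, out=x_update)
```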
As described above by @epapoutsellis, in the GD-update section `GD` calls the function (in that case) `precond`. I propose that `GD` gets passed an appropriately defined instance of `Preconditioner` and uses it in the `update` method, as sketched below:
```python
def update(self):
    '''Single iteration'''
    self.objective_function.gradient(self.x, out=self.x_update)

    if self.preconditioner is not None:
        # modifies self.x_update in place
        self.preconditioner.apply(self.x, self.iteration, self.x_update)

    if self.update_step_size:
        # the next update and solution are calculated within the armijo_rule
        self.step_size = self.armijo_rule()
    else:
        self.x.sapyb(1.0, self.x_update, -self.step_size, out=self.x)
```
I also suggest creating preconditioners by means of a factory, which could be the `Preconditioner` class itself. Something like the following could be used:
```python
class Preconditioner(object):

    @staticmethod
    def create_fixed(A):
        precond = Preconditioner()
        precond.A = A
        precond.ag = A.range_geometry()
        precond.apply = precond.apply_fixed
        return precond

    def apply_fixed(self, x, iteration_num, x_update):
        # x_update <- x_update / (A^T 1), modified in place
        x_update.divide(self.A.adjoint(self.ag.allocate(1.0)), out=x_update)

    @staticmethod
    def create_variable(A, delta):
        precond = Preconditioner()
        precond.A = A
        precond.ag = A.range_geometry()
        precond.delta = delta
        precond.apply = precond.apply_variable
        return precond

    def apply_variable(self, x, iteration_num, x_update):
        # x_update <- x_update * (x + delta) / (A^T 1), modified in place
        x_update.multiply((x + self.delta) / self.A.adjoint(self.ag.allocate(1.0)),
                          out=x_update)

    @staticmethod
    def create_fixed_stochastic(A, num_subsets, max_epoch):
        precond = Preconditioner()
        precond.A = A
        precond.ag = A.range_geometry()
        precond.num_subsets = num_subsets
        precond.max_epoch = max_epoch
        precond.apply = precond.apply_fixed_stochastic
        return precond

    def apply_fixed_stochastic(self, x, iteration_num, x_update):
        # num_subsets iterations make up one epoch: apply the fixed
        # preconditioner only until max_epoch epochs have passed
        if iteration_num // self.num_subsets < self.max_epoch:
            x_update.divide(self.A.adjoint(self.ag.allocate(1.0)), out=x_update)

    def apply(self, x, iteration_num, x_update):
        pass
```
This covers all the above-mentioned cases. The user would use it as below, provided we define `set_preconditioner` in `GD`:
```python
f = LeastSquares(A, b, 0.5)
algo = GD(initial=A.domain_geometry().allocate(0.), objective_function=f,
          step_size=f.L/3, max_iteration=100)

precond = Preconditioner.create_fixed(A)
# or
precond = Preconditioner.create_variable(A, delta)
# or, in case f is a stochastic function
f = SGDFunction(...)
precond = Preconditioner.create_fixed_stochastic(A, f.num_functions, max_epoch)

algo.set_preconditioner(precond)
algo.run(100)
```
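A minimal sketch of what `set_preconditioner` could look like on the `GD` side (the method and the `preconditioner` attribute are assumptions of this proposal, not existing CIL code):

```python
def set_preconditioner(self, preconditioner):
    '''Attach a preconditioner to the algorithm.

    The object only needs an apply(x, iteration_num, x_update) method;
    it does not have to be an instance of Preconditioner.'''
    self.preconditioner = preconditioner
```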
Additionally, a user might create an appropriate preconditioner to pass to the algorithm, provided that this object, which may or may not be an instance of `Preconditioner`, has an `apply` method with the above-mentioned signature. In Java, you would say that the object passed implements a certain `Interface`, i.e., it provides certain methods with a defined signature.
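For example, a user-defined object following that implicit interface could look as below (a sketch; `MyPreconditioner` and its decaying scaling rule are made up for illustration):

```python
class MyPreconditioner(object):
    '''Not a Preconditioner subclass, but it provides apply
    with the agreed signature, so GD can use it.'''
    def __init__(self, scale):
        self.scale = scale

    def apply(self, x, iteration_num, x_update):
        # decay the gradient scaling as the iterations progress
        x_update.multiply(self.scale / (iteration_num + 1), out=x_update)

algo.set_preconditioner(MyPreconditioner(scale=2.0))
```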
@paskino, @KrisThielemans: any decision on the above design for GD and preconditioning?
Currently we have the following for the `GD` algorithm:

$$ x_{k+1} = x_{k} - \gamma_{k} \nabla f(x_{k}) $$
However, we need to apply a preconditioner to the `f.gradient` step, as discussed in #1345. This is useful in both the deterministic and the stochastic case:

$$ x_{k+1} = x_{k} - \gamma_{k} D(x_{k}) \nabla f(x_{k}) $$
Notice that we already have this functionality in the `SIRT` algorithm, but it is hardcoded. In fact, in `SIRT` we have two fixed preconditioners:

$$ \frac{1}{A^{T}\mathbb{1}} \quad \text{and} \quad \frac{1}{A \mathbb{1}} $$
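In CIL terms, both quantities can be computed by applying the adjoint/direct of the operator to an image of ones, along these lines (a sketch, assuming `A` is a CIL `LinearOperator` with the usual `adjoint`/`direct` methods):

```python
# M = 1 / (A^T 1): column-sum (sensitivity) preconditioner on the image side
M = A.adjoint(A.range_geometry().allocate(1.0)).power(-1)
# D = 1 / (A 1): row-sum relaxation on the data side
D = A.direct(A.domain_geometry().allocate(1.0)).power(-1)
```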
Of course `SIRT` is designed for a specific optimisation problem, where we know that the objective function is

$$ \|Ax - d\|^{2} $$

and our users only need to pass the operator `A` and the data `d`. However, if `GD`, `ISTA` and `FISTA` allow preconditioning, we can have an algorithm similar to `SIRT`:

GD: $$ x_{k+1} = x_{k} - \gamma_{k} D(x_{k}) \nabla f(x_{k}) $$

ISTA/FISTA: $$ x_{k+1} = \mathrm{prox}_{\gamma_{k} g}\left( x_{k} - \gamma_{k} D(x_{k}) \nabla f(x_{k}) \right) $$
We can also have their stochastic versions, i.e., OS-SIRT etc. Actually, since `SIRT` does not accept an objective function, we cannot set up OS-SIRT with the `SIRT` class via `SubsetSumFunction`.

As described in #1345, we can have the following options that can be used in both the deterministic and the stochastic case:
a) the preconditioner is fixed: $\frac{1}{A^{T}\mathbb{1}}$
b) it depends on the iterates: $\frac{x_{k}+\delta}{A^{T}\mathbb{1}}$
c) it depends on `subsets_num`, i.e., it is activated/deactivated after some epochs or iterations
d) `None`
To cover all of the above cases, I suggest having something like the following in the `GD` update.
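A possible sketch of that update, assuming `self.precond` is a callable that takes the current iterate and the iteration number and returns a multiplicative factor (the names are illustrative, not existing CIL code):

```python
def update(self):
    '''Single iteration of preconditioned GD'''
    self.objective_function.gradient(self.x, out=self.x_update)
    if self.precond is not None:
        # scale the gradient by the user-supplied preconditioner
        self.x_update.multiply(self.precond(self.x, self.iteration),
                               out=self.x_update)
    self.x.sapyb(1.0, self.x_update, -self.step_size, out=self.x)
```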
The user can then predefine a callable function for the `precond` step, in both the deterministic and the stochastic case; see the sketches below.
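For instance, the callables for cases (a)-(c) could be sketched as below (`delta`, `num_subsets` and `max_epoch` are assumed to be defined by the user; the factors could also be precomputed once instead of per call):

```python
ones = A.range_geometry().allocate(1.0)

# case (a): fixed preconditioner 1 / (A^T 1)
precond_fixed = lambda x, k: A.adjoint(ones).power(-1)

# case (b): iterate-dependent preconditioner (x + delta) / (A^T 1)
precond_variable = lambda x, k: (x + delta) / A.adjoint(ones)

# case (c), stochastic: deactivate the preconditioner after max_epoch epochs
def precond_stochastic(x, k):
    if k // num_subsets < max_epoch:
        return A.adjoint(ones).power(-1)
    return 1.0
```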
What do you think @paskino @gfardell @jakobsj