epapoutsellis opened 11 months ago
I forgot to mention something related to the new Stochastic CIL Functions. For all the Variance-Reduced CIL Functions, we need an `initial` argument for the gradient, e.g., $\nabla f(x_{0})$. At the beginning of the project we had `gradient_initialisation_point`, which is another name for `initial`. Basically, in GD, ISTA, FISTA the user would need to pass `initial` for the Stochastic Function, e.g., `SAGA(fi, selection=RandomSampling, gradient_initialisation_point=initial)`, and the same `initial` for the algorithm, e.g., `ISTA(initial=initial, f=f, g=g, ...)`.
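A minimal sketch of what this double bookkeeping looks like in user code (the geometry `ig`, the subset functions `fi`, the regulariser `g` and `step_size` are placeholders, not the exact CIL API):

```python
# Sketch of the old interface described above (illustrative setup):
# the same `initial` has to be passed twice, once as the gradient
# initialisation point of the stochastic function and once to the algorithm.
initial = ig.allocate(0)                  # `ig` is an assumed ImageGeometry

f = SAGA(fi,                              # `fi`: list of subset functions f_i
         selection=RandomSampling,
         gradient_initialisation_point=initial)

algo = ISTA(initial=initial,              # the same point again, for the iterates
            f=f, g=g, step_size=step_size)
algo.run(100)
```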
Although we can default everything to start with 0 arrays, in some cases it is better to run SGD for 1-2 epochs and then run a variance-reduced algorithm from the SGD solution. Also, sometimes we can use an analytic reconstruction as `initial`, e.g., FBP, FDK. So, if you forget to pass the `gradient_initialisation_point` to the Stochastic Function, the algorithm will start from the FBP recon and the Stochastic Function will initialise 0s for the gradient.
To fix this I use the following:

```python
if hasattr(self.f, "warm_start"):
    if self.f.warm_start:
        self.f.initial = self.initial.copy()
```
which basically forces the `initial` of the algorithm to also be the initial gradient point of the Stochastic Function. There is an alternative solution, but it requires a `domain` attribute for the `Function` class; I will not discuss it here.
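For context, here is a minimal sketch of how this handshake could look (class names and `set_up` are illustrative stand-ins, not the actual CIL implementation; only the `warm_start`/`initial` attributes come from the snippet above):

```python
class StochasticFunction:
    """Toy stand-in for a variance-reduced function (e.g. SAGA)."""
    def __init__(self, functions, warm_start=True):
        self.functions = functions
        self.warm_start = warm_start
        self.initial = None        # set by the user, or by the algorithm below


class ProximalGradientAlgorithm:
    """Toy stand-in for GD/ISTA/FISTA set-up logic."""
    def set_up(self, initial, f, g):
        self.initial = initial.copy()
        self.f, self.g = f, g
        # reuse the algorithm's initial as the gradient initialisation
        # point of the stochastic function, instead of asking for it twice
        if hasattr(self.f, "warm_start"):
            if self.f.warm_start:
                self.f.initial = self.initial.copy()
```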
For the Stochastic Project, I implemented a new base class called `PGA` (Proximal Gradient Algorithm). This is a base class used for the `GD`, `ISTA` and `FISTA` algorithms. These algorithms have some common steps, which are 1) the `f.gradient` step (GD, ISTA, FISTA) and 2) the `g.proximal` step (ISTA, FISTA). The `ISTA` and `FISTA` algorithms are actually Proximal Gradient Algorithms, i.e., accelerated (FISTA) or not (ISTA). The names ISTA/FISTA are used more to give emphasis to the regulariser, e.g., $\ell_{1}$. Also, in the case of $g(x)=0$ we end up with Gradient Descent or Accelerated (Nesterov) Gradient Descent. The new PGA class implements the following iteration:
$$x_{k+1} = \mathrm{prox}_{\gamma_{k} g}\left(x_{k} - \gamma_{k} D(x_{k}) \nabla f(x_{k})\right)$$
where:

- `g` can be any proximable function (simple form, or an iterative solver up to some accuracy);
- `f` can be any function: a CIL `Function` or an `ApproximateGradientSumFunction`;
- `D(x_{k})` is now a (callable) class, i.e., a `Preconditioner`. This changes the output of `f.gradient` in place. I have implemented some custom preconditioners used in CT (CIL) and PET (SIRF); this will be a separate issue.

Since the `f.gradient` step is common for GD, ISTA and FISTA, there is a new method `_gradient_step` that lives in the base class.
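Purely as an illustration of the structure (the `Algorithm` base class exists in CIL, but the attribute name `preconditioner` and the `apply` call are assumptions here, not the final interface):

```python
from cil.optimisation.algorithms import Algorithm

class PGA(Algorithm):
    """Illustrative sketch of the shared machinery behind GD, ISTA and FISTA."""

    def _gradient_step(self, x, out=None):
        # common step 1: evaluate f.gradient; a preconditioner D(x_k),
        # if present, then rescales the gradient in place
        self.f.gradient(x, out=out)
        if self.preconditioner is not None:
            self.preconditioner.apply(self, out)   # assumed interface

# GD, ISTA and FISTA then subclass PGA and implement `update`,
# reusing _gradient_step and, for ISTA/FISTA, g.proximal.
```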
Additionally, I fixed a CIL/SIRF gradient compatibility issue: the SIRF `Objective` and `Prior` classes have a `gradient` method, however this method does not have `out` in its signature.
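The general idea can be sketched as a small shim (this is an illustration of the approach, not the actual fix; `fill` is assumed to be available on the output container, as it is for CIL/SIRF data containers):

```python
def _gradient(func, x, out=None):
    """Call func.gradient with or without `out`, depending on what it supports."""
    try:
        # CIL-style signature: writes the gradient into `out` in place
        return func.gradient(x, out=out)
    except TypeError:
        # SIRF-style signature: no `out` keyword, returns a new container
        result = func.gradient(x)
        if out is not None:
            out.fill(result)
            return out
        return result
```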
Finally, in the `update` method we need to call the `StepSizeMethod` in order to find the next step size $\gamma_{k}$.
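Continuing the PGA sketch above, an ISTA-style `update` could then look like this, with the step-size object queried every iteration (the attribute names `x_old`, `gradient_update`, `step_size_method` and the `get_step_size` call are assumptions used only to show where $\gamma_{k}$ enters):

```python
class ISTA(PGA):
    def update(self):
        # step 1: gradient (shared across GD/ISTA/FISTA, with optional preconditioning)
        self._gradient_step(self.x_old, out=self.gradient_update)
        # ask the step-size method for the next gamma_k
        gamma_k = self.step_size_method.get_step_size(self)   # assumed name
        # step 2: x_{k+1} = prox_{gamma_k g}(x_k - gamma_k * gradient)
        self.x_old.sapyb(1.0, self.gradient_update, -gamma_k, out=self.x)
        self.g.proximal(self.x, gamma_k, out=self.x)
```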