CIL algorithms with SIRF Objective and Priors

epapoutsellis commented 5 months ago

At the moment, we cannot use CIL algorithms with SIRF Objectives and Priors (smooth optimisation). The reason is that there are no out arguments in ObjectiveFunction and Prior

File ~/miniconda3/envs/cil_sirf_pproj/lib/python3.10/site-packages/cil/optimisation/functions/Function.py:306, in SumFunction.gradient(self, x, out)
    304 for i,f in enumerate(self.functions):
    305     if i == 0:
--> 306         f.gradient(x, out=out)
    307     else:
    308         out += f.gradient(x)

TypeError: ObjectiveFunction.gradient() got an unexpected keyword argument 'out'

If we do sth like

# STIR.ObjectiveFunction??
from sirf.Utilities import check_status, assert_validity
import sirf.pystir as pystir

def gradient2(self, image, out=None):
    """Returns gradient of the prior.

    Returns the value of the gradient of the prior for the specified image.
    image: ImageData object
    """
    assert_validity(image, STIR.ImageData)
    grad = STIR.ImageData()
    grad.handle = pystir.cSTIR_priorGradient(self.handle, image.handle)
    check_status(grad.handle)
    if out is None:
        return grad
    else:
        out.fill(grad)

def gradient1(self, image, subset=-1, out=None):
    """Returns the value of the additive component of the gradient

    of this objective function on the specified image corresponding to the
    specified subset (see set_num_subsets() method).
    If no subset is specified, returns the full gradient, i.e. the sum of
    the subset components.
    image: ImageData object
    subset: Python integer scalar
    """
    assert_validity(image, STIR.ImageData)
    grad = STIR.ImageData()
    grad.handle = pystir.cSTIR_objectiveFunctionGradient(
        self.handle, image.handle, subset)
    check_status(grad.handle)
    if out is None:
        return grad
    else:
        out.fill(grad)

STIR.ObjectiveFunction.gradient = gradient1
STIR.Prior.get_gradient = gradient2

then we are ok.

Now, there are options for PET/SPECT applications using CIL + SIRF:

Option 1: CIL algorithms with SIRF Objectives and Priors. This combination is for maximizing the -LogLikelihood. Therefore a positive step size in CIL algorithms (gradient descent) will not work. We need a negative step size.
```
f = SVRGFunction(obj_funs) 
g = IndicatorBox(lower=0)
svrg = ISTA(initial = OSEM_image, f=f, g=g, update_objective_interval=1, max_iteration=num_subsets*50, step_size = -0.0000005)
svrg.run(10, verbose=1) #, callback=[cb1])
```

     Iter   Max Iter  Time(s)/Iter            Objective
        0        350         0.000         -2.88219e+06
        1        350        11.651         -2.87197e+06
        2        350         8.126         -2.87688e+06
        3        350         6.900         -2.88335e+06
        4        350         6.052         -2.87318e+06
        5        350         5.749         -2.87096e+06
        6        350         5.470         -2.87301e+06

Option 2: CIL algorithms with CIL Objectives and SIRF Operators/Priors: In this case, instead of using the PoissonLogLikelihoodWithLinearModelForMean, we build one using KullbackLeibler with SIRF Operators and SIRF random events. The problem in this case is that the objective values of SIRF Objective and CIL Objective are different.
- $$KL_{CIL}(b, Au) = \int b\log(b) - b\log(Au+\eta) - b + Au + \eta$$
- $$KL_{SIRF}(b, Au) = - \int Au + \eta - g\log(Au+\eta) $$

f = SVRGFunction(fi) # fi are CIL KullbackLeibler including prior
g = IndicatorBox(lower=0)
svrg = ISTA(initial = OSEM_image, f=f, g=g, update_objective_interval=1, max_iteration=num_subsets*50, step_size = 0.0000005) # Positive step size
svrg.run(5,verbose=1, callback=[cb1])

     Iter   Max Iter  Time(s)/Iter            Objective     Similar to SIRF Objective 
        0        350         0.000          6.26882e+06        -2.88272e+06
        1        350        13.762          5.36511e+04        -2.88318e+06
        2        350         8.803          5.46351e+04        -2.88417e+06
        3        350         7.049          5.38500e+04        -2.88338e+06
        4        350         6.436          5.46300e+04        -2.88416e+06
        5        350         6.018          5.43486e+04        -2.88388e+06

KrisThielemans commented 5 months ago

I think @casperdcl will want a signature

    if out is not None:
        out.fill(grad)
    return grad

Don't we have a problem with the subset signature for CIL? gradient1(self, image, subset=-1, out=None). Maybe not as they're named arguments.

These can then easily be added to SIRF. @evgueni-ovtchinnikov, could you do that?

KrisThielemans commented 5 months ago

Option 1: CIL algorithms with SIRF Objectives and Priors. This combination is for maximizing the -LogLikelihood. Therefore a positive step size in CIL algorithms (gradient descent) will not work. We need a negative step size.

I really don't like this. It works for some algorithms, but anything that uses line-searches, backtracking etc will get very confused.

Option 2: CIL algorithms with CIL Objectives and SIRF Operators/Priors: In this case, instead of using the PoissonLogLikelihoodWithLinearModelForMean, we build one using KullbackLeibler with SIRF Operators and SIRF random events. The problem in this case is that the objective values of SIRF Objective and CIL Objective are different.

This is a constant offset. Why does it matter?

However, I predict that KL will be slower than the SIRF operator.

Option 3: wrap with a minus sign.

obj = cil.optimisation.functions.ScaledFunction(SIRFobj, -1)

I prefer to recommend the 3rd option. Obviously, it will be slower as well, as internally there will be an extra multiplication of the gradient (unless CIL is smart and first multiplies steps wtih any scalars, and only then the actual container), but it is entirely generic.

paskino commented 5 months ago

Resuming a conversation with @epapoutsellis @KrisThielemans @evgueni-ovtchinnikov @casperdcl :

Add out parameter to the signature of gradient of SIRF functions and priors. The method should return even if out is passed:
```
if out is not None:
    out.fill(grad)
return grad
```
PoissonLogLikelihoodWithLinearModelForMean vs KullbackLeibler. These differ by a constant offset which shouldn't impact the minimisation.
Using PoissonLogLikelihoodWithLinearModelForMean with SIRF object should be faster.
KullbackLeibler should be used with some algorithms such as PDHG, which requires methods convex_conjugate etc that are missing in SIRF objective functions.
Whenever SIRF Objective functions and priors are used in CIL algorithms, they should be wrapped by a CIL ScaledFunction with a scalar = -1

SyneRBI / SIRF

CIL algorithms with SIRF Objective and Priors #1242