TomographicImaging / CIL

A versatile python framework for tomographic imaging
https://tomographicimaging.github.io/CIL/
Apache License 2.0

Add append to BlockFunction and BlockOperator #1494

Open paskino opened 1 year ago

paskino commented 1 year ago

It may be useful to be able to add a new function to an existing BlockFunction.

This requires an append method. It may also be useful to override the __add__ method.

Similarly for BlockOperator.
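For illustration, an in-place append on a block-style container could look roughly like this. SimpleBlockFunction and its internals are stand-ins invented for this sketch, not CIL's actual BlockFunction:

```python
# Sketch of an in-place append for a block-style container.
# SimpleBlockFunction is a stand-in, NOT CIL's BlockFunction.
class SimpleBlockFunction:
    def __init__(self, *functions):
        self.functions = list(functions)

    @property
    def length(self):
        return len(self.functions)

    def append(self, function):
        # modify the existing object, mirroring list.append
        self.functions.append(function)

    def __call__(self, x):
        # separable sum: evaluate each function on the matching block
        return sum(f(xi) for f, xi in zip(self.functions, x))

bf = SimpleBlockFunction(abs, abs)
bf.append(abs)
print(bf.length)          # 3
print(bf([-1, -2, -3]))   # 6
```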

MargaretDuff commented 1 year ago

Hi @paskino I had a play with adding an append function to BlockDataContainer, BlockOperator and BlockFunction. The code can be found here: https://github.com/MargaretDuff/CIL-margaret/tree/append-to-block and the testing file here: https://github.com/MargaretDuff/cil_test_notebooks/blob/master/append_block_operator_object/test_append_block_operator.py.

I know they are not very pretty implementations, but let me know if their behaviour is what we are hoping for.

I haven't tried overriding the __add__ method yet; it might not make sense for BlockOperator, which has shape MxN, so we would need to be clear about which axis we are appending along.

MargaretDuff commented 1 year ago

Had a quick chat with @paskino last week. We talked about just appending to the list of containers (BlockDataContainer), list of operators (BlockOperator) or list of functions (BlockFunction). I think the question is: do we want append to output a new object or edit the existing one? Note that numpy's append "does not occur in-place: a new array is allocated and filled."

My preference would be to output a new object, similar to the behaviour of numpy
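For reference, numpy's out-of-place behaviour is easy to demonstrate:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)   # allocates and returns a new array
print(a)              # [1 2 3] -- original untouched
print(b)              # [1 2 3 4]
print(a is b)         # False
```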

jakobsj commented 1 year ago

Good question. I think for the use case we have considered so far, i.e. using a for loop to build up a BlockOperator (BlockFunction), it seems more natural to me to modify the existing one, starting from an initially empty BlockOperator (-Function).


MargaretDuff commented 1 year ago

@jakobsj Good point - the case I was considering was explicit PDHG with TV, where we append the TV operator to the list of projection operators. Perhaps if we modify the existing object we should call the method something else, to avoid confusion with numpy's append?

epapoutsellis commented 1 year ago

In the case of SPDHG, we can do something like

Ai = ListOperatorsCT(bi).get() # bi is a list of DataContainers
fi = [L2NormSquared(b=b) for b in bi]

Now, if we want implicit-SPDHG, we can do

f = BlockFunction(*fi)
K = BlockOperator(*Ai) 

and if we want explicit-SPDHG, we append first and then wrap them into blocks (functions and operators)

fi.append(alpha*MixedL21Norm())
Ai.append(GradientOperator(ig))

The same applies when we want to use (proximal) gradient algorithms with stochastic estimators. For instance, if we have $$\min_{x} \sum_{i=1}^{n} f_{i}(x) + g(x)$$

In the case of SGD=GD + SGFunction, we can do

fi = [LeastSquares(A=A, b=b) + (1/n)*g for A, b in zip(Ai, bi)]
f = SGFunction(fi)

and in the case of ProxSGD = PGA+SGFunction, we can do

fi = [LeastSquares(A=A, b=b) for A, b in zip(Ai, bi)]
f = SGFunction(fi)

I think the most important thing is to provide a method/function for the user that generates multiple instances of tomography operators. For CIL this can be done in one line, but for SIRF it is more complicated. The appending and the wrapping is what we do in the MCIR paper and the CIL2

MargaretDuff commented 12 months ago

Thanks @epapoutsellis

In the case of SPDHG, we can do something like

Ai = ListOperatorsCT(bi).get() # bi is a list of DataContainers
fi = [L2NormSquared(b=b) for b in bi]

Yes - I had a play with something similar to this in issues #1501. I think from a blockgeometry we can already extract a projection operator from partitioned data.

Now, if we want implicit-SPDHG, we can do

f = BlockFunction(*fi)
K = BlockOperator(*Ai) 

and if we want explicit-SPDHG, we append first and then wrap them into blocks(functions and operators)

fi.append(alpha*MixedL21Norm())
Ai.append(GradientOperator(ig))

Ah ok, in this issue we discuss appending to blocks (functions or operators), which under the hood appends to the underlying lists.

I think the most important thing is to provide a method/function for the user that generates multiple instances of tomography operators. For CIL this can be done in one line, but for SIRF it is more complicated. The appending and the wrapping is what we do in the MCIR paper and the CIL2

I am not sure I completely understand this - could you explain more?

MargaretDuff commented 12 months ago

I have changed this so it now appends to the original object rather than producing a new object. For example (screenshot omitted): after appending to a BlockDataContainer, both the image and the norm output zeros and the shapes agree.

Similarly, BlockOperator and BlockFunction can be appended to (screenshot omitted).

epapoutsellis commented 12 months ago

The pipeline for the stochastic framework is the following:

1) Start with some acquisition data.
2) Split the data based on a splitting method. It can be a list of DataContainers or a BlockDataContainer.
3) Create a list of functions and a list of operators based on the split data.
4) Based on the choice of algorithm, wrap the lists above using some CIL API, giving mathematical meaning to these lists.
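The steps above can be sketched with plain-Python stand-ins (the splitting, the per-subset functions and the wrapping here are toy stand-ins, not CIL API; real code would use ProjectionOperator, L2NormSquared, SGFunction, etc.):

```python
# Toy sketch of the 4-step stochastic pipeline; all constructs are
# stand-ins, NOT CIL API.

# 1) acquisition data (here: a flat list of measurements)
data = list(range(12))

# 2) split the data into subsets (round-robin splitting)
num_subsets = 3
splitted_data = [data[i::num_subsets] for i in range(num_subsets)]

# 3) one fidelity function per subset (sum of squares against each datum)
fis = [lambda x, d=d: sum((x - di) ** 2 for di in d) for d in splitted_data]

# 4) wrap the list: a "sum function" adds the terms, giving the list
#    its mathematical meaning only at this last step
def sum_function(x):
    return sum(f(x) for f in fis)

print(len(splitted_data))                            # 3
print(sum_function(0) == sum(d * d for d in data))   # True
```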

For the above steps, we need to consider the differences between CIL and SIRF, as well as between composite and non-composite optimisation. I have written some code blocks in the overleaf that I share for all the cases, for both CIL and SIRF.

CIL

For the case of SPDHG: given the split data, we can create a list of ProjectionOperators or, if you wish, a BlockOperator

Ais = [ProjectionOperator(ig, data.geometry, device="cpu") for data in splitted_data]
# A = BlockOperator(*Ais)
# splitted_data can equivalently be a list of DataContainers or a BlockDataContainer

Now, the user will need to choose how to form the optimisation problem for the SPDHG algorithm.

In the case of explicit SPDHG:

Grad = GradientOperator(ig)
Ais.append(Grad)

Now, we need to fix our functions,

fis = [0.5*L2NormSquared(b=data) for data in splitted_data]

Since we are in explicit case, we need one more function for the regulariser.

fis.append(alpha*MixedL21Norm())

The above list of functions does not have any mathematical meaning, but if we wrap it with a BlockFunction it becomes a separable sum, which needs a BlockOperator to form the standard $f(Kx)$ term. For the $G(x)$ we use an IndicatorBox and we are ready to run SPDHG.

In the case of implicit SPDHG: we run the steps above but without the appending steps for Ais and fis, and replace $G(x)$ with FGP_TV.

For non-composite optimisation, i.e., $f(x) + g(x)$ where the operator is not exposed, we give the list of functions a different meaning. Now we use (proximal) gradient algorithms, e.g., GD, ISTA, FISTA, and for the above problem we have

fis = [LeastSquares(A = A, b=data, c=0.5) for A, data in zip(Ais, splitted_data)]

Then, for Proximal-SGD:= ISTA + SGFunction, we do

f = SGFunction(fis)

with g(x)=FGP_TV.

But, we can also solve a "similar" problem using a smooth version of the TotalVariation and run SGD (not proximal).

$$\min_{x} \sum_{i=1}^{n}\|A_{i}x - b_{i}\|^{2} + \alpha \|\nabla x\|_{\epsilon}$$

Grad = GradientOperator(ig)
reg = OperatorCompositionFunction((alpha/num_subsets)*SmoothMixedL21Norm(1e-5), Grad)
fi = [OperatorCompositionFunction(L2NormSquared(b=data), A) + reg for data, A in zip(splitted_data, Ais)]

sg_func = SGFunction(fi, selection)
sg_func.dask = True
step_size = 1.65e-06
sgd = GD(initial=initial, objective_function=sg_func, step_size=step_size,
         max_iteration=epochs * num_subsets,
         update_objective_interval=num_subsets)
sgd.run(verbose=1, callback=[cb])

If you replace the lists of operators and data above with BlockOperators and BlockDataContainers, the code will continue to work. However, for the syntax here, I do not know the type of

F = L2NormSquared(b=data_p)

Is it a list of L2NormSquared instances, a BlockFunction, a SumFunction, or an ApproximateGradientSumFunction?

If we assume that it is a BlockFunction, then we imply that we want to solve a composite optimisation problem and we are good to run SPDHG, explicit or implicit depending on the appending. However, there is a fundamental problem with the CIL functions API: our functions do not have a domain. Our base loss functions act on DataContainers, not on BlockDataContainers. In the above syntax, we imply that F takes values from a product of ImageGeometries, i.e., a Cartesian product, which is a BlockGeometry.

Now, if I do the same for the LeastSquares

F = LeastSquares(A=BlockOperator, b=BlockDataContainer)

and assume that it is a BlockFunction, then we cannot use it for proximal gradient type algorithms. We need to do

F = SGFunction(*F.functions)

which means that we go from a BlockFunction (separable sum) to a SumFunction (finite sum). There are other issues about properties of each function, such as smoothness and Lipschitz constants, which I will not mention here.
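The separable-sum vs finite-sum distinction can be illustrated with toy callables (not CIL API): a BlockFunction-like object evaluates each term on its own block, while a SumFunction-like object evaluates every term at the same point:

```python
# Toy illustration: a separable sum acts on a block (x1, ..., xn),
# a finite sum acts on a single x. These are NOT CIL classes.
functions = [lambda x, k=k: (x - k) ** 2 for k in (1, 2, 3)]

def separable_sum(xs):
    # BlockFunction-like: one block of the input per function
    return sum(f(xi) for f, xi in zip(functions, xs))

def finite_sum(x):
    # SumFunction-like: the same x for every term
    return sum(f(x) for f in functions)

print(separable_sum([1, 2, 3]))  # 0
print(finite_sum(2))             # (2-1)**2 + (2-2)**2 + (2-3)**2 = 2
```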

SIRF

In the case of SIRF, we need to split the actual counts, background and attenuation factors, and maybe more; @KrisThielemans knows more. However, SIRF data does not have a .geometry attribute, so we cannot generate a list of operators from the list of data in one line of code. So I created the following class:

from abc import ABC, abstractmethod

import sirf.STIR as pet

class ConstructListOperatorsBase(ABC):
    def __init__(self, list_data):
        self.list_data = list_data
        self.len_list = len(self.list_data)
        self.operators = []

    @abstractmethod
    def get(self):
        pass

class ListOperatorsPET(ConstructListOperatorsBase):

    def __init__(self, list_data, list_attenuation_factors, image):
        super(ListOperatorsPET, self).__init__(list_data)
        self.list_attenuation_factors = list_attenuation_factors
        self.image = image

    def get(self):
        for i in range(self.len_list):
            acq_model = pet.AcquisitionModelUsingRayTracingMatrix()
            acq_model.set_num_tangential_LORs(10)
            asm_attn = pet.AcquisitionSensitivityModel(self.list_attenuation_factors[i])
            acq_model.set_acquisition_sensitivity(asm_attn)
            acq_model.set_up(self.list_data[i], self.image)
            self.operators.append(acq_model)
        return self.operators

So the user runs Ai = ListOperatorsPET(bi, attfs_i, image).get() and gets a list of operators. Now, depending on which algorithm we choose to solve a PET reconstruction problem (with/without a regulariser), we can do

fi = [OperatorCompositionFunction(KullbackLeibler(b=b, eta=bck), A)
      for b, bck, A in zip(bi, bck_i, Ai)]

for non-composite opt and

fi = [KullbackLeibler(b=b, eta=bck) for b, bck in zip(bi, bck_i)] # bi is a list of split data, bck_i is a list of split background
f = BlockFunction(*fi)
K = BlockOperator(*Ai)

for composite opt.

epapoutsellis commented 12 months ago

At least for the Stochastic Framework API, we should give mathematical meaning to the list of functions only at the last step, depending on the choice of algorithm and optimisation problem. Note that there are algorithms which combine the two cases mentioned above. For instance, the PD3O algorithm has a smooth term, a composite term and a proximable term.

paskino commented 11 months ago

BlockOperators may be a little tricky, as they can be arranged in a matrix shape rather than just a row or column, as happens with BlockDataContainer and BlockFunction.

The use case I had in mind was to exactly simplify the setup of SPDHG.

ProjectionOperator automatically returns a BlockOperator if passed a BlockGeometry. Notice this only works with the ASTRA and TIGRE plugins.


# projection operator automatically returns a BlockOperator
A = ProjectionOperator(ig, ag_blockgeometry)

fis = [0.5*L2NormSquared(b=data) for data in splitted_data]
F = BlockFunction(*fis)

My aim was to remove the list comprehension and use the following loop instead. It's not shorter, but IMO it is clearer.

F = BlockFunction()
for data in splitted_data:
    F.append( 0.5 * L2NormSquared(b=data) )

Or, similarly to ProjectionOperator, L2NormSquared could return a BlockFunction if it recognises that it should, i.e., if it is passed b as a BlockDataContainer:


F = 0.5 * L2NormSquared(b=splitted_data)
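One way such dispatch could be sketched, with toy classes standing in for L2NormSquared, BlockFunction and BlockDataContainer (none of this is CIL's actual implementation):

```python
# Toy sketch of "return a block when given block data".
# ToyNorm / ToyBlock are stand-ins, NOT CIL's L2NormSquared / BlockFunction.
class ToyBlock:
    def __init__(self, functions):
        self.functions = functions

    def __call__(self, x):
        # separable sum over the blocks
        return sum(f(xi) for f, xi in zip(self.functions, x))

class ToyNorm:
    def __new__(cls, b):
        # a list stands in for a BlockDataContainer: dispatch to a block
        if isinstance(b, (list, tuple)):
            return ToyBlock([ToyNorm(bi) for bi in b])
        return super().__new__(cls)

    def __init__(self, b):
        self.b = b

    def __call__(self, x):
        return (x - self.b) ** 2

F = ToyNorm(b=[1, 2])
print(type(F).__name__)   # ToyBlock
print(F([3, 4]))          # (3-1)**2 + (4-2)**2 = 8
```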
epapoutsellis commented 11 months ago

If the BlockOperator has a matrix shape, e.g., for the TGV regulariser, instead of passing a ProjectionOperator you can pass a BlockOperator(list_of_projection_operators). Now the final BlockOperator will look something like

operator = BlockOperator(GradOp, -IdentityOp,
                         ZeroOp, SymmetrizedGradOp,
                         Block_of_Projection_operator, ZeroOp, shape=(3, 2))

in the case of explicit PDHG.

This A = ProjectionOperator(ig, ag_blockgeometry) is very strange. I understand that it returns a BlockOperator by appending a list of operators, but it hides the A_i's from the user, in order to form

$$\min_{x} \sum_{i=1}^{n} F_{i}(A_{i}x) + G(x)$$

Also, for standard PDHG (explicit with TV), we have $$\min_{x} \sum_{i=1}^{2} F_{i}(A_{i}x) + G(x)$$

and we do

A = BlockOperator(ProjectionOperator, Grad)
F = BlockFunction(L2NormSquared, MixedL21Norm)

It may seem similar to the ChannelwiseOperator when num_channels>1, but in this case you map an image space to a split sinogram space...

IMO, list comprehensions are quite clear and basic Python, whereas introducing new syntax will confuse users a lot... well, everything depends on the documentation and examples.

In terms of the list of functions, one alternative is to introduce a domain attribute to our functions, None by default, and have BlockFunction inherit from SumFunction. But it would not be very SIRF friendly.

So if you do F = L2NormSquared(domain=None or data.geometry, b=data), we have the standard $$F(x) := \|x-b\|^{2}$$.

If F = L2NormSquared(domain=splitted_data_block_geometry, b=splitted_data), we have a separable sum, aka a BlockFunction

$$F(x_{1}, x_{2}, \ldots, x_{n}) := \sum_{i=1}^{n} f_{i}(x_{i}) = \sum_{i=1}^{n} \|x_{i}-b_{i}\|^{2}$$.

and we can check/force it to be a BlockFunction, because every L2NormSquared function has a proximal method implemented.

If F = LeastSquares(A=BlockOperator(Ais), b=splitted_data) for non-composite optimisation, we have a SumFunction, and we can check/force it to be a SumFunction because LeastSquares does not have a proximal method implemented.

$$F(x) := \sum_{i=1}^{n} f_{i}(x) = \sum_{i=1}^{n} \|A_{i}x-b_{i}\|^{2}$$.

and to make it a stochastic function, e.g., an SGFunction, we do SGFunction(F.functions).

This will probably work for all functions, not only L2NormSquared.