Generating global assembly for vector solves

gmarkall commented 12 years ago

See https://github.com/gmarkall/manycore_form_compiler/wiki/Generating-global-assembly-for-solving-for-vectors

dham commented 12 years ago

Hi Graham,

Does this assembly strategy incorporate the optimisation of knowing in advance that phi_i_phi_j_dx = 0 whenever phi_i and phi_j are basis functions corresponding to different dimensions of the vector? In 2D this is a 50% reduction in flops; in 3D it's 66%.
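The percentages can be checked by counting entries. The following is a minimal sketch, not code from the form compiler: the function name `zero_fraction` and the component-major ordering of local degrees of freedom are assumptions made for illustration.

```python
def zero_fraction(dim, nodes_per_element):
    """Fraction of local mass-matrix entries that are structurally zero.

    Assumes component-major ordering (hypothetical):
    local index = component * nodes_per_element + node.
    """
    size = dim * nodes_per_element
    zeros = 0
    for row in range(size):
        for col in range(size):
            # Entries coupling different vector components vanish.
            if row // nodes_per_element != col // nodes_per_element:
                zeros += 1
    return zeros / (size * size)


print(zero_fraction(2, 3))  # linear triangle, 6x6 local matrix
print(zero_fraction(3, 4))  # linear tetrahedron, 12x12 local matrix
```

For linear triangles this gives one half, and for linear tetrahedra two thirds, matching the figures quoted above.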

Regards,

David

On 13 December 2011 17:00, Graham Markall reply@reply.github.com wrote:

See https://github.com/gmarkall/manycore_form_compiler/wiki/Generating-global-assembly-for-solving-for-vectors

Reply to this email directly or view it on GitHub: https://github.com/gmarkall/manycore_form_compiler/issues/45

Dr David Ham Applied Modelling and Computation Group Department of Earth Science and Engineering Imperial College London

http://www.imperial.ac.uk/people/david.ham

gmarkall commented 12 years ago

Hi David,

It doesn't do this. It would be relatively straightforward to write the global assembly code with this optimisation, but I've found the local assembly code generation tricky to write with it. The reason is that the UFL AST seems to assume that all of the basis functions might be completely different, rather than some being different dimensions of the same vector.

This global assembly strategy will work with the local assembly strategy that we already have implemented.

Graham.

gmarkall commented 12 years ago

I think I see your point a little more clearly now: even though we have a block of zeroes in the local matrix, we needn't necessarily add them into the global matrix.

I have looked at the implementation of the quadrature code generator in ffc with optimisations turned on, and I notice that they still store all the entries of the local matrix in tabulate_tensor (e.g. a 6x6 matrix for vectors on triangles) but they don't perform computations that would assemble anything into the off-diagonal blocks. I think this is because they are mainly concerned about optimising for FLOPs, but for us the real killer is the data storage and transfer. So I think the optimised case for us consists of:

In the local assembly, we need to generate the matrix for scalar basis functions (e.g. a 3x3 matrix for triangles instead of 6x6).
In the global assembly, we need to add this scalar matrix into the global matrix in the correct position for each dimension. Alternatively, we could just assemble the scalar matrix and solve for the first dimension and then the second, etc.
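The two steps above can be sketched as follows. This is only an illustration under stated assumptions: it is valid only when every vector component receives the same scalar contribution (as for a mass matrix), the global matrix is a plain dictionary keyed by (row, column), global degrees of freedom are numbered component-major, and the names `scatter_scalar_block` and `P1_MASS` are hypothetical.

```python
from collections import defaultdict

# Scalar P1 mass matrix on a triangle of area 0.5: (area / 12) * [[2,1,1],[1,2,1],[1,1,2]].
P1_MASS = [[v / 24.0 for v in row] for row in ([2, 1, 1], [1, 2, 1], [1, 1, 2])]


def scatter_scalar_block(global_matrix, local_scalar, element_nodes, dim, num_nodes):
    """Add one scalar local matrix into the diagonal block of each component.

    Assumed global numbering (hypothetical): dof = component * num_nodes + node.
    """
    for component in range(dim):
        offset = component * num_nodes
        for a, i in enumerate(element_nodes):
            for b, j in enumerate(element_nodes):
                global_matrix[(offset + i, offset + j)] += local_scalar[a][b]


# Usage: a single triangle with nodes 0, 1, 2 and a 2D vector field.
A = defaultdict(float)
scatter_scalar_block(A, P1_MASS, element_nodes=[0, 1, 2], dim=2, num_nodes=3)
print(len(A))  # number of stored entries; off-diagonal blocks are never touched
```

Only the 3x3 scalar matrix is stored and transferred per element, and the off-diagonal blocks of the global matrix never receive an entry.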

dham commented 12 years ago

Hi Graham,

This is essential for the global assembly, since storing the zero blocks there wastes storage. On the other hand, one could do it naively by simply not doing the matrix insertion for entries which are exactly zero (and the relevant entries should be exactly zero). This even works for some versions of the local matrix approach (LMA), since entire sub-blocks are zero. For local assembly, this is probably "only" a waste of flops, so it may not be too bad.
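A minimal sketch of the naive approach, assuming a dictionary-based global matrix and a hypothetical helper name `add_local_skipping_zeros`; it relies on the zero entries being exact, so the resulting sparsity pattern is the structural one.

```python
def add_local_skipping_zeros(global_matrix, local_matrix, dof_map):
    """Insert a full local matrix, skipping entries that are exactly zero.

    dof_map[a] is the global dof of local dof a. Returns the number of
    insertions actually performed.
    """
    inserted = 0
    for a, row in enumerate(dof_map):
        for b, col in enumerate(dof_map):
            value = local_matrix[a][b]
            if value == 0.0:
                continue  # no storage is allocated for this entry
            global_matrix[(row, col)] = global_matrix.get((row, col), 0.0) + value
            inserted += 1
    return inserted


# Usage: a 6x6 local vector mass matrix on a triangle (component-major ordering).
scalar = [[v / 24.0 for v in row] for row in ([2, 1, 1], [1, 2, 1], [1, 1, 2])]
local = [[scalar[r % 3][c % 3] if r // 3 == c // 3 else 0.0 for c in range(6)]
         for r in range(6)]
A = {}
count = add_local_skipping_zeros(A, local, dof_map=[0, 1, 2, 3, 4, 5])
print(count, len(A))
```

The local matrix is still computed and stored in full here, so this saves global storage but not local flops or local data transfer.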

Regards,

David

dham commented 12 years ago

On 15 December 2011 12:04, Graham Markall reply@reply.github.com wrote:

I think I see your point a little more clearly now: even though we have a block of zeroes in the local matrix, we needn't necessarily add them into the global matrix.

I have looked at the implementation of the quadrature code generator in ffc with optimisations turned on, and I notice that they still store all the entries of the local matrix in tabulate_tensor (e.g. a 6x6 matrix for vectors on triangles) but they don't perform computations that would assemble anything into the off-diagonal blocks. I think this is because they are mainly concerned about optimising for FLOPs, but for us the real killer is the data storage and transfer. So I think the optimised case for us consists of:

In the local assembly, we need to generate the matrix for scalar basis functions (e.g. a 3x3 matrix for triangles instead of 6x6).

In the global assembly, we need to add this scalar matrix into the global matrix in the correct position for each dimension. Alternatively, we could just assemble the scalar matrix and solve for the first dimension and then the second, etc.

This approach isn't safe. The contribution to each dimension is not usually the same so you can't just reuse the same scalar local matrix. It only happens to be the same in the case of the mass matrix.
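To illustrate with a form that is not discussed in this thread, take a term of the shape div(u) * div(v) with linear basis functions on the reference triangle. The sketch below is an assumed example only: block (a, b) holds the integral of d(phi_i)/dx_a * d(phi_j)/dx_b, and the names `GRADS`, `AREA` and `block` are hypothetical.

```python
# Linear (P1) basis on the reference triangle: phi0 = 1 - x - y, phi1 = x, phi2 = y.
AREA = 0.5
GRADS = [(-1.0, -1.0), (1.0, 0.0), (0.0, 1.0)]  # constant gradients of phi0, phi1, phi2


def block(a, b):
    """3x3 block coupling component a with component b.

    Entry [i][j] is the integral of d(phi_i)/dx_a * d(phi_j)/dx_b, which for
    constant gradients is just the product times the element area.
    """
    return [[AREA * gi[a] * gj[b] for gj in GRADS] for gi in GRADS]


print(block(0, 0))  # x-x block
print(block(1, 1))  # y-y block, different from the x-x block
print(block(0, 1))  # x-y block, not zero for this form
```

Here the diagonal blocks differ from each other and the off-diagonal block is nonzero, so neither reusing a single scalar matrix nor dropping the off-diagonal blocks would be correct for such a form.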

Regards,

David
