gmarkall / manycore_form_compiler

MCFC is deprecated. See https://code.launchpad.net/~grm08/ffc/pyop2
https://code.launchpad.net/~grm08/ffc/pyop2
GNU General Public License v3.0
3 stars 1 forks source link

Local vector assembly #43

Closed gmarkall closed 12 years ago

gmarkall commented 12 years ago

Changes to support code generation of the local assembly when we're solving for vector fields. The changes consist of:

  1. When there is a vector basis function, this is the tensor product of a scalar basis. This needs to be put in an array at the start of the kernel. The new function _buildBasisTensors generates the code to declare this array and populate it with the correct terms from the scalar basis.
  2. Since the vector basis is a multidimensional array, support for multidimensional array subscripts has been added to the CUDA backend. When code referencing a vector basis is generated, it uses buildMultiArraySubscript instead of buildArraySubscript so that it matches with the type of the array storing the vector basis.
  3. Since vector bases have an additional index that is present in the UFL AST, we need to memorise what the indices are when we encounter an Indexed object. This is so that the indices can be referenced by the code that generates the array subscript for the vector basis. This memorisation is taken care of in the indexed and multi_index methods of ExpressionBuilder.
  4. Computation of the number of basis functions for a vector or tensor basis is added, in order to get the correct upper bounds of loops over a vector basis.

Caveats:

  1. Some of the changes cause modification to the output of op2 identity-vector test case. How do we feel about this? I imagine the rest of the functionality will need porting into op2 as well when we start using it as a backend instead of the CUDA one.
  2. The code only works for forms with arguments, not for argument derivatives - this is because the generated code was very messy for initialising the tensor product of the derivatives of the basis functions, and additionally this code had to go inside the element loop, which would have made its cost non-trivial. Once we can generate the transformation in the form kernel, this will be far easier to implement since we will need the tensor product of derivatives of the basis functions on the reference element, which will only need doing once outside the element loop and be simpler code.
kynan commented 12 years ago

Re the caveats:

  1. I see no issue modifying any of the generated OP2 kernel code. Since this isn't used in any test case so far it's considered broken anyway.
  2. This seems a very reasonable restriction for the time being. Not much use to try to make something work which we'll discard in a while anyway.