amartinhuertas / ExploringGridapHybridization.jl


Preliminary implementation of Hybridized RT FE formulation for Darcy eq. #3

Open amartinhuertas opened 3 years ago

amartinhuertas commented 3 years ago

@fverdugo @santiagobadia ... pre-meeting and in-meeting material for our next meeting (I will prepare a pre-class quiz, ;-) )

The main driver program is available here

Performance concerns

I am a bit concerned about the current JIT compilation times (I have not systematically measured execution times). I have introduced some @time macro calls in the main driver source code for the hotspot lines of code. You can see the result here. The code implementing the conforming RT method for the Darcy problem takes 105 seconds to compile, while the hybridized RT method takes 770 seconds. I would like to discuss and understand the causes and possible approaches to reduce the current compilation times (if any).
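As a side note on how I interpret these numbers, here is a minimal sketch (with a toy function, not the actual driver) of why @time on the first call reflects JIT compilation, while a second call measures execution only:

```julia
# Toy example: the first call to a Julia method pays the JIT compilation cost,
# so @time on the first call mixes compilation and execution; the second call
# measures execution only.
f(x) = sum(abs2, x)
x = rand(10^6)
@time f(x)   # first call: dominated by compilation
@time f(x)   # second call: execution only
```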

Implementation approach

The weak form of the hybridized RT FE method for the Darcy problem contains the following two integrals over the cell boundaries ∂K (with measure d∂K):

  1. ∫(mh*(uh⋅n))*d∂K
  2. ∫((vh⋅n)*lh)*d∂K

where mh (resp. lh) are the test (resp. trial) functions of the space of Lagrange multipliers, defined on the mesh facets, and uh⋅n (resp. vh⋅n) are the normal components of the trial (resp. test) functions of the non-conforming RT space, defined on the mesh cells. Note that the RT space is non-conforming, and thus the Piola map is not required in order to build the physical-space RT shape functions from the reference ones (this has been confirmed in an experiment with the current code). Indeed, the pull-back for the non-conforming RT shape functions is just the composition of the reference-space function with the inverse of the geometrical map, as with Lagrangian FEs.
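To fix ideas, here is an illustrative sketch in plain Julia (not Gridap code) of this point, for a hypothetical affine geometrical map F(x̂) = J*x̂ + b: the non-conforming RT space only needs the composition with F⁻¹ (as for Lagrangian FEs), whereas an H(div)-conforming RT space would additionally require the contravariant Piola map.

```julia
using LinearAlgebra

J = [2.0 0.0; 0.0 0.5]          # Jacobian of the (hypothetical) affine map
b = [1.0, 1.0]
F(xref)    = J * xref .+ b      # reference -> physical
Finv(x)    = J \ (x .- b)       # physical  -> reference
ϕref(xref) = [xref[1], xref[2]] # a reference-space vector-valued shape function

ϕ_composition(x) = ϕref(Finv(x))                  # enough for the non-conforming RT space
ϕ_piola(x)       = (J * ϕref(Finv(x))) ./ det(J)  # contravariant Piola map (conforming case)

x = F([0.25, 0.75])
@show ϕ_composition(x) ϕ_piola(x)
```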

The current implementation of the above two integral terms leverages (intensively) the Gridap BlockMap and ArrayBlock{T,N} data types. The latter is the type of the elements in the range of the former, and represents an array of blocks, all of type T. Some of these blocks might be empty.
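As a simplified stand-in (not Gridap's actual definition), the idea behind such a container can be pictured as an N-dimensional array of blocks of a common type T, plus a mask recording which blocks are actually present:

```julia
# Toy ArrayBlock-like container: block storage plus a "touched" mask; entries
# for absent blocks are simply left undefined.
struct BlockArrayToy{T,N}
  array::Array{T,N}        # block storage
  touched::Array{Bool,N}   # true where a block is present (non-empty)
end

# A 1x3 blocked row for (velocity, pressure, multiplier) where only the velocity
# block carries data, mimicking a typical single-block entry.
blocks = Matrix{Matrix{Float64}}(undef, 1, 3)
blocks[1, 1] = rand(4, 2)
row = BlockArrayToy(blocks, [true false false])
@show row.touched
```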

The code ultimately expresses the integrals above as a sum of integrals over the cell boundary facets (as sketched right below). Achieving this outcome is a multi-step process. In the sequel, I am assuming that the integrand has the form of a binary operation at the root of the operation tree (e.g., *, as above). I will refer to the left operand as the "test" operand, and to the right operand as the "trial" operand.
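```julia
# Toy sketch (made-up quadrature data) of the identity the implementation targets:
# an integral over the cell boundary ∂K is the sum over the local facets of K of
# the facet integrals, each approximated by a weighted sum at the facet quadrature
# points (in the actual code the Jacobian of the geometrical map also enters each sum).
nfacets, nqp = 4, 2
w  = [rand(nqp) for f in 1:nfacets]   # quadrature weights on each local facet
fx = [rand(nqp) for f in 1:nfacets]   # integrand values at the facet quadrature points
facet_integrals   = [sum(w[f] .* fx[f]) for f in 1:nfacets]
boundary_integral = sum(facet_integrals)
@show boundary_integral
```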

  1. (see here) Assuming that the test operand is posed over a D-dim domain, we build the test operand as a cell-wise array of nested arrays of blocks indexed as test[c][f][b], where c is the global cell identifier, f is the local facet identifier, and b is the block identifier within the blocked layout of the system of PDEs (e.g., velocity->1, pressure->2, lagrange multiplier->3). Each entry test[c][f][b] is typically either an array of Fields (before evaluation) or a matrix of Numbers (after evaluation). For a given c, we always get a non-empty block for any possible value of f. For a given c and f, typically there is only a single non-empty block. For example, for vh above, test[c][f][b] contains the cell shape functions restricted to the facet with local identifier f.

  2. (see here) Assuming that the trial operand is posed over a D-1-dim domain, we build the trial operand as a cell-wise array of nested arrays of blocks indexed as trial[c][f1][1,b][1,f2], where c is the global cell identifier, f1 and f2 are local facet identifiers, and b is the block identifier. Note that the operand has an extra level of nested blocks at the end. In this level, only one block is non-empty, namely the one for which the values of f1 and f2 match. This extra level is required in order to have the columns of the matrix correctly laid out (otherwise we would incorrectly sum up the contributions from all facets in the same set of columns). For example, for lh above, we build trial[c][f][1,b][1,f] as an array of Fields with the transposed trial shape functions on the facet of cell c with local identifier f.

  3. (see, e.g., here) The rest of the arrays required to evaluate the integral, namely, the unit outward normal n, the Jacobian J, and the numerical integration weights w and points x, are also built as cell-wise arrays, but with a single-level array of blocks, i.e., they are indexed as n/J/w/x[c][f].

  4. After evaluating the test and trial operands at x[c][f], and performing a broadcasted operation between them, we end up with a cell-wise array of nested arrays of blocks indexed as test_op_trial[c][f][b,b][1,f].

  5. The goal of the last step is to convert test_op_trial[c][f][b,b][1,f] into test_op_trial[c][b,b] while combining the integrand evaluated at the quadrature points (i.e., test_op_trial[c][f][b,b][1,f]), the Jacobian evaluated at the quadrature points (J[c][f]), and the quadrature weights (w[c][f]). (Note that integrals over the cells are implemented as cell-wise arrays of blocks indexed as [c][b,b].) In order to do so:

    1. (see here) For each local facet f, we build a lazy_map with IntegrationMap as map. The arguments of this lazy_map call are the restrictions of test_op_trial, J, and w to local facet f. These are arrays indexed as [c][b,b][1,f], [c], and [c], resp. ~~The resulting array is indexed as [c][b,b][1,f].~~
    2. (see here) We build the lazy sum of the arrays resulting from step 1. Note that, for a given c and a non-empty [b,b] block, [c][b,b][1,f] is non-empty for all f.
    3. (see here) We build a lazy_map from the result of step 2 with DensifyInnerMostBlockLevelMap as map. This transforms an array indexed as [c][b,b][1,f] into an array indexed as [c][b,b] (see the sketch right after this list).
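The following plain-Julia sketch (hypothetical sizes, not the Gridap implementation) illustrates this final densification step: for a cell c and the non-empty (RT test, multiplier trial) block, we hold one small elemental matrix per local facet f, laid out as contrib[c][f] (playing the role of [c][b,b][1,f]); densifying the innermost block level then amounts to concatenating these facet blocks side by side, so that the multiplier columns of the different facets end up in different column ranges of a single [c][b,b] matrix.

```julia
ncells, nfacets  = 2, 4
ndofs_u, ndofs_l = 4, 1   # RT dofs per cell and multiplier dofs per facet (lowest order)

# contrib[c][f]: facet contribution of the non-empty block for cell c, local facet f
contrib   = [[rand(ndofs_u, ndofs_l) for f in 1:nfacets] for c in 1:ncells]

# Densified cell matrix: facet blocks concatenated column-wise, ndofs_u x (nfacets*ndofs_l)
densified = [hcat(contrib[c]...) for c in 1:ncells]

@show size(densified[1])   # (4, 4)
```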

Apart from this, the implementation relies on the following types:

Todo/tothink/todiscuss

High-level API

Gridap kernel extensions

I had to extend some functions in Gridap with additional methods. The good news is that, so far, I did not need to modify the behaviour of any of the existing methods in Gridap.

These additional methods are available in this source file. Among them, I think the last two functions can go straight into Gridap's kernel via a PR. Agreed? (@fverdugo?)

fverdugo commented 3 years ago

Some comments from the meeting with @amartinhuertas