FEniCS / ffcx

Next generation FEniCS Form Compiler for finite element forms
https://fenicsproject.org
Other
145 stars 38 forks source link

Remove generated kernel duplication #589

Closed jorgensd closed 1 year ago

jorgensd commented 1 year ago
from ufl import FiniteElement, TrialFunction, TestFunction, dx, Mesh, VectorElement, triangle, FunctionSpace, inner, grad
import ufl.algorithms
import ufl.classes
mesh = Mesh(VectorElement("Lagrange", triangle, 1))
V = FunctionSpace(mesh, FiniteElement("Lagrange", triangle, 1))
u = TrialFunction(V)
v = TestFunction(V)
t = tuple([i for i in range(1000)])
a = inner(u, v)*dx(t) + inner(grad(u), grad(v))*dx(2) + inner(u, v)*dx

generates a c file with

  // Quadrature rules
  static const double weights_48e[3] = { 0.1666666666666667, 0.1666666666666667, 0.1666666666666667 };
  // Precomputed values of basis functions and precomputations
  // FE* dimensions: [permutation][entities][points][dofs]
  static const double FE3_C0_Q48e[1][1][3][3] =
    { { { { 0.6666666666666667, 0.1666666666666666, 0.1666666666666667 },
          { 0.1666666666666667, 0.1666666666666666, 0.6666666666666665 },
          { 0.1666666666666668, 0.6666666666666665, 0.1666666666666667 } } } };
  static const double FE4_C0_D10_Q48e[1][1][1][3] = { { { { -1.0, 1.0, 0.0 } } } };
  static const double FE4_C1_D01_Q48e[1][1][1][3] = { { { { -1.0, 0.0, 1.0 } } } };
  // Quadrature loop independent computations for quadrature rule 48e
  double J_c0 = 0.0;
  double J_c3 = 0.0;
  double J_c1 = 0.0;
  double J_c2 = 0.0;
  for (int ic = 0; ic < 3; ++ic)
  {
    J_c0 += coordinate_dofs[ic * 3] * FE4_C0_D10_Q48e[0][0][0][ic];
    J_c3 += coordinate_dofs[ic * 3 + 1] * FE4_C1_D01_Q48e[0][0][0][ic];
    J_c1 += coordinate_dofs[ic * 3] * FE4_C1_D01_Q48e[0][0][0][ic];
    J_c2 += coordinate_dofs[ic * 3 + 1] * FE4_C0_D10_Q48e[0][0][0][ic];
  }
  double sp_48e[4];
  sp_48e[0] = J_c0 * J_c3;
  sp_48e[1] = J_c1 * J_c2;
  sp_48e[2] = sp_48e[0] + -1 * sp_48e[1];
  sp_48e[3] = fabs(sp_48e[2]);
  for (int iq = 0; iq < 3; ++iq)
  {
    const double fw0 = sp_48e[3] * weights_48e[iq];
    double t0[3];
    for (int i = 0; i < 3; ++i)
      t0[i] = fw0 * FE3_C0_Q48e[0][0][iq][i];
    for (int i = 0; i < 3; ++i)
      for (int j = 0; j < 3; ++j)
        A[3 * i + j] += FE3_C0_Q48e[0][0][iq][j] * t0[i];
  }

repeated 1000 times. This makes the code generation slow and not scalable with an increasing number of subdomains. Running ffcx with a 1000 subdomains takes about 3.8 seconds, while 10000 takes ~43 seconds.

This PR depends on: https://github.com/FEniCS/ufl/pull/92 and resolves https://github.com/FEniCS/ffcx/issues/447

The current PR with 1000 subdomains runs in 0.9 seconds and 10000 subdomains runs in 8 seconds. The c-file in this branch is 570 lines with 10000 subdomains, while it is over 600 000 lines on the main branch

coveralls commented 1 year ago

Coverage Status

coverage: 79.883% (+0.03%) from 79.855% when pulling a199c5f9ac62b090f5b52d4eb2d695c2036d155d on dokken/group_integrals_v2 into 019bdc7439b6817af2990c146fe7be12df66798f on main.

jorgensd commented 1 year ago

Note that there is now an implicit dependency in the generation of the offsets. The list in Line 102 of forms.py is dependent on the dolfinx::fem::IntegralType-ordering. This means that if we change the IntegralType (either value for an entry or add more, the generation of offsets will be wrong). This is exposed in: https://github.com/FEniCS/ffcx/pull/589/commits/2fda42c4cc276b39251fa52639cb3ae1b7c6678b and was introduced in DOLFINx in https://github.com/FEniCS/dolfinx/pull/2744/files

chrisrichardson commented 1 year ago

I think the ordering is in the ufcx enum, not in dolfinx. I agree this is a weakness which should be addressed, though I am not quite sure how. One idea would be to provide a function (fixed code not generated). Is it worth it? I feel it is duplication - two ways of doing the same thing...

jorgensd commented 1 year ago

I think the ordering is in the ufcx enum, not in dolfinx. I agree this is a weakness which should be addressed,

As it is used in DOLFINx,

I think the ordering is in the ufcx enum, not in dolfinx. I agree this is a weakness which should be addressed, though I am not quite sure how. One idea would be to provide a function (fixed code not generated). Is it worth it? I feel it is duplication - two ways of doing the same thing...

I see, it uses the ufcx.h file to get exterior_facet etc in one place in dolfinx/fem/utils.h (all other places in the file uses IntegralType::..., so that threw me off. I think the way to fix this is instead of looping over strings when building the offset, it should base this on the ufch.x header enum, or smth.

jorgensd commented 1 year ago

@chrisrichardson could we go along at get this merged (and the PR in ufl), as it is getting quite tedious to keep up to date with rewrites of ffcx, and I haven't seen anyone having any objections regarding this PR

mscroggs commented 1 year ago

Now that https://github.com/FEniCS/ufl/pull/92 is merged, I think we need to merge this asap