Open wence- opened 5 years ago
Is this a refactorisation or a (loopy) code generation issue?
I didn't get to the bottom of it. The cell avg code gets scheduled in the wrong part of the loop nest. That it doesn't happen in vanilla mode suggests it's something to do with refactorisation (issue appears in coffee mode too)
It's not a (loopy-specific) code gen issue: the same problem exhibits when running in coffee mode.
After some more digging. I think the problem is that the refactorisation routines just don't like when you have terms that are actually complete integrals.
One plausible way to fix this, and I think this is a general solution which we might need for other things like embedding entity-wise interpolations inside forms, is to treat these "special" UFL->GEM translations as complete integrand expressions.
You can then apply all the technology term-wise (so the translator pass that turns cell_avg(foo) -> gem_expression also applies the selected mode's optimisation pass).
If we hang metadata on this resulting node (such that we say "always classify this expression as OTHER
so it is not refactorised), then things fall out.
I hacked this up for this particular case:
Consider
import loopy
from pyop2.datatypes import ScalarType
from tsfc import compile_form
from ufl import (FunctionSpace, Mesh, TensorProductCell, TestFunction,
TrialFunction, VectorElement, cell_avg, div, dx, inner,
interval, quadrilateral)
mesh_order = 2
quadrature_degree = 3
cell = TensorProductCell(quadrilateral, interval)
V = VectorElement("Q", cell, mesh_order)
mesh = Mesh(V)
V = FunctionSpace(mesh, VectorElement("Q", cell, mesh_order))
u = TrialFunction(V)
v = TestFunction(V)
a1 = inner(div(u), div(v)) * dx(degree=quadrature_degree)
a2 = inner(cell_avg(div(u)), div(v)) * dx(degree=quadrature_degree)
a3 = inner(cell_avg(div(u)), cell_avg(div(v))) * dx(degree=quadrature_degree)
parameters = {"mode": "spectral"}
def flops(k):
op_map = loopy.get_op_map(
k.copy(options=loopy.Options(ignore_boostable_into=True),
silenced_warnings=['insn_count_subgroups_upper_bound',
'get_x_map_guessing_subgroup_size',
'summing_if_branches_ops']),
subgroup_size='guess')
return op_map.filter_by(name=['add', 'sub', 'mul', 'div'],
dtype=[ScalarType]).eval_and_sum({})
k1, = compile_form(a1, parameters=parameters, coffee=False)
print("No cell_avg", flops(k1.ast))
k2, = compile_form(a2, parameters=parameters, coffee=False)
print("1 cell_avg ", flops(k2.ast))
k3, = compile_form(a3, parameters=parameters, coffee=False)
print("2 cell_avg ", flops(k3.ast))
Before my changes:
No cell_avg 134418
1 cell_avg 17312638
2 cell_avg 4231245715
Afterwards
No cell_avg 134418
1 cell_avg 344174
2 cell_avg 407769
When running spectral mode, with non-affine geometries,
cell_avg
is not lifted out of the quad loops, with a result that it is computed at every quad point.cell_avg
on test and trial spaces is even worse.Example:
In contrast, in vanilla mode: