cell_avg breaks spectral optimiser

wence- commented 5 years ago

When running spectral mode, with non-affine geometries, cell_avg is not lifted out of the quad loops, with a result that it is computed at every quad point. cell_avg on test and trial spaces is even worse.

Example:

from firedrake import *                                                                              
from firedrake.tsfc_interface import compile_form                                                    

import loopy                                                                                         

n = 6                                                                                                
mesh_order = 2                                                                                       
quadrature_degree = 3                                                                                
mesh = UnitCubeMesh(n, n, n)                                                                         
V = VectorFunctionSpace(mesh, "CG", mesh_order)                                                      
x, y, z = SpatialCoordinate(mesh)                                                                    
T = Function(V).interpolate(0.5 * as_vector([x+x**2, y+y**2, z+z**2]))                               
mesh = Mesh(T)                                                                                       
V = VectorFunctionSpace(mesh, "CG", 2)                                                               
u = TrialFunction(V)                                                                                 
v = TestFunction(V)                                                                                  

a1 = inner(div(u), div(v)) * dx(degree=quadrature_degree)                                            

a2 = inner(cell_avg(div(u)), div(v)) * dx(degree=quadrature_degree)                                  

a3 = inner(cell_avg(div(u)), cell_avg(div(v))) * dx(degree=quadrature_degree)                        

parameters = {"mode": "spectral"}                                                                    
k1, = compile_form(a1, "foo", parameters=parameters)                                                 
k2, = compile_form(a2, "foo", parameters=parameters)                                                 
k3, = compile_form(a3, "foo", parameters=parameters)                                                 

print("No cell_avg", k1.kinfo.kernel.num_flops)                                                      
print("1 cell_avg ", k2.kinfo.kernel.num_flops)                                                      
print("2 cell_avg ", k3.kinfo.kernel.num_flops) 

=>

No cell_avg 35685
1 cell_avg  360670
2 cell_avg  4041732

In contrast, in vanilla mode:

No cell_avg 22420
1 cell_avg  45756
2 cell_avg  65347

miklos1 commented 5 years ago

Is this a refactorisation or a (loopy) code generation issue?

wence- commented 5 years ago

I didn't get to the bottom of it. The cell avg code gets scheduled in the wrong part of the loop nest. That it doesn't happen in vanilla mode suggests it's something to do with refactorisation (issue appears in coffee mode too)

wence- commented 5 years ago

It's not a (loopy-specific) code gen issue: the same problem exhibits when running in coffee mode.

wence- commented 4 years ago

After some more digging. I think the problem is that the refactorisation routines just don't like when you have terms that are actually complete integrals.

One plausible way to fix this, and I think this is a general solution which we might need for other things like embedding entity-wise interpolations inside forms, is to treat these "special" UFL->GEM translations as complete integrand expressions.

You can then apply all the technology term-wise (so the translator pass that turns cell_avg(foo) -> gem_expression also applies the selected mode's optimisation pass).

If we hang metadata on this resulting node (such that we say "always classify this expression as OTHER so it is not refactorised), then things fall out.

I hacked this up for this particular case:

Consider

import loopy
from pyop2.datatypes import ScalarType
from tsfc import compile_form
from ufl import (FunctionSpace, Mesh, TensorProductCell, TestFunction,
                 TrialFunction, VectorElement, cell_avg, div, dx, inner,
                 interval, quadrilateral)

mesh_order = 2
quadrature_degree = 3

cell = TensorProductCell(quadrilateral, interval)
V = VectorElement("Q", cell, mesh_order)
mesh = Mesh(V)

V = FunctionSpace(mesh, VectorElement("Q", cell, mesh_order))

u = TrialFunction(V)
v = TestFunction(V)
a1 = inner(div(u), div(v)) * dx(degree=quadrature_degree)
a2 = inner(cell_avg(div(u)), div(v)) * dx(degree=quadrature_degree)
a3 = inner(cell_avg(div(u)), cell_avg(div(v))) * dx(degree=quadrature_degree)
parameters = {"mode": "spectral"}

def flops(k):
    op_map = loopy.get_op_map(
        k.copy(options=loopy.Options(ignore_boostable_into=True),
               silenced_warnings=['insn_count_subgroups_upper_bound',
                                  'get_x_map_guessing_subgroup_size',
                                  'summing_if_branches_ops']),
        subgroup_size='guess')
    return op_map.filter_by(name=['add', 'sub', 'mul', 'div'],
                            dtype=[ScalarType]).eval_and_sum({})

k1, = compile_form(a1, parameters=parameters, coffee=False)
print("No cell_avg", flops(k1.ast))
k2, = compile_form(a2, parameters=parameters, coffee=False)
print("1 cell_avg ", flops(k2.ast))
k3, = compile_form(a3, parameters=parameters, coffee=False)
print("2 cell_avg ", flops(k3.ast))

Before my changes:

No cell_avg 134418
1 cell_avg  17312638
2 cell_avg  4231245715

Afterwards

No cell_avg 134418
1 cell_avg  344174
2 cell_avg  407769

firedrakeproject / tsfc

cell_avg breaks spectral optimiser #196