At present, we generate a separate loop nest for each coefficient we need to evaluate at the quadrature points (a loop over dimensions per rank and a loop over basis functions). These loops could be fused wherever possible. Coefficients of equal rank are the trivial case and could be evaluated in the same loop nest.
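As an illustration only, the sketch below contrasts the two structures for scalar (rank-0) coefficients. It assumes a tabulation `phi[q][i]` of basis function `i` at quadrature point `q` and coefficients given as lists of degrees of freedom. The names `phi`, `eval_separate` and `eval_fused` are hypothetical, and actual generated kernels would be C rather than Python; only the loop structure is meant to carry over.

```python
from typing import List, Sequence

# Hypothetical tabulation: phi[q][i] is basis function i evaluated at quadrature point q.
Table = Sequence[Sequence[float]]


def eval_separate(phi: Table, coeffs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Unfused: one loop nest per coefficient, as currently generated."""
    nq, nb = len(phi), len(phi[0])
    out = []
    for c in coeffs:  # a separate (q, i) loop nest for every coefficient
        vals = [0.0] * nq
        for q in range(nq):
            for i in range(nb):
                vals[q] += c[i] * phi[q][i]
        out.append(vals)
    return out


def eval_fused(phi: Table, coeffs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Fused: a single loop nest evaluates all coefficients of equal rank."""
    nq, nb = len(phi), len(phi[0])
    out = [[0.0] * nq for _ in coeffs]
    for q in range(nq):
        for i in range(nb):
            p = phi[q][i]  # tabulated value loaded once, reused for every coefficient
            for k, c in enumerate(coeffs):
                out[k][q] += c[i] * p
    return out


phi = [[0.5, 0.5], [0.25, 0.75]]  # 2 quadrature points, 2 basis functions
f, g = [1.0, 3.0], [2.0, 4.0]
print(eval_fused(phi, [f, g]))  # [[2.0, 2.5], [3.0, 3.5]]
```

Both functions sum over `i` in the same order, so they return identical values; the fused version simply traverses the tabulation once instead of once per coefficient.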
We need to keep in mind, though, that fusion has performance implications (e.g. for cache behaviour) that depend on the layout of the coefficients in memory. Loop fusion may therefore even have a negative impact on performance.
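A small sketch of why layout matters, assuming a vector-valued (rank-1) coefficient whose degrees of freedom are stored either "blocked" (all components of dimension 0, then all of dimension 1, ...) or "interleaved" (all components of basis function 0, then basis function 1, ...). Both layout names and the helpers `access_sequence` and `strides` are hypothetical; the code only lists which flat-array indices a dimension-outer, basis-inner loop nest touches.

```python
from typing import List


def access_sequence(nb: int, dim: int, layout: str) -> List[int]:
    """Flat indices read by `for d in range(dim): for i in range(nb)` for one coefficient."""
    seq = []
    for d in range(dim):
        for i in range(nb):
            if layout == "blocked":        # [c(0,0), c(1,0), ..., c(0,1), c(1,1), ...]
                seq.append(d * nb + i)
            elif layout == "interleaved":  # [c(0,0), c(0,1), ..., c(1,0), c(1,1), ...]
                seq.append(i * dim + d)
            else:
                raise ValueError(f"unknown layout: {layout}")
    return seq


def strides(seq: List[int]) -> List[int]:
    """Distinct jumps between consecutive accesses (1 means contiguous)."""
    return sorted(set(b - a for a, b in zip(seq, seq[1:])))


print(strides(access_sequence(3, 2, "blocked")))      # [1]
print(strides(access_sequence(3, 2, "interleaved")))  # [-3, 2]
```

If a fused nest covers one coefficient of each layout, whichever loop order is chosen gives contiguous access for one and strided access for the other, which is one way fusion can end up slower than separate nests.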