firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org

BUG: Regression in Performance after Installing Firedrake #3030

Closed ryan-david-murphy closed 1 year ago

ryan-david-murphy commented 1 year ago

Description: I encountered a performance regression after successfully installing Firedrake. Although the installation was completed without errors (after pinning the cython version to 0.29.36), the performance has noticeably dropped.

Steps to Reproduce:

  1. Install Firedrake using the installation script.
  2. Run a sample code (e.g., helmholtz.py) multiple times to observe the performance.

Expected Behaviour: The performance should be consistent or improved compared to the previous environment.

Actual Behavior: The performance has significantly dropped after installing Firedrake.

Environment:

Operating System: macOS 13.4.1
Python Version: 3.10.8
Firedrake Version: 0.13.0+5767.g32bda80fc

JDBetteridge commented 1 year ago

Uploading so we don't lose this: foo

JDBetteridge commented 1 year ago

Assuming this is still an issue, could you try a separate fresh install (please download the latest version of the install script!) now that we have pinned Cython? It's possible that the slow init is a result of some packages being installed with the latest Cython and some with an older Cython.

I have not been able to reproduce this issue locally.

ryan-david-murphy commented 1 year ago

I have reinstalled using the updated firedrake-install script after I completely removed the previous venv. I have also uninstalled and reinstalled homebrew and then completed a further reinstallation. The same performance issue is present.

I have run helmholtz.py (with graph plotting removed) on both my M1 Max and a Linux workstation (with a venv more than 3 months old) for comparison, and have attached the profiles. The two machines usually perform comparably.

Is there anything else I can reinstall to get a completely fresh installation?

fooLinux fooM1Max

JDBetteridge commented 1 year ago

If you update (or do a fresh install) on the Linux workstation, do you also see the performance regression? If you don't want to risk losing the old performant venv, you can use firedrake-install --venv-name something_unique. If the Linux workstation is fine, I will add the Mac tag and get some of our Mac developers to investigate.

JDBetteridge commented 1 year ago

I will say that the profiles do look very similar to a first run (doing code gen) vs second run (using cached code).

The Helmholtz example (in the demos directory) is also very small, only a 10x10 grid with CG1 elements. To get meaningful profiling data we need to increase the number of dofs. Maybe you could add some timings?

I have attached an example profiling test on my desktop along with its output for comparison:

test_script.sh:

#!/bin/bash

# Clean caches
firedrake-clean

# Create a minimal Helmholtz problem (without plotting)
cat <<EOF >minimal_helmholtz.py
from firedrake import *

mesh = UnitSquareMesh(10, 10)

V = FunctionSpace(mesh, "CG", 1)
u = TrialFunction(V)
v = TestFunction(V)

f = Function(V)
x, y = SpatialCoordinate(mesh)
f.interpolate((1+8*pi*pi)*cos(x*pi*2)*cos(y*pi*2))

a = (inner(grad(u), grad(v)) + inner(u, v)) * dx
L = inner(f, v) * dx

u = Function(V)

solve(a == L, u, solver_parameters={'ksp_type': 'cg', 'pc_type': 'none'})

File("helmholtz.pvd").write(u)

f.interpolate(cos(x*pi*2)*cos(y*pi*2))
print(sqrt(assemble(dot(u - f, u - f) * dx)))
EOF

# Time and profile minimal Helmholtz
echo "10x10 cold cache"
time python minimal_helmholtz.py -log_view :no_cache_profile.txt:ascii_flamegraph
flamegraph.pl no_cache_profile.txt > no_cache_profile.svg

# Time and profile minimal Helmholtz with hot cache
echo "10x10 hot cache"
time python minimal_helmholtz.py -log_view :hot_cache_profile.txt:ascii_flamegraph
flamegraph.pl hot_cache_profile.txt > hot_cache_profile.svg

# Increase problem size
sed -i "s/(10, 10)/(1000, 1000)/g" minimal_helmholtz.py

# Run bigger problem
echo "1000x1000 hot cache"
time python minimal_helmholtz.py -log_view :big_hot_cache_profile.txt:ascii_flamegraph
flamegraph.pl big_hot_cache_profile.txt > big_hot_cache_profile.svg

output:

$ ./test_script.sh 
/home/jack/Documents/firedrake/firedrake/bin/firedrake-clean:4: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  __import__('pkg_resources').require('firedrake==0.13.0+5774.g3fb16ad47.dirty')
Removing cached TSFC kernels from /home/jack/Documents/firedrake/firedrake/.cache/tsfc
Removing cached PyOP2 code from /home/jack/Documents/firedrake/firedrake/.cache/pyop2
Removing cached pytools files from /home/jack/.cache/pytools
10x10 cold cache
0.06257073749110136

real    0m4.426s
user    0m4.085s
sys 0m0.333s
10x10 hot cache
0.06257073749110136

real    0m1.387s
user    0m1.218s
sys 0m0.155s
1000x1000 hot cache
7.078431517196732e-06

real    1m9.689s
user    0m41.024s
sys 0m28.647s

Cold cache: no_cache_profile Hot cache: hot_cache_profile Big problem: big_hot_cache_profile
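
The gap between the cold-cache and hot-cache runs is a rough estimate of the one-off code generation and compilation cost. A minimal Python sketch of that arithmetic, using the `real` times from the output above (the helper name `to_seconds` is hypothetical and not part of Firedrake):

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` stamp such as '1m9.689s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return 60 * int(minutes) + float(seconds)


# `real` times copied from the 10x10 runs in the output above
cold = to_seconds("0m4.426s")
hot = to_seconds("0m1.387s")

# Wall-clock time that disappears once the generated kernels are cached
overhead = cold - hot
print(f"cold-cache overhead: {overhead:.3f} s")
```

On these figures roughly 3 s of the 4.4 s cold run is cache population, which is why a 10x10 problem says little about solver performance.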

JDBetteridge commented 1 year ago

@rdm4317 any update?

ryan-david-murphy commented 1 year ago

@JDBetteridge I have run the requested profiles, here are the results:

Mac:

10x10 cold cache: real 0m22.615s, user 0m4.705s, sys 0m2.675s

10x10 hot cache: real 0m4.334s, user 0m1.390s, sys 0m0.868s

1000x1000 hot cache: real 0m38.126s, user 0m35.405s, sys 0m1.232s

10x10 cold cache no_cache_profile

10x10 hot cache hot_cache_profile

1000x1000 hot cache big_hot_cache_profile

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |3fb16ad47 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|slepc               |firedrake                     |e438e4993 |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

Linux WS:

10x10 cold cache: real 0m6.582s, user 0m5.707s, sys 0m0.884s

10x10 hot cache: real 0m2.300s, user 0m1.768s, sys 0m0.554s

1000x1000 hot cache: real 0m51.228s, user 0m49.202s, sys 0m1.973s

10x10 cold cache no_cache_profile

10x10 hot cache hot_cache_profile

1000x1000 hot cache big_hot_cache_profile

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |cd1d528   |False     |
|PyOP2               |master                        |59e109eb  |False     |
|fiat                |master                        |a305398   |False     |
|firedrake           |master                        |284a1104a |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |69012e5   |False     |
|loopy               |main                          |3988272b  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |c691737   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |e68bd28   |False     |
|ufl                 |master                        |772485d7  |False     |
---------------------------------------------------------------------------
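
One way to read these timings is to compare wall-clock time (`real`) with CPU time (`user` + `sys`). The sketch below does that for the 10x10 runs, with the numbers copied from the timings above. Treating the difference as time spent waiting rather than computing is an assumption; the profiles would be needed to confirm where that time actually goes.

```python
# (real, user, sys) in seconds, copied from the 10x10 timings above
runs = {
    ("Mac", "10x10 cold"): (22.615, 4.705, 2.675),
    ("Mac", "10x10 hot"): (4.334, 1.390, 0.868),
    ("Linux WS", "10x10 cold"): (6.582, 5.707, 0.884),
    ("Linux WS", "10x10 hot"): (2.300, 1.768, 0.554),
}


def off_cpu(real: float, user: float, sys_: float) -> float:
    """Wall-clock time not accounted for by CPU time (can be slightly negative)."""
    return real - (user + sys_)


gaps = {key: off_cpu(*times) for key, times in runs.items()}
for (machine, case), gap in gaps.items():
    print(f"{machine:8s} {case:10s} off-CPU: {gap:7.3f} s")
```

On these figures the Mac's cold run has about 15 s of wall-clock time that is not CPU time, while the Linux workstation's gap is essentially zero.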

ryan-david-murphy commented 1 year ago

For a simple hyperelasticity example, I am getting different TSFC behaviours.

Mac:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
2
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
3
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
4
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
5
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
6
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
7
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
8
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
9
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)

Linux:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
2
3
4
5
6
7
8
9

Here is the code:

from firedrake import *

spatialDimensions = 2
lx = 8
ly = 1
nx = 320
ny = 40
mesh = RectangleMesh(nx, ny, lx, ly, quadrilateral=True)

# function spaces
A = FunctionSpace(mesh, "CG", 1)
P = VectorFunctionSpace(mesh, "CG", 1)

# boundary conditions
bcs = [DirichletBC(P.sub(0), Constant(0), 1),
       DirichletBC(P.sub(1), Constant(0), 1)]

# Define functions
du = TrialFunction(P)            # Incremental displacement
v  = TestFunction(P)             # Test function
u  = Function(P)                 # Displacement from previous iteration
B  = Constant((0.0, -0.0))  # Body force per unit volume
T  = Constant((0.1,  0.0))  # Traction force on the boundary

for i in range(10):
    print(i)

    # Kinematics
    I = Identity(2)             # Identity tensor
    F = I + grad(u)             # Deformation gradient
    C = F.T*F                   # Right Cauchy-Green tensor

    # Invariants of deformation tensors
    Ic = tr(C)
    J  = det(F)

    # Elasticity parameters
    E, nu = 10.0, 0.3
    mu, lmbda = Constant(E/(2*(1 + nu))), Constant(E*nu/((1 + nu)*(1 - 2*nu)))

    # Stored strain energy density (compressible neo-Hookean model)
    psi = (mu/2)*(Ic - 3) - mu*ln(J) + (lmbda/2)*(ln(J))**2

    # Total potential energy
    Pi = psi*dx - dot(B, u)*dx - dot(T, u)*ds(2)

    # Compute first variation of Pi (directional derivative about u in the direction of v)
    F = derivative(Pi, u, v)

    # Compute Jacobian of F
    J = derivative(F, u, du)

    # Solve variational problem
    problem = NonlinearVariationalProblem(F, u, bcs=bcs, J=J)
    solver = NonlinearVariationalSolver(problem)
    solver.solve()

Ig-dolci commented 1 year ago

I repeated this run with test.sh on an M2 Mac. Here are the results:

10x10 cold cache: real 0m7.179s, user 0m4.232s, sys 0m1.043s

10x10 hot cache: real 0m2.460s, user 0m1.421s, sys 0m0.542s

1000x1000 hot cache: real 0m38.433s, user 0m35.422s, sys 0m2.266s

10x10 cold cache no_cache_profile

10x10 hot cache hot_cache_profile

1000x1000 hot cache big_hot_cache_profile

I saw the tsfc:WARNING only once.

ksagiyam commented 1 year ago

My Intel Mac, macOS Monterey 12.4 (fresh install):

10x10 cold cache
0.06257073749110047

real    0m21.021s
user    0m9.369s
sys 0m4.045s
10x10 hot cache
0.06257073749110047

real    0m9.233s
user    0m4.369s
sys 0m2.088s
1000x1000 hot cache
7.078429874707133e-06

real    1m36.637s
user    1m31.152s
sys 0m3.195s

10x10 no cache: no_cache_profile 10x10 hot cache: hot_cache_profile 1000x1000 hot cache: big_hot_cache_profile

Hyperelasticity example:

tsfc warnings at every step

firedrake-status:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |0ec02b2d8 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

ksagiyam commented 1 year ago

On my Linux machine (fresh install):

10x10 no cache: no_cache_profile 10x10 hot cache: hot_cache_profile 1000x1000 hot cache: big_hot_cache_profile

Hyperelasticity example:

tsfc warnings at every step

firedrake-status:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |0ec02b2d8 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

ksagiyam commented 1 year ago

I see the tsfc warning at every step on both my Mac and my Linux machine. To me this looks more like an issue with the latest Firedrake than a macOS vs. Linux difference.

Can everyone please put the output of firedrake-status below your test result?
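
To compare two firedrake-status tables without reading them row by row, something like the following Python sketch could help. It assumes the pipe-delimited layout shown in this thread; `parse_status` is a hypothetical helper, not a Firedrake tool, and the rows are a subset copied from the Mac and Linux WS tables posted above.

```python
def parse_status(text: str) -> dict:
    """Map package name to revision from firedrake-status style rows."""
    revisions = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4 or cells[0] == "Package":
            continue  # skip rule lines, blank lines and the header row
        package, _branch, revision, _modified = cells
        revisions[package] = revision
    return revisions


# Subset of rows copied from the two tables posted earlier in this thread
mac = parse_status("""
|Package             |Branch                        |Revision  |Modified  |
|firedrake           |master                        |3fb16ad47 |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
""")
linux = parse_status("""
|Package             |Branch                        |Revision  |Modified  |
|COFFEE              |master                        |70c1e66   |False     |
|firedrake           |master                        |284a1104a |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|tsfc                |master                        |e68bd28   |False     |
|ufl                 |master                        |772485d7  |False     |
""")

# Packages whose revision differs, or which appear in only one install
differences = {
    package: (mac.get(package), linux.get(package))
    for package in sorted(set(mac) | set(linux))
    if mac.get(package) != linux.get(package)
}
for package, (mac_rev, linux_rev) in differences.items():
    print(f"{package}: Mac={mac_rev} Linux={linux_rev}")
```

Packages at the same revision on both machines (petsc here) drop out, leaving only the candidates worth bisecting.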

ryan-david-murphy commented 1 year ago

@ksagiyam I have updated my post with this output

Ig-dolci commented 1 year ago

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |master                        |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |master                        |0ec02b2d8 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |master                        |6f72c9c   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

ksagiyam commented 1 year ago

Testing on my Linux machine indicates that this PR on Constant https://github.com/firedrakeproject/firedrake/pull/2927 somehow broke the caching. (Firedrake + PyOP2 + tsfc)

I used the above hyperelasticity problem as an example.

Right before https://github.com/firedrakeproject/firedrake/pull/2927:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |HEAD                          |edae2884  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |HEAD                          |be82caf4e |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |HEAD                          |ef39f72   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

Cold cache:

tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
WARNING:tsfc:Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
2
3

Hot cache:

0
1
2
3

Right after https://github.com/firedrakeproject/firedrake/pull/2927:

---------------------------------------------------------------------------
|Package             |Branch                        |Revision  |Modified  |
---------------------------------------------------------------------------
|COFFEE              |master                        |70c1e66   |False     |
|FInAT               |master                        |47f6c37   |False     |
|PyOP2               |HEAD                          |d230953b  |False     |
|fiat                |master                        |8c66270   |False     |
|firedrake           |HEAD                          |34f930dd9 |False     |
|h5py                |firedrake                     |6cc4c912  |False     |
|libspatialindex     |master                        |4768bf3   |True      |
|libsupermesh        |master                        |b145b65   |False     |
|loopy               |main                          |8158afdb  |False     |
|petsc               |firedrake                     |9364cb008b|False     |
|pyadjoint           |master                        |0378c81   |False     |
|pytest-mpi          |main                          |a478bc8   |False     |
|tsfc                |HEAD                          |83dd8aa   |False     |
|ufl                 |master                        |3c62318c  |False     |
---------------------------------------------------------------------------

Cold cache:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
2
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
3
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)

Hot cache:

0
1
2
3
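
When bisecting like this, it can help to extract which steps triggered the warning instead of comparing the output by eye. A minimal Python sketch follows, assuming (as this thread does) that each repeated tsfc warning marks a form being compiled again rather than fetched from the cache. `steps_with_warning` is a hypothetical helper, and the sample output reuses the warning text from the logs above.

```python
WARNING = (
    "tsfc:WARNING Estimated quadrature degree 14 more than tenfold "
    "greater than any argument/coefficient degree (max 1)"
)


def steps_with_warning(output: str) -> list:
    """Return the step numbers that are followed by a tsfc warning."""
    flagged = []
    current = None
    for line in output.splitlines():
        line = line.strip()
        if line.isdigit():
            current = int(line)  # a printed loop index starts a new step
        elif "tsfc" in line and "WARNING" in line and current is not None:
            if current not in flagged:
                flagged.append(current)
    return flagged


# Cold-cache pattern right after the PR: a warning after every step
after_pr = "\n".join(f"{step}\n{WARNING}" for step in range(4))

# Pattern from the earlier Linux output, truncated to four steps:
# only the first step warns
first_only = "0\n" + WARNING + "\n1\n2\n3"

print(steps_with_warning(after_pr))
print(steps_with_warning(first_only))
```

A result listing every step points to a cache miss on each iteration; a result containing only step 0 is the behaviour expected from a working cache.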

ryan-david-murphy commented 1 year ago

Hey @ksagiyam, did you work out how to fix this?

connorjward commented 1 year ago

Sorry, I have been on holiday for the past two weeks, so I haven't seen this. I think that this is a known performance problem with the recent changes to how we use Constants. Could you check whether using Firedrake branch connorjward/fix-constant-numbering and UFL branch connorjward/counted-mixin makes the problem go away? I already have associated PRs (Firedrake, UFL) for getting these fixes in.
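
For readers wondering how constant numbering could defeat a kernel cache, here is a toy Python model. It is not Firedrake's or UFL's actual implementation: `ToyConstant`, `naive_key`, `renumbered_key` and `count_compilations` are all hypothetical names, and the mechanism shown is only an assumed illustration. The idea is that if a cache key depends on a global object counter, constants recreated inside a loop (as `mu` and `lmbda` are in the hyperelasticity example) produce a new key on every iteration.

```python
import itertools

_counter = itertools.count()


class ToyConstant:
    """Stand-in for a symbolic constant that receives a global count."""

    def __init__(self, value):
        self.value = value
        self.count = next(_counter)


def naive_key(constants):
    # Key built from absolute counts: changes whenever new objects are created
    return tuple(c.count for c in constants)


def renumbered_key(constants):
    # Key built from order of first appearance: stable across iterations
    numbering = {}
    for c in constants:
        numbering.setdefault(c.count, len(numbering))
    return tuple(numbering[c.count] for c in constants)


def count_compilations(key_function, steps=4):
    """Count cache misses when constants are rebuilt on every step."""
    cache = {}
    compilations = 0
    for _ in range(steps):
        mu, lmbda = ToyConstant(1.0), ToyConstant(2.0)  # recreated each step
        key = key_function([mu, lmbda, mu])
        if key not in cache:
            cache[key] = "compiled kernel"
            compilations += 1
    return compilations


naive = count_compilations(naive_key)
renumbered = count_compilations(renumbered_key)
print(f"naive key: {naive} compilations, renumbered key: {renumbered}")
```

In this toy model the naive key compiles on all four steps, while the renumbered key compiles once and then hits the cache.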

ksagiyam commented 1 year ago

Yes, those branches at least fix the problem stated above.

Cold cache:

0
tsfc:WARNING Estimated quadrature degree 14 more than tenfold greater than any argument/coefficient degree (max 1)
1
2
3

Hot cache:

0
1
2
3

connorjward commented 1 year ago

Closing this issue as I believe it is fixed by https://github.com/firedrakeproject/firedrake/pull/3011. Please reopen it if this is not the case.

ryan-david-murphy commented 1 year ago

Thanks @connorjward, I have just updated my installation and the performance is much improved.