firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org
Other
521 stars 160 forks source link

JDBetteridge/update caching #3730

Closed JDBetteridge closed 1 month ago

JDBetteridge commented 3 months ago

Description

Necessary caches replaced with parallel safe ones.

Depends on PyOP2 #724 going in first.

The PyOP2 caches are used in 3 key places:

The tests have been updated to run with the new PYOP2_SPMD_STRICT environment variable set to 1. Now everything passes without deadlocking, with the exception of two tests utilising ngsPETSc, which requires an upstream fix.

Testing performance has been significantly improved, but this isn't apparent on CI (I'm still not 100% sure why). Results are as follows:

Commands

Run on my workstation:

firedrake-clean; yes | pyop2-clean
time env PYOP2_CACHE_INFO=1 mprof run --include-children pytest -n 16 --dist=worksteal --durations 20 -v tests; mprof peak

Current Firedrake master

(export REV="Old") cold Output:

=== slowest 20 durations ====================================================================================
141.19s call     tests/regression/test_fdm.py::test_p_independence_hcurl[Box]
116.33s call     tests/external_operators/test_external_operators_adjoint.py::test_translation_operator_inverse_problem
109.21s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[sparse]
104.67s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity_parallel[per_cell]
101.03s call     tests/regression/test_fdm.py::test_p_independence_hdiv[Box]
95.72s call     tests/demos/test_notebooks_run.py::test_notebook_runs[10-sum-factorisation.ipynb]
80.05s call     tests/slate/test_slate_hybridization_extr.py::test_hybrid_extr_helmholtz[True]
77.77s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[dense]
74.48s call     tests/demos/test_notebooks_run.py::test_notebook_runs[09-hybridisation.ipynb]
72.14s call     tests/regression/test_stress_els.py::test_stress_displacement_convergence[conforming]
70.72s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[per_cell]
68.52s call     tests/demos/test_notebooks_run.py::test_notebook_runs[12-HPC_demo.ipynb]
63.96s call     tests/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh[spheresphere]
57.30s call     tests/regression/test_fdm.py::test_ipdg_direct_solver[rt-Box]
53.57s call     tests/regression/test_linesmoother.py::test_linesmoother[petscasm-Quad]
50.84s call     tests/regression/test_fdm.py::test_p_independence_hgrad[Box-fdm]
49.87s call     tests/multigrid/test_hiptmair.py::test_hiptmair_hcurl[hexahedron-aij]
49.66s call     tests/regression/test_fdm.py::test_p_independence_hcurl[Rectangle]
47.91s call     tests/slate/test_slate_hybridization_extr.py::test_hybrid_extr_helmholtz[False]
47.69s call     tests/slate/test_hdg_poisson.py::test_hdg_convergence[1-True-1.75]

real    20m33.434s
user    481m20.481s
sys     49m7.885s

mprofile_20240823224837.dat     43321.184 MiB

Old_cold

This branch

(export REV="Experiment") cold Output:

=== slowest 20 durations ====================================================================================
114.55s call     tests/external_operators/test_external_operators_adjoint.py::test_translation_operator_inverse_problem
112.40s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[sparse]
110.55s call     tests/regression/test_fdm.py::test_p_independence_hcurl[Box]
87.76s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity_parallel[per_cell]
85.13s call     tests/demos/test_notebooks_run.py::test_notebook_runs[10-sum-factorisation.ipynb]
78.62s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[dense]
75.13s call     tests/regression/test_fdm.py::test_p_independence_hdiv[Box]
71.45s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity[per_cell]
68.81s call     tests/demos/test_notebooks_run.py::test_notebook_runs[12-HPC_demo.ipynb]
64.85s call     tests/equation_bcs/test_equation_bcs.py::test_EquationBC_mixedpoisson_matfree_fieldsplit
64.35s call     tests/regression/test_stress_els.py::test_stress_displacement_convergence[conforming]
64.03s call     tests/slate/test_slate_hybridization_extr.py::test_hybrid_extr_helmholtz[True]
62.70s call     tests/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh[spheresphere]
55.62s call     tests/equation_bcs/test_equation_bcs.py::test_EquationBC_poisson_matfree[True]
54.31s call     tests/demos/test_notebooks_run.py::test_notebook_runs[09-hybridisation.ipynb]
53.12s call     tests/regression/test_fdm.py::test_ipdg_direct_solver[rt-Box]
43.07s call     tests/output/test_io_function.py::test_io_function_base[cell_family_degree21]
39.91s call     tests/regression/test_linesmoother.py::test_linesmoother[petscasm-Quad]
39.38s call     tests/vertexonly/test_poisson_inverse_conductivity.py::test_poisson_inverse_conductivity_parallel[dense]
38.17s call     tests/regression/test_interpolate_cross_mesh.py::test_interpolate_cross_mesh_parallel[spheresphere]

real    15m9.087s
user    330m19.724s
sys     38m52.005s

mprofile_20240823234054.dat     31183.383 MiB

Experiment_cold

I also ran with a warm cache with similar levels of improvement, but this isn't as relevant (to CI).

github-actions[bot] commented 2 months ago
TestsPassed ✅Skipped ⏭️Failed ❌
Firedrake complex8050 ran6468 passed1582 skipped0 failed
github-actions[bot] commented 2 months ago
TestsPassed ✅Skipped ⏭️Failed ❌
Firedrake real8056 ran7270 passed786 skipped0 failed
JDBetteridge commented 1 month ago

I think this one is ready to go (once tests pass of course!)