firedrakeproject / firedrake

Firedrake is an automated system for the portable solution of partial differential equations using the finite element method (FEM)
https://firedrakeproject.org

Memory leaks #227

Closed ctjacobs closed 10 years ago

ctjacobs commented 10 years ago

Even with 16 GB of memory for one 8-core processor and a 10,000 node mesh, my Firedrake runs on CX1 keep ending with an "Out of memory" error after a few hundred timesteps. Since issue #176, I have moved all FunctionSpace instantiation out of the time-stepping loop. The valgrind output (https://gist.github.com/ctjacobs/bedbef9e522d261ee0d6) for two timesteps in a serial run of the swe_mms_p2p1 test in firedrake-fluids shows a loss of about 10 KB for a 5 x 5 UnitSquareMesh, but this grows with the number of timesteps.

kynan commented 10 years ago

We're aware of the issue, but unfortunately it's not trivial to fix. The problem is that we're using Python's subprocess module to call the compiler and preprocessor, which does a POSIX fork, doubling the memory footprint of the process. This is not a problem early on while the memory footprint is still small, but is devastating later on.

dham commented 10 years ago

Hang on, this doesn't add up. Surely the issue is the original process leaking memory. That the memory spikes briefly when fork is called is a separate issue (also, copy-on-write pages should ensure that the latter is not a problem). If fork were essentially a memory leak, very few programs would work.

kynan commented 10 years ago

It's certainly true that the ultimate cause is that we're leaking memory. However the symptom we observed before was an "Out of memory" when trying to fork in subprocess since, even though the process may never claim all this memory, the OS refuses to fork a process if there is not enough free memory. We saw this on foraker when @doru1004 was running with an extruded mesh and the run consistently failed in subprocess as soon as it had passed 50% of available RAM.

dham commented 10 years ago

I think we're still not at the bottom of this. Linux usually happily overcommits on malloc, so that particular problem is rather odd (Linux can be told not to overcommit, which is sometimes set on supercomputers but not usually on workstations).
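
As an aside, the overcommit policy can be read directly on Linux (0 = heuristic overcommit, the usual default; 1 = always overcommit; 2 = never overcommit):

with open("/proc/sys/vm/overcommit_memory") as f:
    print(f.read().strip())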

However, we REALLY shouldn't be leaking memory. We thought that dropping Fluidity would deal with this. Do we have a clue what is leaking now?

wence- commented 10 years ago

I agree that the memory leak is our problem. I will have a look.

wence- commented 10 years ago

OK, I think I have it. It's a memory leak in PyOP2, in particular in the caching of JIT modules.

What's wrong with this class:


class JITModule(Cached):
    def __init__(self, kernel, itspace, *args, **kwargs):
        ...
        self._args = args
        ...

Where args is a list of parloop arguments.

That's right: the JIT module is cached, and because it references the parloop args, which in turn reference Dats and Mats, those objects are never collected; the cache holds an indirect reference to them. I'll propose a fix.
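
To illustrate the problem, here is a minimal, hypothetical sketch (not the actual PyOP2 classes): a long-lived cache that stores its constructor arguments keeps everything those arguments reference alive, whereas storing only derived metadata lets the data be collected.

import gc
import weakref

_cache = {}

class BigData(object):
    def __init__(self):
        self.payload = bytearray(10**6)  # stands in for a Dat's underlying array

class LeakyJITModule(object):
    # Cached module keeping strong references to its arguments (the leak).
    def __init__(self, key, *args):
        self._args = args  # strong references keep the Dat-like data alive
        _cache[key] = self

class FixedJITModule(object):
    # Cached module recording only metadata derived from the arguments.
    def __init__(self, key, *args):
        self._arg_sizes = tuple(len(a.payload) for a in args)  # metadata only
        _cache[key] = self

d = BigData()
r = weakref.ref(d)
LeakyJITModule("k1", d)
del d
gc.collect()
print(r() is not None)  # True: the cache still pins the data

_cache.clear()
d = BigData()
r = weakref.ref(d)
FixedJITModule("k2", d)
del d
gc.collect()
print(r() is None)      # True: the data is freed once the user drops it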

dham commented 10 years ago

Ah, this is our old friend the conflation of data and metadata in PyOP2. I think the definitive answer to this is to make that split happen. But that is significant work so I'm all ears for a quick fix.

wence- commented 10 years ago

I believe this is fixed in OP2/PyOP2#346. As a demonstration, consider the following:

from firedrake import *
op2.init(log_level='WARNING')

mesh = UnitSquareMesh(5, 5)

BDM = FunctionSpace(mesh, "BDM", 1)
DG = FunctionSpace(mesh, "DG", 0)
W = BDM * DG

# Define trial and test functions
sigma, u = TrialFunctions(W)
tau, v = TestFunctions(W)

# Define source function
f = Function(DG).assign(0)

# Define variational form
a = (dot(sigma, tau) + div(tau)*u + div(sigma)*v)*dx
n = FacetNormal(mesh)
L = -f*v*dx + 42*dot(tau, n)*ds(4)

# Apply dot(sigma, n) == 0 on left and right boundaries strongly
# (corresponding to Neumann condition du/dn = 0)
bcs = DirichletBC(W.sub(0), Expression(('0', '0')), (1, 2))

t = 0
dt = 0.1
T = 20
w = Function(W)
wold = Function(W)
while t < T:
    wold.assign(w)
    # Compute solution
    solve(a == L, w, bcs=bcs)
    t += dt

import gc
def howmany(cls):
    return len([x for x in gc.get_objects() if isinstance(x, cls)])

gc.collect()
gc.collect()

print howmany(op2.Dat), howmany(op2.Mat)

With PyOP2 master this prints (for me):

615 6

With the PyOP2 fix/memory_leak branch:

9 0

wence- commented 10 years ago

Christian, if you update to the latest PyOP2 master, I think this problem should be fixed. Can you give it a try, please?

wence- commented 10 years ago

OK. The reason we are still leaking is that we're leaking sparsities like crazy. This is twofold:

  1. PyOP2 mixed types implement == correctly but not hashing, so using them as a cache key is a disaster: they compare equal but don't hash the same, and so we fail to find things in the sparsity cache (see the sketch below).
  2. The new SparsityMap stuff makes this worse, because we now don't even cache sparsities in the non-mixed case.

I think I can fix this but need to think harder about it.
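
A minimal, hypothetical illustration of point 1 (not the real PyOP2 types): keys that compare equal but hash differently never land in the same slot of a dict-backed cache, so every lookup misses and the cache only grows; deriving the hash from the same data as the equality check restores the hits.

class BadKey(object):
    # Equality is on contents, but the hash falls back to identity.
    def __init__(self, parts):
        self.parts = tuple(parts)
    def __eq__(self, other):
        return isinstance(other, BadKey) and self.parts == other.parts
    def __hash__(self):
        return id(self)  # equal keys hash differently -> cache misses

class GoodKey(BadKey):
    # Hash derived from the same data used for equality.
    def __hash__(self):
        return hash(self.parts)

cache = {}
cache[BadKey((1, 2))] = "sparsity"
print(BadKey((1, 2)) in cache)   # False: miss, so a duplicate would be built and stored

cache = {}
cache[GoodKey((1, 2))] = "sparsity"
print(GoodKey((1, 2)) in cache)  # True: hit, the existing sparsity is reused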

dham commented 10 years ago

For added fun, this is currently completely screwing the assembly cache, since we miss the cache on the sparsity even if we hit it on the assembled form.

wence- commented 10 years ago

Having thought a bit more, I think we should do this the same way we now cache function spaces in Firedrake. In particular:

SparsityMaps should be cached on the Map they're built on top of.

MixedSets should be cached on all the sets they're built on (keyed by the tuple of sets). MixedDataSets should be cached on the MixedSet they're built on.

MixedMaps should be cached on all the maps they're built on (keyed by the tuple of maps).

While we're here, we could probably cache DataSets on the Set they're built on too.

Does this sound plausible? I realise it adds yet more caching complexity to the whole pyop2 base layer, but maybe it's ok.
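
A rough sketch of the proposed pattern (hypothetical names, not the actual PyOP2 Cached/ObjectCached implementation): the cache dictionary lives on the object the new object is built from, so cached children die with their parent instead of living forever in a global cache.

class ObjectCached(object):
    # Cache instances in a dictionary attached to the first constructor argument.
    def __new__(cls, parent, *args):
        cache = parent.__dict__.setdefault("_cache", {})
        key = (cls, args)
        if key in cache:
            return cache[key]
        obj = super(ObjectCached, cls).__new__(cls)
        obj._initialized = False
        cache[key] = obj
        return obj

class Set(object):
    def __init__(self, size):
        self.size = size

class DataSet(ObjectCached):
    def __init__(self, parent, dim=1):
        if self._initialized:
            return  # cache hit: skip re-initialisation
        self.set, self.dim = parent, dim
        self._initialized = True

s = Set(10)
print(DataSet(s, 3) is DataSet(s, 3))        # True: second call hits the per-Set cache
print(DataSet(Set(10), 3) is DataSet(s, 3))  # False: different parent, different cache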

kynan commented 10 years ago

I think that sounds entirely plausible. Probably the complexity (at least in terms of code duplication) could be mitigated by making the Cached base class smarter and allowing an object to be cached on another object.

While we're on that, maybe we can also refactor the disk caching infrastructure and make DiskCached smarter too, such that the FFC kernel cache (now in Firedrake) can benefit from the same protection against race conditions etc. in the MPI case that the Compiler uses.

wence- commented 10 years ago

OK, I think I now have this working. Merge request to arrive Monday.

wence- commented 10 years ago

With OP2/PyOP2#351 and #239, I claim this is fixed, apart from the fact that we still leak millions of kernels; I think OP2/PyOP2#347 fixes that one.

wence- commented 10 years ago

Christian, can you check with the latest Firedrake/PyOP2 master that everything is all right?

wence- commented 10 years ago

Also, please do runs with export PYOP2_PRINT_CACHE_SIZE=1 so that at the end of the run we get a printout of the objects still in the caches (that should give us an idea of whether there are more holes to plug).

ctjacobs commented 10 years ago

This now seems fine in serial. I ran the MMS test for 2 and 4 timesteps; only the pyop2.host.Kernel and pyop2.sequential.JITModule objects increase in number:

For 2 timesteps:

26 ObjectCached objects in caches
Object breakdown
================
pyop2.base.SparsityMap: 2
pyop2.base.DataSet: 8
firedrake.types.FunctionSpace: 2
pyop2.base.MixedSet: 1
firedrake.types.VectorFunctionSpace: 1
firedrake.types.FunctionSpace: 1
firedrake.types.FunctionSpace: 1
pyop2.base.Sparsity: 5
firedrake.types.MixedFunctionSpace: 1
pyop2.base.MixedMap: 2
pyop2.base.MixedDataSet: 1
firedrake.types.FunctionSpace: 1

183 Cached objects in caches
Object breakdown
================
pyop2.host.Kernel: 86
firedrake.ffc_interface.FFCKernel: 23
pyop2.sequential.JITModule: 74

For 4 timesteps:

26 ObjectCached objects in caches
Object breakdown
================
pyop2.base.SparsityMap: 2
pyop2.base.DataSet: 8
firedrake.types.FunctionSpace: 2
pyop2.base.MixedSet: 1
firedrake.types.VectorFunctionSpace: 1
firedrake.types.FunctionSpace: 1
firedrake.types.FunctionSpace: 1
pyop2.base.Sparsity: 5
firedrake.types.MixedFunctionSpace: 1
pyop2.base.MixedMap: 2
pyop2.base.MixedDataSet: 1
firedrake.types.FunctionSpace: 1

279 Cached objects in caches
Object breakdown
================
pyop2.host.Kernel: 134
firedrake.ffc_interface.FFCKernel: 23
pyop2.sequential.JITModule: 122

Issue #233 needs to be resolved before I can try running the simulation in parallel.

wence- commented 10 years ago

#233 has been addressed, and I think this should also be fixed in parallel, hence closing. Please reopen if you still have issues.