ctjacobs closed this issue 10 years ago.
We're aware of the issue, but unfortunately it's not trivial to fix. The problem is that we're using Python's subprocess module to call the compiler and preprocessor, which does a POSIX fork and thereby doubles the memory footprint of the process. This is not a problem early on while the memory footprint is still small, but it is devastating later in the run.
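For concreteness, the call in question is essentially of this shape (a sketch only; the compiler name, flags and file names below are placeholders, not PyOP2's actual invocation):

import subprocess

# Forks the current (possibly large) Python process before exec'ing the
# compiler; the command line here is purely illustrative.
subprocess.check_call(["cc", "-O3", "-shared", "-fPIC", "-o", "kernel.so", "kernel.c"])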
Hang on, this doesn't add up. Surely the issue is the original process leaking memory. That the memory spikes briefly when fork is called is a separate issue (and copy-on-write pages should ensure that even that is not really a problem). If fork were essentially a memory leak, very few programs would work.
It's certainly true that the ultimate cause is that we're leaking memory. However the symptom we observed before was an "Out of memory" when trying to fork in subprocess since, even though the process may never claim all this memory, the OS refuses to fork a process if there is not enough free memory. We saw this on foraker when @doru1004 was running with an extruded mesh and the run consistently failed in subprocess as soon as it had passed 50% of available RAM.
I think we're still not at the bottom of this. Linux usually happily overcommits on malloc, so that particular failure is rather odd (Linux can be told not to overcommit, which is sometimes set on supercomputers but not usually on workstations).
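For reference, the overcommit policy in force can be read straight from procfs (Linux only; 0 = heuristic overcommit, 1 = always overcommit, 2 = never overcommit):

# Print the kernel's vm.overcommit_memory setting.
with open("/proc/sys/vm/overcommit_memory") as f:
    print f.read().strip()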
However, we REALLY shouldn't be leaking memory. We thought that dropping Fluidity would deal with this. Do we have a clue what is leaking now?
I agree that the memory leak is our problem. I will have a look.
OK, I think I have it. It's a memory leak in pyop2. In particular in the caching of JIT modules.
What's wrong with this class:
class JITModule(Cached):
    def __init__(self, kernel, itspace, *args, **kwargs):
        ...
        self._args = args
        ...
Where args is a list of parloop arguments.
That's right: the JITModule is cached and, because it holds references to the parloop args, which in turn reference Dats and Mats, those objects are never collected; the cache is holding an indirect reference to them. I'll propose a fix.
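One possible shape for such a fix, sketched purely to illustrate the idea (the Cached stand-in and the use of weakref are my assumptions, not the actual PyOP2 change):

import weakref

class Cached(object):
    # Stand-in for pyop2.base.Cached, just so the sketch is self-contained.
    pass

class JITModule(Cached):
    def __init__(self, kernel, itspace, *args, **kwargs):
        # Hold only weak references to the parloop arguments, so the cached
        # code object no longer keeps the underlying Dats and Mats alive.
        self._args = [weakref.ref(a) for a in args]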
Ah, this is our old friend the conflation of data and metadata in PyOP2. I think the definitive answer to this is to make that split happen. But that is significant work so I'm all ears for a quick fix.
I believe this is fixed in OP2/PyOP2#346. As a demonstration, consider the following:
from firedrake import *
op2.init(log_level='WARNING')
mesh = UnitSquareMesh(5, 5)
BDM = FunctionSpace(mesh, "BDM", 1)
DG = FunctionSpace(mesh, "DG", 0)
W = BDM * DG
# Define trial and test functions
sigma, u = TrialFunctions(W)
tau, v = TestFunctions(W)
# Define source function
f = Function(DG).assign(0)
# Define variational form
a = (dot(sigma, tau) + div(tau)*u + div(sigma)*v)*dx
n = FacetNormal(mesh)
L = -f*v*dx + 42*dot(tau, n)*ds(4)
# Apply dot(sigma, n) == 0 on left and right boundaries strongly
# (corresponding to Neumann condition du/dn = 0)
bcs = DirichletBC(W.sub(0), Expression(('0', '0')), (1, 2))
t = 0
dt = 0.1
T = 20
w = Function(W)
wold = Function(W)
while t < T:
    wold.assign(w)
    # Compute solution
    solve(a == L, w, bcs=bcs)
    t += dt
import gc
def howmany(cls):
    return len([x for x in gc.get_objects() if isinstance(x, cls)])
gc.collect()
gc.collect()
print howmany(op2.Dat), howmany(op2.Mat)
With PyOP2 master this prints (for me):
615 6
With the pyop2 fix/memory_leak branch it prints:
9 0
Christian, if you update to latest PyOP2 master, I think this problem should be fixed. Can you have a try please?
OK. The reason we are still leaking is that we're leaking sparsities like crazy. The problem is twofold.
I think I can fix this but need to think harder about it.
For added fun, this is currently completely breaking the assembly cache, since we miss the cache on the sparsity assembler even if we hit it on the assembly.
Having thought a bit more, I think we should do this the same way we now cache function spaces in firedrake. In particular:
SparsityMaps should be cached on the Map they're built on top of.
MixedSets should be cached on all the sets they're built on (keyed by the tuple of sets).
MixedDataSets should be cached on the MixedSet they're built on.
MixedMaps should be cached on all the maps they're built on (keyed by the tuple of maps).
While we're here, we could probably cache DataSets on the Set they're built on too.
Does this sound plausible? I realise it adds yet more caching complexity to the whole pyop2 base layer, but maybe it's ok.
I think that sounds entirely plausible. Probably the complexity (at least in terms of code duplication) could be mitigated by making the Cached base class smarter and allowing an object to be cached on another object.
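Concretely, the kind of thing I mean is roughly the following (a minimal sketch under assumed names; the eventual object-level caching in PyOP2 may well look different):

class ObjectCached(object):
    # The cache dictionary lives on the object being built on, not in a
    # class-level dict, so cached entries are collected along with their parent.
    def __new__(cls, parent, *args):
        cache = parent.__dict__.setdefault("_cache", {})
        key = (cls,) + args
        try:
            return cache[key]
        except KeyError:
            obj = super(ObjectCached, cls).__new__(cls)
            cache[key] = obj
            return obj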
While we're at it, maybe we can also refactor the disk caching infrastructure and make DiskCached smarter too, so that the FFC kernel cache (now in Firedrake) can benefit from the same protection against race conditions etc. in the MPI case that the Compiler uses.
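That protection is essentially "one rank builds, everyone else waits"; here is a hedged sketch of the pattern with mpi4py (the function names are made up, and the real Compiler code differs in the details):

from mpi4py import MPI

def disk_cached(key, build, comm=MPI.COMM_WORLD):
    # Rank 0 does the expensive build, which also populates the disk cache;
    # the barrier guarantees the cache entry exists before anyone else reads it.
    if comm.rank == 0:
        result = build(key)
    comm.barrier()
    if comm.rank != 0:
        result = build(key)  # by now this should be a cheap disk-cache hit
    return result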
Ok I think I now have this working. Merge request to arrive Monday.
With OP2/PyOP2#351 and #239, I claim this is fixed, apart from the fact that we still leak millions of kernels; I think OP2/PyOP2#347 fixes that one.
Christian, can you check with the latest firedrake and pyop2 masters that everything is alright?
Also, please do runs with export PYOP2_PRINT_CACHE_SIZE=1 so that at the end of the run we get a printout of the objects still in the caches (that should give us an idea of whether there are more holes to plug).
This now seems fine in serial. I ran the MMS test for 2 and 4 timesteps; only the pyop2.host.Kernel and pyop2.sequential.JITModule objects increase in number. After 2 timesteps:
26 ObjectCached objects in caches
Object breakdown
================
pyop2.base.SparsityMap: 2
pyop2.base.DataSet: 8
firedrake.types.FunctionSpace: 2
pyop2.base.MixedSet: 1
firedrake.types.VectorFunctionSpace: 1
firedrake.types.FunctionSpace: 1
firedrake.types.FunctionSpace: 1
pyop2.base.Sparsity: 5
firedrake.types.MixedFunctionSpace: 1
pyop2.base.MixedMap: 2
pyop2.base.MixedDataSet: 1
firedrake.types.FunctionSpace: 1
183 Cached objects in caches
Object breakdown
================
pyop2.host.Kernel: 86
firedrake.ffc_interface.FFCKernel: 23
pyop2.sequential.JITModule: 74
After 4 timesteps:
26 ObjectCached objects in caches
Object breakdown
================
pyop2.base.SparsityMap: 2
pyop2.base.DataSet: 8
firedrake.types.FunctionSpace: 2
pyop2.base.MixedSet: 1
firedrake.types.VectorFunctionSpace: 1
firedrake.types.FunctionSpace: 1
firedrake.types.FunctionSpace: 1
pyop2.base.Sparsity: 5
firedrake.types.MixedFunctionSpace: 1
pyop2.base.MixedMap: 2
pyop2.base.MixedDataSet: 1
firedrake.types.FunctionSpace: 1
279 Cached objects in caches
Object breakdown
================
pyop2.host.Kernel: 134
firedrake.ffc_interface.FFCKernel: 23
pyop2.sequential.JITModule: 122
Issue #233 needs to be resolved before I can try running the simulation in parallel.
Even with 16 GB of memory for one 8-core processor and a 10,000-node mesh, my Firedrake runs on CX1 keep ending with an "Out of memory" error after a few hundred timesteps. Following issue #176, I have moved all FunctionSpace instantiation out of the time-stepping loop. The valgrind output (https://gist.github.com/ctjacobs/bedbef9e522d261ee0d6) for two timesteps of a serial run of the swe_mms_p2p1 test in firedrake-fluids shows a loss of about 10 KB for a 5 x 5 UnitSquareMesh, but this grows with the number of timesteps.
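In case it helps narrow things down, logging the resident set size each timestep would show whether the growth is steady per step (a sketch using only the standard library; log_memory is a hypothetical helper, and on Linux ru_maxrss is reported in kilobytes):

import resource

def log_memory(step):
    # Peak resident set size of this process so far (KB on Linux).
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print "timestep %d: max RSS %d KB" % (step, rss)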