Open lindsayad opened 2 weeks ago
Try a petsc garbage cleanup after the gc.collect as well, I think.
I tried adding all the petsc garbage cleanups
    PETSc.garbage_cleanup(PETSc.COMM_SELF)
    PETSc.garbage_cleanup(mesh._comm)
    gc.collect()
    PETSc.garbage_cleanup(PETSc.COMM_SELF)
    PETSc.garbage_cleanup(mesh._comm)

if __name__ == "__main__":
    run()
    PETSc.garbage_cleanup(PETSc.COMM_SELF)
    gc.collect()
    PETSc.garbage_cleanup(PETSc.COMM_SELF)
and the result is the same
I suspect the problem is still garbage collection. Python just doesn't provide any guarantees about when reference cycles will be cleared, and if that only happens while the interpreter is getting pulled apart, then in parallel you will still have leaked objects.
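To make the point about reference cycles concrete, here is a minimal pure-Python sketch. It does not use PETSc or Firedrake at all; `Node` and `make_cycle` are hypothetical names for illustration, and the behaviour shown is that of CPython's cycle collector. Automatic collection is disabled only so the timing is deterministic.

```python
import gc
import weakref


class Node:
    """Toy object that can take part in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a   # a <-> b reference cycle
    return weakref.ref(a)     # weak reference lets us observe a's lifetime


gc.disable()                  # make the timing deterministic for this demo
try:
    ref = make_cycle()
    # The local names a and b are gone, but the cycle keeps both objects alive.
    alive_before_collect = ref() is not None
    gc.collect()              # the cycle collector finds and frees the pair
    alive_after_collect = ref() is not None
finally:
    gc.enable()

print(alive_before_collect, alive_after_collect)
```

Reference counting alone never frees the pair, so anything the objects own stays alive until a collection actually runs.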
PETSc.garbage_cleanup(PETSc.COMM_SELF) will do nothing. I contributed some code upstream that means that we only defer the destruction of objects whose communicator has size greater than 1.
To me it seems feasible that we could be caching PETSc objects in some of our global caches that only get cleared up at interpreter shutdown.
An extra cleanup is needed after the final garbage collection.
    ...
    return mesh._comm

if __name__ == "__main__":
    comm = run()
    gc.collect()
    PETSc.garbage_cleanup(comm)
PETSc.garbage_cleanup(PETSc.COMM_SELF) will do nothing. I contributed some code upstream that means that we only defer the destruction of objects whose communicator has size greater than 1. To me it seems feasible that we could be caching PETSc objects in some of our global caches that only get cleared up at interpreter shutdown.
I think I did a round of pulling all of those out, so there are only "Object-cached" things that live for the lifetime of the process.
But we absolutely have refcycles in the firedrake objects, so to clean things up one does need gc.collect() followed by garbage_cleanup on the relevant communicator.
As @jrmaddison notes, it is insufficient to call collect at the end of the run function, because the references to firedrake objects are still live. Without explicitly deleting (via del) the names, they don't go out of scope until the function exits. So one must send the communicator out of the run function, and then do as James suggests.
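The scoping argument can be checked with a small pure-Python sketch that involves no PETSc or Firedrake. `Handle` is a hypothetical stand-in for any object that sits in a reference cycle, and the `run` here is a toy function, not the one from the issue.

```python
import gc
import weakref


class Handle:
    """Toy stand-in for an object that is part of a reference cycle."""

    def __init__(self):
        self.me = self  # self-reference: only the cycle collector can free it


def run(observed):
    h = Handle()
    observed.append(weakref.ref(h))
    gc.collect()
    # `h` is still a live local name here, so the collection above
    # cannot free the object it refers to.
    return observed[0]() is not None


observed = []
alive_inside_run = run(observed)
gc.collect()  # run() has returned, so the cycle is now unreachable
alive_after_run = observed[0]() is not None

print(alive_inside_run, alive_after_run)
```

Collecting inside the function leaves the object alive. Collecting after the function has returned frees it, which is why the communicator has to be passed out of `run` and the cleanup done by the caller.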
Changing to
    solve(a == L, w, solver_parameters=parameters)
    return mesh._comm

if __name__ == "__main__":
    comm = run()
    gc.collect()
    PETSc.garbage_cleanup(comm)
does indeed resolve the issue, thanks! I think it would be nice to incorporate this into documentation examples. Not many users may run with -log_view, but this also removes warnings like yaksa: X leaked handle pool objects.
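If this pattern were added to documentation examples, one possible shape is a small wrapper like the sketch below. The helper name `run_then_cleanup` is hypothetical and not part of Firedrake or petsc4py. The cleanup callable is passed in, so the sketch runs here with stand-ins instead of PETSc and MPI.

```python
import gc


def run_then_cleanup(run, garbage_cleanup):
    """Call run(), collect Python garbage, then clean up on the returned comm.

    `run` must return the communicator its objects live on;
    `garbage_cleanup` is the callable to invoke with that communicator.
    """
    comm = run()            # locals of run() go out of scope when it returns
    gc.collect()            # break reference cycles between Python objects
    garbage_cleanup(comm)   # then hand the communicator to the cleanup routine
    return comm


# Stand-ins so the sketch runs without PETSc or MPI installed.
calls = []
result = run_then_cleanup(lambda: "fake-comm", calls.append)
print(calls)
```

In a real script the call would presumably be `run_then_cleanup(run, PETSc.garbage_cleanup)`, with `run` returning `mesh._comm` as in the snippet above.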
Describe the bug
Steps to Reproduce
Here is the example I'm running
Command:
Expected behavior
I expect the number of destructions to match the number of creations. They do match when run in serial.
Environment: