TDycores-Project / TDycore


Plugged a bunch of memory leaks. #222

Closed. jeff-cohere closed this 2 years ago.

jeff-cohere commented 2 years ago

I ran Valgrind on the dycore and saw quite a large number of reported leaks. Some of the leaks predate our big refactor, and some were introduced by it!

This PR fixes all of the leaks except for one, which is related to the dycore's primary DM. I'm not yet sure what's going on with it, but I'd rather get these other fixes in now.

We can probably continue to improve our organization (and slim the FV mesh down a bit; it's carrying a lot of data at the moment), but this at least gives us a cleaner report from Valgrind.

codecov-commenter commented 2 years ago

Codecov Report

Merging #222 (fb5abc9) into master (d2a915c) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff           @@
##           master     #222   +/-   ##
=======================================
  Coverage   51.63%   51.63%           
=======================================
  Files           4        4           
  Lines         765      765           
=======================================
  Hits          395      395           
  Misses        370      370           

Continue to review full report at Codecov.


jedbrown commented 2 years ago

What should I run to reproduce the leak you don't understand?

jeff-cohere commented 2 years ago

I used this (from demo/th):

./th_driver -malloc 0 -successful_exit_code 0 -dm_plex_simplex 0 -dm_plex_dim 3 -dm_plex_box_faces 2,2,2 -dm_plex_box_lower 0,0,0 -dm_plex_box_upper 1,1,1 -tdy_water_density exponential -tdy_regression_test -tdy_regression_test_num_cells_per_process 2 -tdy_regression_test_filename th-driver-ts-prob1 -tdy_final_time 3.1536e3 -tdy_dt_max 600. -tdy_dt_growth_factor 1.5 -tdy_init_with_random_field -tdy_time_integration_method TS

and I dropped the trailing e3 from the -tdy_final_time value to get a quicker turnaround. I think the leak probably appears in several demos. I haven't looked at it very closely yet, but it looks like the DM originally created for the dycore isn't being freed for some reason (even though tdy->dm gets destroyed in TDyDestroy).

knepley commented 2 years ago

Is it possible that we get a new DM from DMPlexDistribute() but do not free the original?

jeff-cohere commented 2 years ago

No, we destroy the serial DM when we get a distributed one, and I've verified that the pointer address of the leaked DM is the same one we pass to DMDestroy in TDyDestroy. Very strange.
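
For reference, a minimal sketch of the pattern (assuming the standard DMPlexDistribute idiom; the variable names are illustrative, not our exact code):

  DM dmDist = NULL;
  ierr = DMPlexDistribute(dm, 0, NULL, &dmDist);CHKERRQ(ierr);
  if (dmDist) {
    /* Adopt the distributed DM and destroy the serial one. */
    ierr = DMDestroy(&dm);CHKERRQ(ierr);
    dm = dmDist;
  }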

jeff-cohere commented 2 years ago

I don't know whether this is related, but in the version of PETSc we're currently using I see the following at the end of DMDestroy (in dm.c):

  /* We do not destroy (*dm)->data here so that we can reference count backend objects */
  ierr = PetscHeaderDestroy(dm);CHKERRQ(ierr);

Could we have a dangling reference to the DM somewhere?
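
As I understand the convention (and I may be wrong here), DMDestroy only frees the header once the reference count reaches zero; otherwise it just decrements the count and returns. So a hypothetical check we could drop into TDyDestroy to test the dangling-reference theory:

  PetscInt refct;
  ierr = PetscObjectGetReference((PetscObject)tdy->dm, &refct);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "DM refcount before destroy: %D\n", refct);CHKERRQ(ierr);
  ierr = DMDestroy(&tdy->dm);CHKERRQ(ierr);

If that prints anything greater than 1, the header survives our DMDestroy call, which would explain Valgrind's report.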

jeff-cohere commented 2 years ago

I created #223 to track this mysterious leak. I don't think there's a need to figure it out before we merge this PR (though it would be nice if it were easy!).

knepley commented 2 years ago

It is not the backend (that comment was mine, from when I was trying to understand the PETSc convention). It is very likely that we either a) have something else keeping the DM alive, like an unfreed Vec, or b) have a reference cycle. However, both of those would mean there is at least one more thing unfreed.
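
For concreteness, a hypothetical sketch of case a), assuming the usual behavior where a global Vec holds a reference to its DM:

  Vec U;
  ierr = DMCreateGlobalVector(dm, &U);CHKERRQ(ierr); /* U now references dm */
  /* ... */
  ierr = DMDestroy(&dm);CHKERRQ(ierr); /* decrements the count; U keeps the DM alive */
  /* Without a matching VecDestroy(&U), Valgrind reports both U and the DM as
     leaked, even though DMDestroy was called on the right pointer. */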