MPAS-Dev / MPAS

Repository for private MPAS development prior to the MPAS v6.0 release.
Other
4 stars 0 forks source link

Fix framework config pool memory leak #1501

Closed Larofeticus closed 6 years ago

Larofeticus commented 6 years ago

This leak was found investigating the cause of the problem with: https://github.com/ACME-Climate/ACME/pull/2041

The framework has some functions which manage config_pool storage used during halo exchanges. My halo data structure reuse code needs to directly modify some of these config values between exchanges but the framework currently only supports mpas_pool_add_config() and mpas_pool_remove_config() so that's what was used.

The add config allocates three elements: a mpas_pool_member_type, field of that which is mpas_pool_data_type, and a scalar member of that either int, real, char, or logical.

The remove config only deallocates the mpas_pool_member_type and the payload scalar, but forgets the mpas_pool_data_type. Along with a suspicious TODO comment speculating that there might be more to do.

Under previous circumstances this add_config to remove_config cycle didn't happen enough to drive an OOM, but the heavier usage by my reuse code does.

jonbob commented 6 years ago

Tagging @douglasjacobsen , who originally developed this code

douglasjacobsen commented 6 years ago

A few comments. While this does appear to be a memory leak, there was some consideration about this when we wrote this code. Fundamentally, the remove routines within the pool data structure were not intended to destroy data, just remove it from the pool. However, when configs and dimensions are added, they are duplicated (such that they wouldn't be synchronized between multiple pools, like fields would be).

So, this is a memory leak currently, but you are not fixing all of the memory leak. Look at the destroy pool routine for more information on how to fix it properly. However, some thought should be given to the way configs are added to pools instead of just adding this. It might be preferable to add configs and dimensions in a similar way to fields, and make the remove routines only remove them from the pool rather than deallocate data.

I think the comment you add is not needed in this PR as it really doesn't add anything helpful, but I'll leave that to @mgduda to decide on.

Some comments on your notes about the reusable halo exchanges:

Beyond that, your halo buffer reuse routines do not actually need to modify the config value. If you read through those routines, you'll notice that the config isn't actually used anywhere. Additionally, time levels are specified before buffers are built (i.e. time levels need to be specified before the buffer is built, because they determine the size of the buffers).

Finally, the halo exchanges didn't present a memory leak before because they were not continually adding and removing configs from a pool. Instead, after the halo exchange was finished the pool was destroyed. When a pool is destroyed, memory is deallocated properly and doesn't cause a memory leak.

If you're interested more, feel free to write me an email or tag me in an issue about it, but I don't think this PR is the right place to discuss it.