This PR contains fixes to some design flaws in halos which can ultimately cause memory leaks and model inconsistencies.
Note: at the moment it breaks API (to make review easier), but once review is complete I will clone it into a non API-breaking one.
(5' read)
Problem 1
When MassDef is instantiated with a concentration it inadvertently contains a reference to itself:
mass_def is mass_def.concentration.mass_def is mass_def.[...].mass_def
This causes a memory leak because when the object is deleted, the garbage collector doesn't know the order of deletion so the instances hang in there for the rest of the runtime. This can easily be demonstrated:
import pyccl as ccl
class Concentration:
def __init__(self, mass_def):
self.mass_def = mass_def
self.id = hex(id(self))[-5:]
print(f"Concentration {self.id} created.")
def __del__(self):
print(f"Concentration {self.id} destroyed.")
class MassDef:
def __init__(self):
self.concentration = Concentration(self) # line that causes the problem
self.id = hex(id(self))[-5:]
print(f"MassDef {self.id} created.")
def __del__(self):
print(f"MassDef {self.id} destroyed.")
mass_def = MassDef()
del mass_def # doesn't do anything
print("program ends")
An easy way to fix that is to use a weakref proxy of self when passing it to the Concentration constructor:
import weakref ... self.concentration = Concentration(weakref.proxy(self)). This allows the reference count of mass_def to go to zero and the gbc to purge it. This tactic is not generally advised though.
However, there is a deeper design flaw with the interplay of MassDef and Concentration.
Problem 2
Matter profiles require a Concentration to initialize. However, this isn't really a parameter; it is akin to setting up the model. As is mass_def (which is why we agreed to deprecate it from calls to subclasses of HMIngredients). This is how these two are used in the code, in every matter profile:
def _real(self, cosmo, r, M, a, mass_def):
# Comoving virial radius
R_M = mass_def.get_radius(cosmo, M, a) / a
c_M = self.concentration(cosmo, M, a)
R_s = R_M / c_M
so they aren't used in anything other than finding the comoving virial radius, given M. These two work together; they should be together, but one is passed at initialization and the other one at calls. What's more, this creates an inconsistency: the mass_def of concentration may not be the same as the mass_def passed in _real. Luckily, most implemented concentrations are mass_def-agnostic, and also users have (hopefully) been passing the same one, but this can easily create problems.
Solution
One way is to add a check every time e.g. _real is called to verify that the mass definitions are consistent. That's a bad solution, no explanation needed.
Another way is to deprecate concentration in __init__ and pass it as an extra argument in e.g. _real. That's a bad solution because not all profiles use a Concentration, so we'll end up unnecessarily complicating the code.
Another way is to move mass_def to __init__ and deprecate it from e.g. _real. That's okay, but the non-matter profiles still use mass_def to compute the radius, given the mass. That means, in order to be consistent, we'd have to introduce mass_def as an __init__ argument in all profiles.
I propose the following. Since MassDef and Concentration are so strongly coupled, it makes sense to combine them into one single object. Then, every profile will accept a MassConcentration object in __init__ which contains all the info it will ever need: mass definition and an optional concentration. This will simplify things.
In practice, the new class would contain self.mass_def and self.concentration and would avoid self-references and other caveats. It would initialize both the mass definition and the concentration prescription at the same time.
This is what it could look like:
class MassConcentration(MassDef, Concentration):
def __init__(self, *, mass_def, concentration):
# careful with initialization: we don't want cyclic references like in MassDef
self.mass_def = mass_def
self.concentration = concentration
and then, all halo profiles would have:
class HaloProfileNFW(HaloProfile):
def __init__(self, mass_concentration, ...):
self.mass_concentration = mass_concentration
...
def _real(self, cosmo, r, M, a):
R_M = self.mass_concentration.get_radius(cosmo, M, a) / a
c_M = self.mass_concentration.concentration(cosmo, M, a)
R_s = R_M / cM
We could even go as far as implementing a get_comoving_virial_radius method, to do all of that in a single step and avoid having to repeat that in all matter profiles.
It would also make physical sense because the parameters passed in __init__ would uniquely define a profile 'entity', which would then be called via its methods and (r, M, a) (and cosmo which is an outlier as we've already discussed).
For v2.final we can leave things as they are, but add warnings that these will be deprecated in the future, and point users towards using MassConcentration objects instead of separate MassDef and Concentration.
The solution would simplify the code - for example, in benchmarks/test_haloprofile.py we do
mdef = ccl.halos.MassDef(mDelta, 'matter')
c = ccl.halos.ConcentrationConstant(c=concentration, mdef=mdef)
mdef = ccl.halos.MassDef(mDelta, 'matter', concentration=c)
p = ccl.halos.HaloProfileEinasto(c, truncated=False)
so there are two instantiations of a mass definition to get the one we want to use. With the new implementation we won't need that - it would be a single step.
This proposal is a widely used design pattern in OOP called "dependency injection", and is a technique where an object is passed as a dependency to another object, rather than having the dependent object create the dependency itself. It helps to reduce coupling between different objects, and makes the code more modular and testable.
This PR contains fixes to some design flaws in halos which can ultimately cause memory leaks and model inconsistencies. Note: at the moment it breaks API (to make review easier), but once review is complete I will clone it into a non API-breaking one.
(5' read)
Problem 1
When
MassDef
is instantiated with a concentration it inadvertently contains a reference to itself:This causes a memory leak because when the object is deleted, the garbage collector doesn't know the order of deletion so the instances hang in there for the rest of the runtime. This can easily be demonstrated:
which outputs
An easy way to fix that is to use a weakref proxy of
self
when passing it to theConcentration
constructor:import weakref ... self.concentration = Concentration(weakref.proxy(self))
. This allows the reference count ofmass_def
to go to zero and the gbc to purge it. This tactic is not generally advised though.However, there is a deeper design flaw with the interplay of
MassDef
andConcentration
.Problem 2
Matter profiles require a
Concentration
to initialize. However, this isn't really a parameter; it is akin to setting up the model. As ismass_def
(which is why we agreed to deprecate it from calls to subclasses ofHMIngredients
). This is how these two are used in the code, in every matter profile:so they aren't used in anything other than finding the comoving virial radius, given
M
. These two work together; they should be together, but one is passed at initialization and the other one at calls. What's more, this creates an inconsistency: themass_def
ofconcentration
may not be the same as themass_def
passed in_real
. Luckily, most implemented concentrations aremass_def
-agnostic, and also users have (hopefully) been passing the same one, but this can easily create problems.Solution
_real
is called to verify that the mass definitions are consistent. That's a bad solution, no explanation needed.concentration
in__init__
and pass it as an extra argument in e.g._real
. That's a bad solution because not all profiles use aConcentration
, so we'll end up unnecessarily complicating the code.mass_def
to__init__
and deprecate it from e.g._real
. That's okay, but the non-matter profiles still usemass_def
to compute the radius, given the mass. That means, in order to be consistent, we'd have to introducemass_def
as an__init__
argument in all profiles.MassDef
andConcentration
are so strongly coupled, it makes sense to combine them into one single object. Then, every profile will accept aMassConcentration
object in__init__
which contains all the info it will ever need: mass definition and an optional concentration. This will simplify things.In practice, the new class would contain
self.mass_def
andself.concentration
and would avoid self-references and other caveats. It would initialize both the mass definition and the concentration prescription at the same time.This is what it could look like:
and then, all halo profiles would have:
We could even go as far as implementing a
get_comoving_virial_radius
method, to do all of that in a single step and avoid having to repeat that in all matter profiles.It would also make physical sense because the parameters passed in
__init__
would uniquely define a profile 'entity', which would then be called via its methods and(r, M, a)
(andcosmo
which is an outlier as we've already discussed).For
v2.final
we can leave things as they are, but add warnings that these will be deprecated in the future, and point users towards usingMassConcentration
objects instead of separateMassDef
andConcentration
.The solution would simplify the code - for example, in
benchmarks/test_haloprofile.py
we doso there are two instantiations of a mass definition to get the one we want to use. With the new implementation we won't need that - it would be a single step.
This proposal is a widely used design pattern in OOP called "dependency injection", and is a technique where an object is passed as a dependency to another object, rather than having the dependent object create the dependency itself. It helps to reduce coupling between different objects, and makes the code more modular and testable.