LSSTDESC / CCL

DESC Core Cosmology Library: cosmology routines with validated numerical accuracy
BSD 3-Clause "New" or "Revised" License
145 stars 68 forks source link

Fix halo inconsistencies v3 #1064

Closed nikfilippas closed 1 year ago

nikfilippas commented 1 year ago

This PR contains fixes to some design flaws in halos which can ultimately cause memory leaks and model inconsistencies. Note: at the moment it breaks API (to make review easier), but once review is complete I will clone it into a non API-breaking one.

(5' read)

Problem 1

When MassDef is instantiated with a concentration it inadvertently contains a reference to itself:

mass_def is mass_def.concentration.mass_def is mass_def.[...].mass_def

This causes a memory leak because when the object is deleted, the garbage collector doesn't know the order of deletion so the instances hang in there for the rest of the runtime. This can easily be demonstrated:

import pyccl as ccl

class Concentration:

    def __init__(self, mass_def):
        self.mass_def = mass_def
        self.id = hex(id(self))[-5:]
        print(f"Concentration {self.id} created.")

    def __del__(self):
        print(f"Concentration {self.id} destroyed.")

class MassDef:

    def __init__(self):
        self.concentration = Concentration(self)  # line that causes the problem
        self.id = hex(id(self))[-5:]
        print(f"MassDef {self.id} created.")

    def __del__(self):
        print(f"MassDef {self.id} destroyed.")

mass_def = MassDef()
del mass_def  # doesn't do anything

print("program ends")

which outputs

Concentration 36050 created.
MassDef 3af50 created.
program ends
MassDef 3af50 destroyed.
Concentration 36050 destroyed.

An easy way to fix that is to use a weakref proxy of self when passing it to the Concentration constructor: import weakref ... self.concentration = Concentration(weakref.proxy(self)). This allows the reference count of mass_def to go to zero and the gbc to purge it. This tactic is not generally advised though.

However, there is a deeper design flaw with the interplay of MassDef and Concentration.

Problem 2

Matter profiles require a Concentration to initialize. However, this isn't really a parameter; it is akin to setting up the model. As is mass_def (which is why we agreed to deprecate it from calls to subclasses of HMIngredients). This is how these two are used in the code, in every matter profile:

    def _real(self, cosmo, r, M, a, mass_def):
        # Comoving virial radius
        R_M = mass_def.get_radius(cosmo, M, a) / a
        c_M = self.concentration(cosmo, M, a)
        R_s = R_M / c_M

so they aren't used in anything other than finding the comoving virial radius, given M. These two work together; they should be together, but one is passed at initialization and the other one at calls. What's more, this creates an inconsistency: the mass_def of concentration may not be the same as the mass_def passed in _real. Luckily, most implemented concentrations are mass_def-agnostic, and also users have (hopefully) been passing the same one, but this can easily create problems.

Solution

In practice, the new class would contain self.mass_def and self.concentration and would avoid self-references and other caveats. It would initialize both the mass definition and the concentration prescription at the same time.

This is what it could look like:

class MassConcentration(MassDef, Concentration):
    def __init__(self, *, mass_def, concentration):
        # careful with initialization: we don't want cyclic references like in MassDef
        self.mass_def = mass_def
        self.concentration = concentration

and then, all halo profiles would have:

class HaloProfileNFW(HaloProfile):
    def __init__(self, mass_concentration, ...):
        self.mass_concentration = mass_concentration
        ...

    def _real(self, cosmo, r, M, a):
        R_M = self.mass_concentration.get_radius(cosmo, M, a) / a
        c_M = self.mass_concentration.concentration(cosmo, M, a)
        R_s = R_M / cM

We could even go as far as implementing a get_comoving_virial_radius method, to do all of that in a single step and avoid having to repeat that in all matter profiles.

It would also make physical sense because the parameters passed in __init__ would uniquely define a profile 'entity', which would then be called via its methods and (r, M, a) (and cosmo which is an outlier as we've already discussed).

For v2.final we can leave things as they are, but add warnings that these will be deprecated in the future, and point users towards using MassConcentration objects instead of separate MassDef and Concentration.

The solution would simplify the code - for example, in benchmarks/test_haloprofile.py we do

    mdef = ccl.halos.MassDef(mDelta, 'matter')
    c = ccl.halos.ConcentrationConstant(c=concentration, mdef=mdef)
    mdef = ccl.halos.MassDef(mDelta, 'matter', concentration=c)
    p = ccl.halos.HaloProfileEinasto(c, truncated=False)

so there are two instantiations of a mass definition to get the one we want to use. With the new implementation we won't need that - it would be a single step.

This proposal is a widely used design pattern in OOP called "dependency injection", and is a technique where an object is passed as a dependency to another object, rather than having the dependent object create the dependency itself. It helps to reduce coupling between different objects, and makes the code more modular and testable.